You are currently browsing the tag archive for the ‘correlation’ tag.

This is a postscript to the previous blog post which was concerned with obtaining heuristic asymptotic predictions for the correlation

\displaystyle \sum_{n \leq x} \tau(n) \tau(n+h), \ \ \ \ \ (1)


for the divisor function {\tau(n) := \sum_{d|n} 1}, in particular recovering the calculation of Ingham that obtained the asymptotic

\displaystyle \sum_{n \leq x} \tau(n) \tau(n+h) \sim \frac{6}{\pi^2} \sigma_{-1}(h) x \log^2 x \ \ \ \ \ (2)


when {h} was fixed and non-zero and {x} went to infinity. It is natural to consider the more general correlations

\displaystyle \sum_{n \leq x} \tau_k(n) \tau_l(n+h)

for fixed {k,l \geq 1} and non-zero {h}, where

\displaystyle \tau_k(n) := \sum_{d_1 \dots d_k = n} 1

is the order {k} divisor function. The sum (1) then corresponds to the case {k=l=2}. For {l=1}, {\tau_1(n) = 1}, and a routine application of the Dirichlet hyperbola method (or Perron’s formula) gives the asymptotic

\displaystyle \sum_{n \leq x} \tau_k(n) \sim \frac{\log^{k-1} x}{(k-1)!} x,

or more accurately

\displaystyle \sum_{n \leq x} \tau_k(n) \sim P_k(\log x) x

where {P_k(t)} is a certain explicit polynomial of degree {k-1} with leading coefficient {\frac{1}{(k-1)!}}; see e.g. Exercise 31 of this previous post for a discussion of the {k=3} case (which is already typical). Similarly if {k=1}. For more general {k,l \geq 1}, there is a conjecture of Conrey and Gonek which predicts that

\displaystyle \sum_{n \leq x} \tau_k(n) \tau_l(n+h) \sim P_{k,l,h}(\log x) x

for some polynomial {P_{k,l,h}(t)} of degree {k+l-2} which is explicit but whose form is rather complicated (one has to compute residues of a various complicated products of zeta functions and local factors). This conjecture has been verified when {k \leq 2} or {l \leq 2}, by the work of Linnik, Motohashi, Fouvry-Tenenbaum, and others, but all the remaining cases when {k,l \geq 3} are currently open.

In principle, the calculations of the previous post should recover the predictions of Conrey and Gonek. In this post I would like to record this for the top order term:

Conjecture 1 If {k,l \geq 2} and {h \neq 0} are fixed, then

\displaystyle \sum_{n \leq x} \tau_k(n) \tau_l(n+h) \sim \frac{\log^{k-1} x}{(k-1)!} \frac{\log^{l-1} x}{(l-1)!} x \prod_p {\mathfrak S}_{k,l,p}(h)

as {x \rightarrow \infty}, where the product is over all primes {p}, and the local factors {{\mathfrak S}_{k,l,p}(h)} are given by the formula

\displaystyle {\mathfrak S}_{k,l,p}(h) := (\frac{p-1}{p})^{k+l-2} \sum_{j \geq 0: p^j|h} \frac{1}{p^j} P_{k,l,p}(j) \ \ \ \ \ (3)


where {P_{k,l,p}} is the degree {k+l-4} polynomial

\displaystyle P_{k,l,p}(j) := \sum_{k'=2}^k \sum_{l'=2}^l \binom{k-k'+j-1}{k-k'} \binom{l-l'+j-1}{l-l'} \alpha_{k',l',p}


\displaystyle \alpha_{k',l',p} := (\frac{p}{p-1})^{k'-1} + (\frac{p}{p-1})^{l'-1} - 1

and one adopts the conventions that {\binom{-1}{0} = 1} and {\binom{m-1}{m} = 0} for {m \geq 1}.

For instance, if {k=l=2} then

\displaystyle P_{2,2,p}(h) = \frac{p}{p-1} + \frac{p}{p-1} - 1 = \frac{p+1}{p-1}

and hence

\displaystyle {\mathfrak S}_{2,2,p}(h) = (1 - \frac{1}{p^2}) \sum_{j \geq 0: p^j|h} \frac{1}{p^j}

and the above conjecture recovers the Ingham formula (2). For {k=2, l=3}, we have

\displaystyle P_{2,3,p}(h) =

\displaystyle (\frac{p}{p-1} + (\frac{p}{p-1})^2 - 1) + (\frac{p}{p-1} + \frac{p}{p-1} - 1) j

\displaystyle = \frac{p^2+p-1}{(p-1)^2} + \frac{p+1}{p-1} j

and so we predict

\displaystyle \sum_{n \leq x} \tau(n) \tau_3(n+h) \sim \frac{x \log^3 x}{2} \prod_p {\mathfrak S}_{2,3,p}(h)


\displaystyle {\mathfrak S}_{2,3,p}(h) = \sum_{j \geq 0: p^j|h} \frac{\frac{p^3 - 2p + 1}{p^3} + \frac{(p+1)(p-1)^2}{p^3} j}{p^j}.

Similarly, if {k=l=3} we have

\displaystyle P_{3,3,p}(h) = ((\frac{p}{p-1})^2 + (\frac{p}{p-1})^2 - 1) + 2 (\frac{p}{p-1} + (\frac{p}{p-1})^2 - 1) j

\displaystyle + (\frac{p}{p-1} + \frac{p}{p-1} - 1) j^2

\displaystyle = \frac{p^2+2p-1}{(p-1)^2} + 2 \frac{p^2+p-1}{(p-1)^2} j + \frac{p+1}{p-1} j^2

and so we predict

\displaystyle \sum_{n \leq x} \tau_3(n) \tau_3(n+h) \sim \frac{x \log^4 x}{4} \prod_p {\mathfrak S}_{3,3,p}(h)


\displaystyle {\mathfrak S}_{3,3,p}(h) = \sum_{j \geq 0: p^j|h} \frac{\frac{p^4 - 4p^2 + 4p - 1}{p^4} + 2 \frac{(p^2+p-1)(p-1)^2}{p^4} j + \frac{(p+1)(p-1)^3}{p^4} j^2}{p^j}.

and so forth.

As in the previous blog, the idea is to factorise

\displaystyle \tau_k(n) = \prod_p \tau_{k,p}(n)

where the local factors {\tau_{k,p}(n)} are given by

\displaystyle \tau_{k,p}(n) := \sum_{j_1,\dots,j_k \geq 0: p^{j_1+\dots+j_k} || n} 1

(where {p^j || n} means that {p} divides {n} precisely {j} times), or in terms of the valuation {v_p(n)} of {n} at {p},

\displaystyle \tau_{k,p}(n) = \binom{k-1+v_p(n)}{k-1}. \ \ \ \ \ (4)


We then have the following exact local asymptotics:

Proposition 2 (Local correlations) Let {{\bf n}} be a profinite integer chosen uniformly at random, let {h} be a profinite integer, and let {k,l \geq 2}. Then

\displaystyle {\bf E} \tau_{k,p}({\bf n}) = (\frac{p}{p-1})^{k-1} \ \ \ \ \ (5)



\displaystyle {\bf E} \tau_{k,p}({\bf n}) \tau_{l,p}({\bf n}+h) = (\frac{p}{p-1})^{k+l-2} {\mathfrak S}_{k,l,p}(h). \ \ \ \ \ (6)


(For profinite integers it is possible that {v_p({\bf n})} and hence {\tau_{k,p}({\bf n})} are infinite, but this is a probability zero event and so can be ignored.)

Conjecture 1 can then be heuristically justified from the local calculations (2) by various pseudorandomness heuristics, as discussed in the previous post.

I’ll give a short proof of the above proposition below, basically using the recursive methods of the previous post. This short proof actually took be quite a while to find; I spent several hours and a fair bit of scratch paper working out the cases {k,l = 2,3} laboriously by hand (with some assistance and cross-checking from Maple). Here is an unorganised sample of some of this scratch, just to show how the sausage is actually made:


It was only after expending all this effort that I realised that it would be much more efficient to compute the correlations for all values of {k,l} simultaneously by using generating functions. After performing this computation, it then became apparent that there would be a direct combinatorial proof of (6) that was shorter than even the generating function proof. (I will not supply the full generating function calculations here, but will at least show them for the easier correlation (5).)

I am confident that Conjecture 1 is consistent with the explicit asymptotic in the Conrey-Gonek conjecture, but have not yet rigorously established that the leading order term in the latter is indeed identical to the expression provided above.

Read the rest of this entry »

Let {\tau(n) := \sum_{d|n} 1} be the divisor function. A classical application of the Dirichlet hyperbola method gives the asymptotic

\displaystyle \sum_{n \leq x} \tau(n) \sim x \log x

where {X \sim Y} denotes the estimate {X = (1+o(1))Y} as {x \rightarrow \infty}. Much better error estimates are possible here, but we will not focus on the lower order terms in this discussion. For somewhat idiosyncratic reasons I will interpret this estimate (and the other analytic number theory estimates discussed here) through the probabilistic lens. Namely, if {{\bf n} = {\bf n}_x} is a random number selected uniformly between {1} and {x}, then the above estimate can be written as

\displaystyle {\bf E} \tau( {\bf n} ) \sim \log x, \ \ \ \ \ (1)


that is to say the random variable {\tau({\bf n})} has mean approximately {\log x}. (But, somewhat paradoxically, this is not the median or mode behaviour of this random variable, which instead concentrates near {\log^{\log 2} x}, basically thanks to the Hardy-Ramanujan theorem.)

Now we turn to the pair correlations {\sum_{n \leq x} \tau(n) \tau(n+h)} for a fixed positive integer {h}. There is a classical computation of Ingham that shows that

\displaystyle \sum_{n \leq x} \tau(n) \tau(n+h) \sim \frac{6}{\pi^2} \sigma_{-1}(h) x \log^2 x, \ \ \ \ \ (2)



\displaystyle \sigma_{-1}(h) := \sum_{d|h} \frac{1}{d}.

The error term in (2) has been refined by many subsequent authors, as has the uniformity of the estimates in the {h} aspect, as these topics are related to other questions in analytic number theory, such as fourth moment estimates for the Riemann zeta function; but we will not consider these more subtle features of the estimate here. However, we will look at the next term in the asymptotic expansion for (2) below the fold.

Using our probabilistic lens, the estimate (2) can be written as

\displaystyle {\bf E} \tau( {\bf n} ) \tau( {\bf n} + h ) \sim \frac{6}{\pi^2} \sigma_{-1}(h) \log^2 x. \ \ \ \ \ (3)


From (1) (and the asymptotic negligibility of the shift by {h}) we see that the random variables {\tau({\bf n})} and {\tau({\bf n}+h)} both have a mean of {\sim \log x}, so the additional factor of {\frac{6}{\pi^2} \sigma_{-1}(h)} represents some arithmetic coupling between the two random variables.

Ingham’s formula can be established in a number of ways. Firstly, one can expand out {\tau(n) = \sum_{d|n} 1} and use the hyperbola method (splitting into the cases {d \leq \sqrt{x}} and {n/d \leq \sqrt{x}} and removing the overlap). If one does so, one soon arrives at the task of having to estimate sums of the form

\displaystyle \sum_{n \leq x: d|n} \tau(n+h)

for various {d \leq \sqrt{x}}. For {d} much less than {\sqrt{x}} this can be achieved using a further application of the hyperbola method, but for {d} comparable to {\sqrt{x}} things get a bit more complicated, necessitating the use of non-trivial estimates on Kloosterman sums in order to obtain satisfactory control on error terms. A more modern approach proceeds using automorphic form methods, as discussed in this previous post. A third approach, which unfortunately is only heuristic at the current level of technology, is to apply the Hardy-Littlewood circle method (discussed in this previous post) to express (2) in terms of exponential sums {\sum_{n \leq x} \tau(n) e(\alpha n)} for various frequencies {\alpha}. The contribution of “major arc” {\alpha} can be computed after a moderately lengthy calculation which yields the right-hand side of (2) (as well as the correct lower order terms that are currently being suppressed), but there does not appear to be an easy way to show directly that the “minor arc” contributions are of lower order, although the methods discussed previously do indirectly show that this is ultimately the case.

Each of the methods outlined above requires a fair amount of calculation, and it is not obvious while performing them that the factor {\frac{6}{\pi^2} \sigma_{-1}(h)} will emerge at the end. One can at least explain the {\frac{6}{\pi^2}} as a normalisation constant needed to balance the {\sigma_{-1}(h)} factor (at a heuristic level, at least). To see this through our probabilistic lens, introduce an independent copy {{\bf n}'} of {{\bf n}}, then

\displaystyle {\bf E} \tau( {\bf n} ) \tau( {\bf n}' ) = ({\bf E} \tau ({\bf n}))^2 \sim \log^2 x; \ \ \ \ \ (4)


using symmetry to order {{\bf n}' > {\bf n}} (discarding the diagonal case {{\bf n} = {\bf n}'}) and making the change of variables {{\bf n}' = {\bf n}+h}, we see that (4) is heuristically consistent with (3) as long as the asymptotic mean of {\frac{6}{\pi^2} \sigma_{-1}(h)} in {h} is equal to {1}. (This argument is not rigorous because there was an implicit interchange of limits present, but still gives a good heuristic “sanity check” of Ingham’s formula.) Indeed, if {{\bf E}_h} denotes the asymptotic mean in {h}, then we have (heuristically at least)

\displaystyle {\bf E}_h \sigma_{-1}(h) = \sum_d {\bf E}_h \frac{1}{d} 1_{d|h}

\displaystyle = \sum_d \frac{1}{d^2}

\displaystyle = \frac{\pi^2}{6}

and we obtain the desired consistency after multiplying by {\frac{6}{\pi^2}}.

This still however does not explain the presence of the {\sigma_{-1}(h)} factor. Intuitively it is reasonable that if {h} has many prime factors, and {{\bf n}} has a lot of factors, then {{\bf n}+h} will have slightly more factors than average, because any common factor to {h} and {{\bf n}} will automatically be acquired by {{\bf n}+h}. But how to quantify this effect?

One heuristic way to proceed is through analysis of local factors. Observe from the fundamental theorem of arithmetic that we can factor

\displaystyle \tau(n) = \prod_p \tau_p(n)

where the product is over all primes {p}, and {\tau_p(n) := \sum_{p^j|n} 1} is the local version of {\tau(n)} at {p} (which in this case, is just one plus the {p}valuation {v_p(n)} of {n}: {\tau_p = 1 + v_p}). Note that all but finitely many of the terms in this product will equal {1}, so the infinite product is well-defined. In a similar fashion, we can factor

\displaystyle \sigma_{-1}(h) = \prod_p \sigma_{-1,p}(h)


\displaystyle \sigma_{-1,p}(h) := \sum_{p^j|h} \frac{1}{p^j}

(or in terms of valuations, {\sigma_{-1,p}(h) = (1 - p^{-v_p(h)-1})/(1-p^{-1})}). Heuristically, the Chinese remainder theorem suggests that the various factors {\tau_p({\bf n})} behave like independent random variables, and so the correlation between {\tau({\bf n})} and {\tau({\bf n}+h)} should approximately decouple into the product of correlations between the local factors {\tau_p({\bf n})} and {\tau_p({\bf n}+h)}. And indeed we do have the following local version of Ingham’s asymptotics:

Proposition 1 (Local Ingham asymptotics) For fixed {p} and integer {h}, we have

\displaystyle {\bf E} \tau_p({\bf n}) \sim \frac{p}{p-1}


\displaystyle {\bf E} \tau_p({\bf n}) \tau_p({\bf n}+h) \sim (1-\frac{1}{p^2}) \sigma_{-1,p}(h) (\frac{p}{p-1})^2

\displaystyle = \frac{p+1}{p-1} \sigma_{-1,p}(h)

From the Euler formula

\displaystyle \prod_p (1-\frac{1}{p^2}) = \frac{1}{\zeta(2)} = \frac{6}{\pi^2}

we see that

\displaystyle \frac{6}{\pi^2} \sigma_{-1}(h) = \prod_p (1-\frac{1}{p^2}) \sigma_{-1,p}(h)

and so one can “explain” the arithmetic factor {\frac{6}{\pi^2} \sigma_{-1}(h)} in Ingham’s asymptotic as the product of the arithmetic factors {(1-\frac{1}{p^2}) \sigma_{-1,p}(h)} in the (much easier) local Ingham asymptotics. Unfortunately we have the usual “local-global” problem in that we do not know how to rigorously derive the global asymptotic from the local ones; this problem is essentially the same issue as the problem of controlling the minor arc contributions in the circle method, but phrased in “physical space” language rather than “frequency space”.

Remark 2 The relation between the local means {\sim \frac{p}{p-1}} and the global mean {\sim \log^2 x} can also be seen heuristically through the application

\displaystyle \prod_{p \leq x^{1/e^\gamma}} \frac{p}{p-1} \sim \log x

of Mertens’ theorem, where {1/e^\gamma} is Pólya’s magic exponent, which serves as a useful heuristic limiting threshold in situations where the product of local factors is divergent.

Let us now prove this proposition. One could brute-force the computations by observing that for any fixed {j}, the valuation {v_p({\bf n})} is equal to {j} with probability {\sim \frac{p-1}{p} \frac{1}{p^j}}, and with a little more effort one can also compute the joint distribution of {v_p({\bf n})} and {v_p({\bf n}+h)}, at which point the proposition reduces to the calculation of various variants of the geometric series. I however find it cleaner to proceed in a more recursive fashion (similar to how one can prove the geometric series formula by induction); this will also make visible the vague intuition mentioned previously about how common factors of {{\bf n}} and {h} force {{\bf n}+h} to have a factor also.

It is first convenient to get rid of error terms by observing that in the limit {x \rightarrow \infty}, the random variable {{\bf n} = {\bf n}_x} converges vaguely to a uniform random variable {{\bf n}_\infty} on the profinite integers {\hat {\bf Z}}, or more precisely that the pair {(v_p({\bf n}_x), v_p({\bf n}_x+h))} converges vaguely to {(v_p({\bf n}_\infty), v_p({\bf n}_\infty+h))}. Because of this (and because of the easily verified uniform integrability properties of {\tau_p({\bf n})} and their powers), it suffices to establish the exact formulae

\displaystyle {\bf E} \tau_p({\bf n}_\infty) = \frac{p}{p-1} \ \ \ \ \ (5)



\displaystyle {\bf E} \tau_p({\bf n}_\infty) \tau_p({\bf n}_\infty+h) = (1-\frac{1}{p^2}) \sigma_{-1,p}(h) (\frac{p}{p-1})^2 = \frac{p+1}{p-1} \sigma_{-1,p}(h) \ \ \ \ \ (6)


in the profinite setting (this setting will make it easier to set up the recursion).

We begin with (5). Observe that {{\bf n}_\infty} is coprime to {p} with probability {\frac{p-1}{p}}, in which case {\tau_p({\bf n}_\infty)} is equal to {1}. Conditioning to the complementary probability {\frac{1}{p}} event that {{\bf n}_\infty} is divisible by {p}, we can factor {{\bf n}_\infty = p {\bf n}'_\infty} where {{\bf n}'_\infty} is also uniformly distributed over the profinite integers, in which event we have {\tau_p( {\bf n}_\infty ) = 1 + \tau_p( {\bf n}'_\infty )}. We arrive at the identity

\displaystyle {\bf E} \tau_p({\bf n}_\infty) = \frac{p-1}{p} + \frac{1}{p} ( 1 + {\bf E} \tau_p( {\bf n}'_\infty ) ).

As {{\bf n}_\infty} and {{\bf n}'_\infty} have the same distribution, the quantities {{\bf E} \tau_p({\bf n}_\infty)} and {{\bf E} \tau_p({\bf n}'_\infty)} are equal, and (5) follows by a brief amount of high-school algebra.

We use a similar method to treat (6). First treat the case when {h} is coprime to {p}. Then we see that with probability {\frac{p-2}{p}}, {{\bf n}_\infty} and {{\bf n}_\infty+h} are simultaneously coprime to {p}, in which case {\tau_p({\bf n}_\infty) = \tau_p({\bf n}_\infty+h) = 1}. Furthermore, with probability {\frac{1}{p}}, {{\bf n}_\infty} is divisible by {p} and {{\bf n}_\infty+h} is not; in which case we can write {{\bf n} = p {\bf n}'} as before, with {\tau_p({\bf n}_\infty) = 1 + \tau_p({\bf n}'_\infty)} and {\tau_p({\bf n}_\infty+h)=1}. Finally, in the remaining event with probability {\frac{1}{p}}, {{\bf n}+h} is divisible by {p} and {{\bf n}} is not; we can then write {{\bf n}_\infty+h = p {\bf n}'_\infty}, so that {\tau_p({\bf n}_\infty+h) = 1 + \tau_p({\bf n}'_\infty)} and {\tau_p({\bf n}_\infty) = 1}. Putting all this together, we obtain

\displaystyle {\bf E} \tau_p({\bf n}_\infty) \tau_p({\bf n}_\infty+h) = \frac{p-2}{p} + 2 \frac{1}{p} (1 + {\bf E} \tau_p({\bf n}'_\infty))

and the claim (6) in this case follows from (5) and a brief computation (noting that {\sigma_{-1,p}(h)=1} in this case).

Now suppose that {h} is divisible by {p}, thus {h=ph'} for some integer {h'}. Then with probability {\frac{p-1}{p}}, {{\bf n}_\infty} and {{\bf n}_\infty+h} are simultaneously coprime to {p}, in which case {\tau_p({\bf n}_\infty) = \tau_p({\bf n}_\infty+h) = 1}. In the remaining {\frac{1}{p}} event, we can write {{\bf n}_\infty = p {\bf n}'_\infty}, and then {\tau_p({\bf n}_\infty) = 1 + \tau_p({\bf n}'_\infty)} and {\tau_p({\bf n}_\infty+h) = 1 + \tau_p({\bf n}'_\infty+h')}. Putting all this together we have

\displaystyle {\bf E} \tau_p({\bf n}_\infty) \tau_p({\bf n}_\infty+h) = \frac{p-1}{p} + \frac{1}{p} {\bf E} (1+\tau_p({\bf n}'_\infty)(1+\tau_p({\bf n}'_\infty+h)

which by (5) (and replacing {{\bf n}'_\infty} by {{\bf n}_\infty}) leads to the recursive relation

\displaystyle {\bf E} \tau_p({\bf n}_\infty) \tau_p({\bf n}_\infty+h) = \frac{p+1}{p-1} + \frac{1}{p} {\bf E} \tau_p({\bf n}_\infty) \tau_p({\bf n}_\infty+h)

and (6) then follows by induction on the number of powers of {p}.

The estimate (2) of Ingham was refined by Estermann, who obtained the more accurate expansion

\displaystyle \sum_{n \leq x} \tau(n) \tau(n+h) = \frac{6}{\pi^2} \sigma_{-1}(h) x \log^2 x + a_1(h) x \log x + a_2(h) x \ \ \ \ \ (7)


\displaystyle + O( x^{11/12+o(1)} )

for certain complicated but explicit coefficients {a_1(h), a_2(h)}. For instance, {a_1(h)} is given by the formula

\displaystyle a_1(h) = (\frac{12}{\pi^2} (2\gamma-1) + 4 a') \sigma_{-1}(h) - \frac{24}{\pi^2} \sigma'_{-1}(h)

where {\gamma} is the Euler-Mascheroni constant,

\displaystyle a' := - \sum_{r=1}^\infty \frac{\mu(r)}{r^2} \log r, \ \ \ \ \ (8)



\displaystyle \sigma'_{-1}(h) := \sum_{d|h} \frac{\log d}{d}.

The formula for {a_2(h)} is similar but even more complicated. The error term {O( x^{11/12+o(1)})} was improved by Heath-Brown to {O( x^{5/6+o(1)})}; it is conjectured (for instance by Conrey and Gonek) that one in fact has square root cancellation {O( x^{1/2+o(1)})} here, but this is well out of reach of current methods.

These lower order terms are traditionally computed either from a Dirichlet series approach (using Perron’s formula) or a circle method approach. It turns out that a refinement of the above heuristics can also predict these lower order terms, thus keeping the calculation purely in physical space as opposed to the “multiplicative frequency space” of the Dirichlet series approach, or the “additive frequency space” of the circle method, although the computations are arguably as messy as the latter computations for the purposes of working out the lower order terms. We illustrate this just for the {a_1(h) x \log x} term below the fold.

Read the rest of this entry »

Given two unit vectors {v,w} in a real inner product space, one can define the correlation between these vectors to be their inner product {\langle v, w \rangle}, or in more geometric terms, the cosine of the angle {\angle(v,w)} subtended by {v} and {w}. By the Cauchy-Schwarz inequality, this is a quantity between {-1} and {+1}, with the extreme positive correlation {+1} occurring when {v,w} are identical, the extreme negative correlation {-1} occurring when {v,w} are diametrically opposite, and the zero correlation {0} occurring when {v,w} are orthogonal. This notion is closely related to the notion of correlation between two non-constant square-integrable real-valued random variables {X,Y}, which is the same as the correlation between two unit vectors {v,w} lying in the Hilbert space {L^2(\Omega)} of square-integrable random variables, with {v} being the normalisation of {X} defined by subtracting off the mean {\mathbf{E} X} and then dividing by the standard deviation of {X}, and similarly for {w} and {Y}.

One can also define correlation for complex (Hermitian) inner product spaces by taking the real part {\hbox{Re} \langle , \rangle} of the complex inner product to recover a real inner product.

While reading the (highly recommended) recent popular maths book “How not to be wrong“, by my friend and co-author Jordan Ellenberg, I came across the (important) point that correlation is not necessarily transitive: if {X} correlates with {Y}, and {Y} correlates with {Z}, then this does not imply that {X} correlates with {Z}. A simple geometric example is provided by the three unit vectors

\displaystyle  u := (1,0); v := (\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}); w := (0,1)

in the Euclidean plane {{\bf R}^2}: {u} and {v} have a positive correlation of {\frac{1}{\sqrt{2}}}, as does {v} and {w}, but {u} and {w} are not correlated with each other. Or: for a typical undergraduate course, it is generally true that good exam scores are correlated with a deep understanding of the course material, and memorising from flash cards are correlated with good exam scores, but this does not imply that memorising flash cards is correlated with deep understanding of the course material.

However, there are at least two situations in which some partial version of transitivity of correlation can be recovered. The first is in the “99%” regime in which the correlations are very close to {1}: if {u,v,w} are unit vectors such that {u} is very highly correlated with {v}, and {v} is very highly correlated with {w}, then this does imply that {u} is very highly correlated with {w}. Indeed, from the identity

\displaystyle  \| u-v \| = 2^{1/2} (1 - \langle u,v\rangle)^{1/2}

(and similarly for {v-w} and {u-w}) and the triangle inequality

\displaystyle  \|u-w\| \leq \|u-v\| + \|v-w\|,

we see that

\displaystyle  (1 - \langle u,w \rangle)^{1/2} \leq (1 - \langle u,v\rangle)^{1/2} + (1 - \langle v,w\rangle)^{1/2}. \ \ \ \ \ (1)

Thus, for instance, if {\langle u, v \rangle \geq 1-\varepsilon} and {\langle v,w \rangle \geq 1-\varepsilon}, then {\langle u,w \rangle \geq 1-4\varepsilon}. This is of course closely related to (though slightly weaker than) the triangle inequality for angles:

\displaystyle  \angle(u,w) \leq \angle(u,v) + \angle(v,w).

Remark 1 (Thanks to Andrew Granville for conversations leading to this observation.) The inequality (1) also holds for sub-unit vectors, i.e. vectors {u,v,w} with {\|u\|, \|v\|, \|w\| \leq 1}. This comes by extending {u,v,w} in directions orthogonal to all three original vectors and to each other in order to make them unit vectors, enlarging the ambient Hilbert space {H} if necessary. More concretely, one can apply (1) to the unit vectors

\displaystyle  (u, \sqrt{1-\|u\|^2}, 0, 0), (v, 0, \sqrt{1-\|v\|^2}, 0), (w, 0, 0, \sqrt{1-\|w\|^2})

in {H \times {\bf R}^3}.

But even in the “{1\%}” regime in which correlations are very weak, there is still a version of transitivity of correlation, known as the van der Corput lemma, which basically asserts that if a unit vector {v} is correlated with many unit vectors {u_1,\dots,u_n}, then many of the pairs {u_i,u_j} will then be correlated with each other. Indeed, from the Cauchy-Schwarz inequality

\displaystyle  |\langle v, \sum_{i=1}^n u_i \rangle|^2 \leq \|v\|^2 \| \sum_{i=1}^n u_i \|^2

we see that

\displaystyle  (\sum_{i=1}^n \langle v, u_i \rangle)^2 \leq \sum_{1 \leq i,j \leq n} \langle u_i, u_j \rangle. \ \ \ \ \ (2)

Thus, for instance, if {\langle v, u_i \rangle \geq \varepsilon} for at least {\varepsilon n} values of {i=1,\dots,n}, then (after removing those indices {i} for which {\langle v, u_i \rangle < \varepsilon}) {\sum_{i,j} \langle u_i, u_j \rangle} must be at least {\varepsilon^4 n^2}, which implies that {\langle u_i, u_j \rangle \geq \varepsilon^4/2} for at least {\varepsilon^4 n^2/2} pairs {(i,j)}. Or as another example: if a random variable {X} exhibits at least {1\%} positive correlation with {n} other random variables {Y_1,\dots,Y_n}, then if {n > 10,000}, at least two distinct {Y_i,Y_j} must have positive correlation with each other (although this argument does not tell you which pair {Y_i,Y_j} are so correlated). Thus one can view this inequality as a sort of `pigeonhole principle” for correlation.

A similar argument (multiplying each {u_i} by an appropriate sign {\pm 1}) shows the related van der Corput inequality

\displaystyle  (\sum_{i=1}^n |\langle v, u_i \rangle|)^2 \leq \sum_{1 \leq i,j \leq n} |\langle u_i, u_j \rangle|, \ \ \ \ \ (3)

and this inequality is also true for complex inner product spaces. (Also, the {u_i} do not need to be unit vectors for this inequality to hold.)

Geometrically, the picture is this: if {v} positively correlates with all of the {u_1,\dots,u_n}, then the {u_1,\dots,u_n} are all squashed into a somewhat narrow cone centred at {v}. The cone is still wide enough to allow a few pairs {u_i, u_j} to be orthogonal (or even negatively correlated) with each other, but (when {n} is large enough) it is not wide enough to allow all of the {u_i,u_j} to be so widely separated. Remarkably, the bound here does not depend on the dimension of the ambient inner product space; while increasing the number of dimensions should in principle add more “room” to the cone, this effect is counteracted by the fact that in high dimensions, almost all pairs of vectors are close to orthogonal, and the exceptional pairs that are even weakly correlated to each other become exponentially rare. (See this previous blog post for some related discussion; in particular, Lemma 2 from that post is closely related to the van der Corput inequality presented here.)

A particularly common special case of the van der Corput inequality arises when {v} is a unit vector fixed by some unitary operator {T}, and the {u_i} are shifts {u_i = T^i u} of a single unit vector {u}. In this case, the inner products {\langle v, u_i \rangle} are all equal, and we arrive at the useful van der Corput inequality

\displaystyle  |\langle v, u \rangle|^2 \leq \frac{1}{n^2} \sum_{1 \leq i,j \leq n} |\langle T^i u, T^j u \rangle|. \ \ \ \ \ (4)

(In fact, one can even remove the absolute values from the right-hand side, by using (2) instead of (4).) Thus, to show that {v} has negligible correlation with {u}, it suffices to show that the shifts of {u} have negligible correlation with each other.

Here is a basic application of the van der Corput inequality:

Proposition 2 (Weyl equidistribution estimate) Let {P: {\bf Z} \rightarrow {\bf R}/{\bf Z}} be a polynomial with at least one non-constant coefficient irrational. Then one has

\displaystyle  \lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N e( P(n) ) = 0,

where {e(x) := e^{2\pi i x}}.

Note that this assertion implies the more general assertion

\displaystyle  \lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N e( kP(n) ) = 0

for any non-zero integer {k} (simply by replacing {P} by {kP}), which by the Weyl equidistribution criterion is equivalent to the sequence {P(1), P(2),\dots} being asymptotically equidistributed in {{\bf R}/{\bf Z}}.

Proof: We induct on the degree {d} of the polynomial {P}, which must be at least one. If {d} is equal to one, the claim is easily established from the geometric series formula, so suppose that {d>1} and that the claim has already been proven for {d-1}. If the top coefficient {a_d} of {P(n) = a_d n^d + \dots + a_0} is rational, say {a_d = \frac{p}{q}}, then by partitioning the natural numbers into residue classes modulo {q}, we see that the claim follows from the induction hypothesis; so we may assume that the top coefficient {a_d} is irrational.

In order to use the van der Corput inequality as stated above (i.e. in the formalism of inner product spaces) we will need a non-principal ultrafilter {p} (see e.g this previous blog post for basic theory of ultrafilters); we leave it as an exercise to the reader to figure out how to present the argument below without the use of ultrafilters (or similar devices, such as Banach limits). The ultrafilter {p} defines an inner product {\langle, \rangle_p} on bounded complex sequences {z = (z_1,z_2,z_3,\dots)} by setting

\displaystyle  \langle z, w \rangle_p := \hbox{st} \lim_{N \rightarrow p} \frac{1}{N} \sum_{n=1}^N z_n \overline{w_n}.

Strictly speaking, this inner product is only positive semi-definite rather than positive definite, but one can quotient out by the null vectors to obtain a positive-definite inner product. To establish the claim, it will suffice to show that

\displaystyle  \langle 1, e(P) \rangle_p = 0

for every non-principal ultrafilter {p}.

Note that the space of bounded sequences (modulo null vectors) admits a shift {T}, defined by

\displaystyle  T (z_1,z_2,\dots) := (z_2,z_3,\dots).

This shift becomes unitary once we quotient out by null vectors, and the constant sequence {1} is clearly a unit vector that is invariant with respect to the shift. So by the van der Corput inequality, we have

\displaystyle  |\langle 1, e(P) \rangle_p| \leq \frac{1}{n^2} \sum_{1 \leq i,j \leq n} |\langle T^i e(P), T^j e(P) \rangle_p|

for any {n \geq 1}. But we may rewrite {\langle T^i e(P), T^j e(P) \rangle = \langle 1, e(T^j P - T^i P) \rangle_p}. Then observe that if {i \neq j}, {T^j P - T^i P} is a polynomial of degree {d-1} whose {d-1} coefficient is irrational, so by induction hypothesis we have {\langle T^i e(P), T^j e(P) \rangle_p = 0} for {i \neq j}. For {i=j} we of course have {\langle T^i e(P), T^j e(P) \rangle_p = 1}, and so

\displaystyle  |\langle 1, e(P) \rangle_p| \leq \frac{1}{n^2} \times n

for any {n}. Letting {n \rightarrow \infty}, we obtain the claim. \Box


RSS Google+ feed

  • An error has occurred; the feed is probably down. Try again later.