
Van Vu and I have just uploaded to the arXiv our paper A central limit theorem for the determinant of a Wigner matrix, submitted to Adv. Math.. It studies the asymptotic distribution of the determinant {\det M_n} of a random Wigner matrix (such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)).

Before we get to these results, let us first discuss the simpler problem of studying the determinant {\det A_n} of a random iid matrix {A_n = (\zeta_{ij})_{1 \leq i,j \leq n}}, such as a real gaussian matrix (where all entries are independently and identically distributed using the standard real normal distribution {\zeta_{ij} \equiv N(0,1)_{\bf R}}), a complex gaussian matrix (where all entries are independently and identically distributed using the standard complex normal distribution {\zeta_{ij} \equiv N(0,1)_{\bf C}}, thus the real and imaginary parts are independent with law {N(0,1/2)_{\bf R}}), or the random sign matrix (in which all entries are independently and identically distributed according to the Bernoulli distribution {\zeta_{ij} \equiv \pm 1}, with a {1/2} chance of either sign). More generally, one can consider a matrix {A_n} in which all the entries {\zeta_{ij}} are independently and identically distributed with mean zero and variance {1}.

We can expand {\det A_n} using the Leibniz expansion

\displaystyle  \det A_n = \sum_{\sigma \in S_n} I_\sigma, \ \ \ \ \ (1)

where {\sigma: \{1,\ldots,n\} \rightarrow \{1,\ldots,n\}} ranges over the permutations of {\{1,\ldots,n\}}, and {I_\sigma} is the product

\displaystyle  I_\sigma := \hbox{sgn}(\sigma) \prod_{i=1}^n \zeta_{i\sigma(i)}.

From the iid nature of the {\zeta_{ij}}, we easily see that each {I_\sigma} has mean zero and variance one, and that the {I_\sigma} are pairwise uncorrelated as {\sigma} varies. We conclude that {\det A_n} has mean zero and variance {n!} (an observation first made by Turán). In particular, from Chebyshev’s inequality we see that {\det A_n} is typically of size {O(\sqrt{n!})}.
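(This mean and variance computation is easy to test numerically. The following short Python sketch, which is purely illustrative and uses arbitrary parameter choices, estimates both moments for the random sign matrix with a small value of {n}.)

```python
# Monte Carlo check of Turan's observation for random sign matrices:
# E det A_n should be 0, and E (det A_n)^2 should be n!.
import numpy as np
from math import factorial

rng = np.random.default_rng(5)
n, trials = 6, 100_000
dets = np.array([np.linalg.det(rng.choice([-1.0, 1.0], size=(n, n)))
                 for _ in range(trials)])
print(dets.mean())       # close to 0
print((dets**2).mean())  # close to factorial(n) = 720, up to sampling error
```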

It turns out, though, that this is not quite best possible. This is easiest to explain in the real gaussian case, by performing a computation first made by Goodman. In this case, the distribution of {\det A_n} is clearly symmetric around the origin, so we can focus attention on the magnitude {|\det A_n|}. We can interpret this quantity geometrically as the volume of an {n}-dimensional parallelepiped whose generating vectors {X_1,\ldots,X_n} are independent real gaussian vectors in {{\bf R}^n} (i.e. their coefficients are iid with law {N(0,1)_{\bf R}}). Using the classical base-times-height formula, we thus have

\displaystyle  |\det A_n| = \prod_{i=1}^n \hbox{dist}(X_i, V_i) \ \ \ \ \ (2)

where {V_i} is the {i-1}-dimensional linear subspace of {{\bf R}^n} spanned by {X_1,\ldots,X_{i-1}} (note that {X_1,\ldots,X_n}, having an absolutely continuous joint distribution, are almost surely linearly independent). Taking logarithms, we conclude

\displaystyle  \log |\det A_n| = \sum_{i=1}^n \log \hbox{dist}(X_i, V_i).

Now, we take advantage of a fundamental symmetry property of the Gaussian vector distribution, namely its invariance with respect to the orthogonal group {O(n)}. Because of this, we see that if we fix {X_1,\ldots,X_{i-1}} (and thus {V_i}), the random variable {\hbox{dist}(X_i,V_i)} has the same distribution as {\hbox{dist}(X_i,{\bf R}^{i-1})}, or equivalently the {\chi} distribution

\displaystyle  \chi_{n-i+1} := (\sum_{j=1}^{n-i+1} \xi_{n-i+1,j}^2)^{1/2}

where {\xi_{n-i+1,1},\ldots,\xi_{n-i+1,n-i+1}} are iid copies of {N(0,1)_{\bf R}}. As this distribution does not depend on the {X_1,\ldots,X_{i-1}}, we conclude that the law of {\log |\det A_n|} is given by the sum of {n} independent {\chi}-variables:

\displaystyle  \log |\det A_n| \equiv \sum_{j=1}^{n} \log \chi_j.

A standard computation shows that each {\chi_j^2} has mean {j} and variance {2j}, and then a Taylor series (or Ito calculus) computation (using concentration of measure tools to control tails) shows that {\log \chi_j} has mean {\frac{1}{2} \log j - \frac{1}{2j} + O(1/j^{3/2})} and variance {\frac{1}{2j}+O(1/j^{3/2})}. As such, {\log |\det A_n|} has mean {\frac{1}{2} \log n! - \frac{1}{2} \log n + O(1)} and variance {\frac{1}{2} \log n + O(1)}. Applying a suitable version of the central limit theorem, one obtains the asymptotic law

\displaystyle  \frac{\log |\det A_n| - \frac{1}{2} \log n! + \frac{1}{2} \log n}{\sqrt{\frac{1}{2}\log n}} \rightarrow N(0,1)_{\bf R}, \ \ \ \ \ (3)

where {\rightarrow} denotes convergence in distribution. A bit more informally, we have

\displaystyle  |\det A_n| \approx n^{-1/2} \sqrt{n!} \exp( N( 0, \log n / 2 )_{\bf R} ) \ \ \ \ \ (4)

when {A_n} is a real gaussian matrix; thus, for instance, the median value of {|\det A_n|} is {n^{-1/2+o(1)} \sqrt{n!}}. At first glance, this appears to conflict with the second moment bound {\mathop{\bf E} |\det A_n|^2 = n!} of Turán mentioned earlier, but once one recalls that {\exp(N(0,t)_{\bf R})} has a second moment of {\exp(2t)}, we see that the two facts are in fact perfectly consistent; the upper tail of the normal distribution in the exponent in (4) ends up dominating the second moment.
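(The law (3) can also be explored numerically. The sketch below, which is illustrative only, samples {\log|\det A_n|} directly from the chi-factor decomposition derived above; this costs {O(n)} per sample rather than the {O(n^3)} cost of forming a matrix and computing its determinant. Note that since the variance {\frac{1}{2}\log n} grows so slowly in {n}, the convergence to the normal law is quite slow.)

```python
# Sampling log|det A_n| via the identity log|det A_n| = sum_j log(chi_j),
# where chi_j^2 is chi-squared with j degrees of freedom, then applying
# the normalisation in (3).
import numpy as np
from math import lgamma, log

rng = np.random.default_rng(0)

def sample_log_abs_det(n):
    chi_sq = rng.chisquare(df=np.arange(1, n + 1))  # chi_1^2, ..., chi_n^2
    return 0.5 * np.sum(np.log(chi_sq))

n, trials = 400, 2000
samples = np.array([sample_log_abs_det(n) for _ in range(trials)])
# normalise as in (3): lgamma(n+1) = log n!
z = (samples - 0.5 * lgamma(n + 1) + 0.5 * log(n)) / np.sqrt(0.5 * log(n))
print(z.mean(), z.var())  # approaches 0 and 1, with O(1/sqrt(log n)) error
```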

It turns out that the central limit theorem (3) is valid for any real iid matrix with mean zero, variance one, and an exponential decay condition on the entries; this was first claimed by Girko, though the arguments in that paper appear to be incomplete. Another proof of this result, with more quantitative bounds on the convergence rate, has recently been obtained by Hoi Nguyen and Van Vu. The basic idea in these arguments is to express the logarithm of the product in (2) in terms of a martingale and apply the martingale central limit theorem.

If one works with complex gaussian random matrices instead of real gaussian random matrices, the above computations change slightly (one has to replace the real {\chi} distribution with the complex {\chi} distribution, in which the {\xi_{i,j}} are distributed according to the complex gaussian {N(0,1)_{\bf C}} instead of the real one). At the end of the day, one ends up with the law

\displaystyle  \frac{\log |\det A_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\frac{1}{4}\log n}} \rightarrow N(0,1)_{\bf R}, \ \ \ \ \ (5)

or more informally

\displaystyle  |\det A_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n / 4 )_{\bf R} ) \ \ \ \ \ (6)

(but note that this new asymptotic is still consistent with Turán’s second moment calculation).

We can now turn to the results of our paper. Here, we replace the iid matrices {A_n} by Wigner matrices {M_n = (\zeta_{ij})_{1 \leq i,j \leq n}}, which are defined similarly but are constrained to be Hermitian (or real symmetric), thus {\zeta_{ij} = \overline{\zeta_{ji}}} for all {i,j}. Model examples here include the Gaussian Unitary Ensemble (GUE), in which {\zeta_{ij} \equiv N(0,1)_{\bf C}} for {1 \leq i < j \leq n} and {\zeta_{ij} \equiv N(0,1)_{\bf R}} for {1 \leq i=j \leq n}, the Gaussian Orthogonal Ensemble (GOE), in which {\zeta_{ij} \equiv N(0,1)_{\bf R}} for {1 \leq i < j \leq n} and {\zeta_{ij} \equiv N(0,2)_{\bf R}} for {1 \leq i=j \leq n}, and the symmetric Bernoulli ensemble, in which {\zeta_{ij} \equiv \pm 1} for {1 \leq i \leq j \leq n} (with probability {1/2} of either sign). In all cases, the upper triangular entries of the matrix are assumed to be jointly independent. For a more precise definition of the Wigner matrix ensembles we are considering, see the introduction to our paper.

The determinants {\det M_n} of these matrices still have a Leibniz expansion. However, in the Wigner case, the mean and variance of the {I_\sigma} are slightly different, and what is worse, they are not all pairwise uncorrelated any more. For instance, the mean of {I_\sigma} is still usually zero, but equals {(-1)^{n/2}} in the exceptional case when {\sigma} is a perfect matching (i.e. the union of exactly {n/2} {2}-cycles, a possibility that can of course only happen when {n} is even). As such, the mean {\mathop{\bf E} \det M_n} still vanishes when {n} is odd, but for even {n} it is equal to

\displaystyle  (-1)^{n/2} \frac{n!}{(n/2)!2^{n/2}}

(the fraction here simply being the number of perfect matchings on {n} vertices). Using Stirling’s formula, one then computes that {|\mathop{\bf E} \det M_n|} is comparable to {n^{-1/4} \sqrt{n!}} when {n} is large and even. The second moment calculation is more complicated (and uses facts about the distribution of cycles in random permutations, mentioned in this previous post), but one can compute that {\mathop{\bf E} |\det M_n|^2} is comparable to {n^{1/2} n!} for GUE and {n^{3/2} n!} for GOE. (The discrepancy here comes from the fact that in the GOE case, {I_\sigma} and {I_\rho} can correlate when {\rho} contains reversals of {k}-cycles of {\sigma} for {k \geq 3}, but this does not happen in the GUE case.) For GUE, much more precise asymptotics for the moments of the determinant are known, starting from the work of Brezin and Hikami, though we do not need these more sophisticated computations here.
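(The comparability of {|\mathop{\bf E} \det M_n|} with {n^{-1/4} \sqrt{n!}} can be verified by hand with Stirling's formula, or numerically; the following few lines, illustrative only, compare logarithms for even {n}.)

```python
# log of |E det M_n| = n!/((n/2)! 2^{n/2}), minus log of n^{-1/4} sqrt(n!);
# the difference should tend to a constant, confirming comparability.
from math import lgamma, log

for n in [10, 100, 1000, 10000]:          # even n only
    log_mean = lgamma(n + 1) - lgamma(n // 2 + 1) - (n / 2) * log(2)
    log_target = -0.25 * log(n) + 0.5 * lgamma(n + 1)
    print(n, log_mean - log_target)       # tends to log(2)/2 - log(2*pi)/4 = -0.113...
```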

Our main results are then as follows.

Theorem 1 Let {M_n} be a Wigner matrix.

  • If {M_n} is drawn from GUE, then

    \displaystyle  \frac{\log |\det M_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\frac{1}{2}\log n}} \rightarrow N(0,1)_{\bf R}.

  • If {M_n} is drawn from GOE, then

    \displaystyle  \frac{\log |\det M_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\log n}} \rightarrow N(0,1)_{\bf R}.

  • The previous two results also hold for more general Wigner matrices, assuming that the real and imaginary parts are independent, a finite moment condition is satisfied, and the entries match moments with those of GOE or GUE to fourth order. (See the paper for a more precise formulation of the result.)

Thus, we informally have

\displaystyle  |\det M_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n / 2 )_{\bf R} )

when {M_n} is drawn from GUE, or from another Wigner ensemble matching GUE to fourth order (and obeying some additional minor technical hypotheses); and

\displaystyle  |\det M_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n )_{\bf R} )

when {M_n} is drawn from GOE, or from another Wigner ensemble matching GOE to fourth order. Again, these asymptotic limiting distributions are consistent with the asymptotic behaviour for the second moments.

The extension from the GUE or GOE case to more general Wigner ensembles is a fairly routine application of the four moment theorem for Wigner matrices, although for various technical reasons we do not quite use the existing four moment theorems in the literature, but adapt them to the log determinant. The main idea is to express the log-determinant as an integral

\displaystyle  \log|\det M_n| = \frac{1}{2} n \log n - n \hbox{Im} \int_0^\infty s(\sqrt{-1}\eta)\ d\eta \ \ \ \ \ (7)

of the Stieltjes transform

\displaystyle  s(z) := \frac{1}{n} \hbox{tr}( \frac{1}{\sqrt{n}} M_n - z )^{-1}

of {M_n}. Strictly speaking, the integral in (7) is divergent at infinity (and also can be ill-behaved near zero), but this can be addressed by standard truncation and renormalisation arguments (combined with known facts about the least singular value of Wigner matrices), which we omit here. We then use a variant of the four moment theorem for the Stieltjes transform, as used by Erdos, Yau, and Yin (based on a previous four moment theorem for individual eigenvalues introduced by Van Vu and myself). The four moment theorem is proven by the now-standard Lindeberg exchange method, combined with the usual resolvent identities to control the behaviour of the resolvent (and hence the Stieltjes transform) with respect to modifying one or two entries, together with the delocalisation of eigenvector property (which in turn arises from local semicircle laws) to control the error terms.
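(To make (7) a little more concrete, here is a numerical sketch, mine rather than anything from the paper, of the truncated form of this identity for a single sampled GUE matrix: truncating the {\eta} integral at a large height {T} and adding back the divergent part {n \log T} recovers the log-determinant up to an {O(n/T^2)} error.)

```python
# Truncated form of (7): since Im s(i eta) ~ 1/eta as eta -> infinity,
#   log|det M_n| = (1/2) n log n + n log T - n * int_0^T Im s(i eta) d eta
# up to an O(n/T^2) truncation error.
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(1)
n = 50
A = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
M = (A + A.conj().T) / np.sqrt(2)        # a GUE sample, normalised as above
x = np.linalg.eigvalsh(M) / np.sqrt(n)   # eigenvalues of M_n / sqrt(n)

def im_s(eta):                           # Im s(i eta) = (1/n) sum_i eta/(x_i^2 + eta^2)
    return np.mean(eta / (x**2 + eta**2))

T = 1e4
integral = quad(im_s, 0, 10)[0] + quad(im_s, 10, T)[0]
lhs = 0.5 * n * np.log(n) + np.sum(np.log(np.abs(x)))   # log|det M_n| directly
rhs = 0.5 * n * np.log(n) + n * np.log(T) - n * integral
print(lhs, rhs)                          # agree up to O(n / T^2)
```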

Somewhat surprisingly (to us, at least), it turned out that it was the first part of the theorem (namely, the verification of the limiting law for the invariant ensembles GUE and GOE) that was more difficult than the extension to the Wigner case. Even in an ensemble as highly symmetric as GUE, the rows are no longer independent, and the formula (2) is basically useless for getting any non-trivial control on the log determinant. There is an explicit formula for the joint distribution of the eigenvalues of GUE (or GOE), which does eventually give the distribution of the cumulants of the log determinant, which then gives the required central limit theorem; but this is a lengthy computation, first performed by Delannay and Le Caer.

Following a suggestion of my colleague, Rowan Killip, we give an alternate proof of this central limit theorem in the GUE and GOE cases, by using a beautiful observation of Trotter, namely that the GUE or GOE ensemble can be conjugated into a tractable tridiagonal form. Let me state it just for GUE:

Proposition 2 (Tridiagonal form of GUE; Trotter) Let {M'_n} be the random tridiagonal real symmetric matrix

\displaystyle  M'_n = \begin{pmatrix} a_1 & b_1 & 0 & \ldots & 0 & 0 \\ b_1 & a_2 & b_2 & \ldots & 0 & 0 \\ 0 & b_2 & a_3 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \ldots & a_{n-1} & b_{n-1} \\ 0 & 0 & 0 & \ldots & b_{n-1} & a_n \end{pmatrix}

where the {a_1,\ldots,a_n, b_1,\ldots,b_{n-1}} are jointly independent real random variables, with {a_1,\ldots,a_n \equiv N(0,1)_{\bf R}} being standard real Gaussians, and each {b_i} having a {\chi}-distribution:

\displaystyle  b_i = (\sum_{j=1}^i |z_{i,j}|^2)^{1/2}

where {z_{i,j} \equiv N(0,1)_{\bf C}} are iid complex gaussians. Let {M_n} be drawn from GUE. Then the joint eigenvalue distribution of {M_n} is identical to the joint eigenvalue distribution of {M'_n}.

Proof: Let {M_n} be drawn from GUE. We can write

\displaystyle  M_n = \begin{pmatrix} M_{n-1} & X_n \\ X_n^* & a_n \end{pmatrix}

where {M_{n-1}} is drawn from the {(n-1) \times (n-1)} GUE, {a_n \equiv N(0,1)_{\bf R}}, and {X_n \in {\bf C}^{n-1}} is a random gaussian vector with all entries iid with distribution {N(0,1)_{\bf C}}. Furthermore, {M_{n-1}, X_n, a_n} are jointly independent.

We now apply the tridiagonal matrix algorithm. Let {b_{n-1} := |X_n|}; then {b_{n-1}} has the {\chi}-distribution indicated in the proposition. We then conjugate {M_n} by a unitary matrix {U} that preserves the final basis vector {e_n}, and maps {X_n} to {b_{n-1} e_{n-1}}. Then we have

\displaystyle  U M_n U^* = \begin{pmatrix} \tilde M_{n-1} & b_{n-1} e_{n-1} \\ b_{n-1} e_{n-1}^* & a_n \end{pmatrix}

where {\tilde M_{n-1}} is conjugate to {M_{n-1}}. Now we make the crucial observation: because {M_{n-1}} is distributed according to GUE (which is a unitarily invariant ensemble), and {U} is a unitary matrix independent of {M_{n-1}}, {\tilde M_{n-1}} is also distributed according to GUE, and remains independent of both {b_{n-1}} and {a_n}.

We continue this process, expanding {U M_n U^*} as

\displaystyle \begin{pmatrix} M_{n-2} & X_{n-1} & 0 \\ X_{n-1}^* & a_{n-1} & b_{n-1} \\ 0 & b_{n-1} & a_n \end{pmatrix}.

Applying a further unitary conjugation that fixes {e_{n-1}, e_n} but maps {X_{n-1}} to {b_{n-2} e_{n-2}}, we may replace {X_{n-1}} by {b_{n-2} e_{n-2}} while transforming {M_{n-2}} to another GUE matrix {\tilde M_{n-2}} independent of {a_n, b_{n-1}, a_{n-1}, b_{n-2}}. Iterating this process, we eventually obtain a coupling of {M_n} to {M'_n} by unitary conjugations, and the claim follows. \Box

The determinant of a tridiagonal matrix is not quite as simple as the determinant of a triangular matrix (in which it is simply the product of the diagonal entries), but it is pretty close: the determinant {D_n} of the above matrix is given by solving the recursion

\displaystyle  D_i = a_i D_{i-1} - b_{i-1}^2 D_{i-2}

with {D_0=1} and {D_{-1} = 0}. Thus, instead of the product of a sequence of independent scalar {\chi} distributions as in the gaussian matrix case, the determinant of GUE ends up being controlled by the product of a sequence of independent {2\times 2} matrices whose entries are given by gaussians and {\chi} distributions. In this case, one cannot immediately take logarithms and hope to get something for which the martingale central limit theorem can be applied, but some ad hoc manipulation of these {2 \times 2} matrix products eventually does make this strategy work. (Roughly speaking, one has to work with the logarithm of the Frobenius norm of the matrix first.)

I’ve spent the last week or so reworking the first draft of my universality article for Mathematics Awareness Month, in view of the useful comments and feedback received on that draft here on this blog, as well as elsewhere.  In fact, I ended up rewriting the article from scratch, and expanding it substantially, in order to focus on a more engaging and less technical narrative.  I found that I had to use a substantially different mindset than the one I am used to having for technical expository writing; indeed, the exercise reminded me more of my high school English assignments than of my professional work.  (This is perhaps a bad sign: English was not exactly my strongest subject as a student.)

The piece now has a title: “E pluribus unum: from complexity, universality”.  This is a somewhat US-centric piece of wordplay, but Mathematics Awareness Month is, after all, a US-based initiative, even though awareness of mathematics certainly transcends national boundaries.   Still, it is a trivial matter to modify the title later if a better proposal arises, and I am sure that if I send this text to be published, the editors will have some suggestions in this regard.

By coincidence, I moved up and expanded the other US-centric item – the discussion of the 2008 US presidential elections – to the front of the paper to play the role of the hook.  I’ll try to keep the Commonwealth spelling conventions, though. :-)

I decided to cut out the discussion of the N-body problem for various values of N, in part due to the confusion over the notion of a “solution”; there is a nice mathematical story there, but perhaps one that gets in the way of the main story of universality.

I have added a fair number of relevant images, though some of them will have to be changed in the final version for copyright reasons.  The narrow column format of this blog means that the image placement is not optimal, but I am sure that this can be rectified if this article is published professionally.


The month of April has been designated as Mathematics Awareness Month by the major American mathematics organisations (the AMS, ASA, MAA, and SIAM).  I was approached to write a popular mathematics article for April 2011 (the theme for that month is “Mathematics and Complexity”).  While I have written a fair number of expository articles (including several on this blog) aimed at a mathematical audience, I actually have not had much experience writing articles at the popular mathematics level, and so I found this task to be remarkably difficult.  At this level of exposition, one not only needs to explain the facts, but also to tell a story; I have experience in the former but not in the latter.

I decided to write on the topic of universality – the phenomenon that the macroscopic behaviour of a dynamical system can be largely independent of the precise microscopic structure.   Below the fold is a first draft of the article; I would definitely welcome feedback and corrections.  It does not yet have any pictures, but I plan to rectify that in the final draft.  It also does not have a title, but this will be easy to address later.   But perhaps the biggest thing lacking right now is a narrative “hook”; I don’t yet have any good ideas as to how to make the story of universality compelling to a lay audience.  Any suggestions in this regard would be particularly appreciated.

I have not yet decided where I would try to publish this article; in fact, I might just publish it here on this blog (and eventually, in one of the blog book compilations).


Consider the sum {S_n := X_1+\ldots+X_n} of iid real random variables {X_1,\ldots,X_n \equiv X} of finite mean {\mu} and variance {\sigma^2} for some {\sigma > 0}. Then the sum {S_n} has mean {n\mu} and variance {n\sigma^2}, and so (by Chebyshev’s inequality) we expect {S_n} to usually have size {n\mu + O(\sqrt{n} \sigma)}. To put it another way, if we consider the normalised sum

\displaystyle  Z_n := \frac{S_n - n \mu}{\sqrt{n} \sigma} \ \ \ \ \ (1)

then {Z_n} has been normalised to have mean zero and variance {1}, and is thus usually of size {O(1)}.

In the previous set of notes, we were able to establish various tail bounds on {Z_n}. For instance, from Chebyshev’s inequality one has

\displaystyle  {\bf P}(|Z_n| > \lambda) \leq \lambda^{-2}, \ \ \ \ \ (2)

and if the original distribution {X} was bounded or subgaussian, we had the much stronger Chernoff bound

\displaystyle  {\bf P}(|Z_n| > \lambda) \leq C \exp( - c \lambda^2 ) \ \ \ \ \ (3)

for some absolute constants {C, c > 0}; in other words, the {Z_n} are uniformly subgaussian.

Now we look at the distribution of {Z_n}. The fundamental central limit theorem tells us the asymptotic behaviour of this distribution:

Theorem 1 (Central limit theorem) Let {X_1,\ldots,X_n \equiv X} be iid real random variables of finite mean {\mu} and variance {\sigma^2} for some {\sigma > 0}, and let {Z_n} be the normalised sum (1). Then as {n \rightarrow \infty}, {Z_n} converges in distribution to the standard normal distribution {N(0,1)_{\bf R}}.
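(For a concrete illustration of this convergence, one can compute the Kolmogorov distance between the law of {Z_n} and {N(0,1)_{\bf R}} exactly in the Bernoulli case, cf. Exercise 2 below. The short sketch that follows, illustrative only, does this, and the {O(1/\sqrt{n})} decay predicted by the Berry-Esséen theorem is clearly visible.)

```python
# Exact Kolmogorov distance between the law of Z_n (Bernoulli(1/2)
# summands, so mu = 1/2 and sigma^2 = 1/4) and the standard normal.
import numpy as np
from scipy.stats import binom, norm

for n in [4, 16, 64, 256, 1024]:
    k = np.arange(n + 1)
    z = (k - n / 2) / np.sqrt(n / 4)   # atoms of Z_n
    f = binom.cdf(k, n, 0.5)           # CDF of Z_n at the atoms
    # sup_x |F_{Z_n}(x) - Phi(x)| is attained at an atom or just before one:
    gap = np.maximum(np.abs(f - norm.cdf(z)),
                     np.abs(f - binom.pmf(k, n, 0.5) - norm.cdf(z)))
    print(n, gap.max())                # decays like O(1/sqrt(n))
```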

Exercise 1 Show that {Z_n} does not converge in probability or in the almost sure sense (in the latter case, we think of {X_1,X_2,\ldots} as an infinite sequence of iid random variables). (Hint: the intuition here is that for two very different values {n_1 \ll n_2} of {n}, the quantities {Z_{n_1}} and {Z_{n_2}} are almost independent of each other, since the bulk of the sum {S_{n_2}} is determined by those {X_i} with {i > n_1}. Now make this intuition precise.)

Exercise 2 Use Stirling’s formula from Notes 0a to verify the central limit theorem in the case when {X} is a Bernoulli distribution, taking the values {0} and {1} only. (This is a variant of Exercise 2 from those notes, or Exercise 2 from Notes 1. It is easy to see that once one does this, one can rescale and handle any other two-valued distribution also.)

Exercise 3 Use Exercise 9 from Notes 1 to verify the central limit theorem in the case when {X} is gaussian.

Note that we are only discussing the case of real iid random variables. The case of complex random variables (or more generally, vector-valued random variables) is a little bit more complicated, and will be discussed later in this post.

The central limit theorem (and its variants, which we discuss below) is an extremely useful tool in random matrix theory, in particular through the control it gives on random walks (which arise naturally from linear functionals of random matrices). But the central limit theorem can also be viewed as a “commutative” analogue of various spectral results in random matrix theory (in particular, we shall see in later lectures that the Wigner semicircle law can be viewed in some sense as a “noncommutative” or “free” version of the central limit theorem). Because of this, the techniques used to prove the central limit theorem can often be adapted to be useful in random matrix theory. We shall therefore use these notes to dwell on several different proofs of the central limit theorem, as this provides a convenient way to showcase some of the basic methods that we will encounter again (in a more sophisticated form) when dealing with random matrices.


The Riemann zeta function {\zeta(s)}, defined for {\hbox{Re}(s)>1} by

\displaystyle  \zeta(s) := \sum_{n=1}^\infty \frac{1}{n^s} \ \ \ \ \ (1)

and then continued meromorphically to other values of {s} by analytic continuation, is a fundamentally important function in analytic number theory, as it is connected to the primes {p=2,3,5,\ldots} via the Euler product formula

\displaystyle  \zeta(s) = \prod_p (1 - \frac{1}{p^s})^{-1} \ \ \ \ \ (2)

(for {\hbox{Re}(s) > 1}, at least), where {p} ranges over primes. (The equivalence between (1) and (2) is essentially the generating function version of the fundamental theorem of arithmetic.) The function {\zeta} has a pole at {1} and a number of zeroes {\rho}. A formal application of the factor theorem gives

\displaystyle  \zeta(s) = \frac{1}{s-1} \prod_\rho (s-\rho) \times \ldots \ \ \ \ \ (3)

where {\rho} ranges over zeroes of {\zeta}, and we will be vague about what the {\ldots} factor is, how to make sense of the infinite product, and exactly which zeroes of {\zeta} are involved in the product. Equating (2) and (3) and taking logarithms gives the formal identity

\displaystyle  - \log \zeta(s) = \sum_p \log(1 - \frac{1}{p^s}) = \log(s-1) - \sum_\rho \log(s-\rho) + \ldots; \ \ \ \ \ (4)

using the Taylor expansion

\displaystyle  \log(1 - \frac{1}{p^s}) = - \frac{1}{p^s} - \frac{1}{2 p^{2s}} - \frac{1}{3p^{3s}} - \ldots \ \ \ \ \ (5)

and differentiating the above identity in {s} yields the formal identity

\displaystyle  - \frac{\zeta'(s)}{\zeta(s)} = \sum_n \frac{\Lambda(n)}{n^s} = \frac{1}{s-1} - \sum_\rho \frac{1}{s-\rho} + \ldots \ \ \ \ \ (6)

where {\Lambda(n)} is the von Mangoldt function, defined to be {\log p} when {n} is a power of a prime {p}, and zero otherwise. Thus we see that the behaviour of the primes (as encoded by the von Mangoldt function) is intimately tied to the distribution of the zeroes {\rho}. For instance, if we knew that the zeroes were far away from the axis {\hbox{Re}(s)=1}, then we would heuristically have

\displaystyle  \sum_n \frac{\Lambda(n)}{n^{1+it}} \approx \frac{1}{it}

for real {t}. On the other hand, the integral test suggests that

\displaystyle  \sum_n \frac{1}{n^{1+it}} \approx \frac{1}{it}

and thus we see that {\frac{\Lambda(n)}{n}} and {\frac{1}{n}} have essentially the same (multiplicative) Fourier transform:

\displaystyle  \sum_n \frac{\Lambda(n)}{n^{1+it}} \approx \sum_n \frac{1}{n^{1+it}}.

Inverting the Fourier transform (or performing a contour integral closely related to the inverse Fourier transform), one is led to the prime number theorem

\displaystyle  \sum_{n \leq x} \Lambda(n) \approx \sum_{n \leq x} 1.

In fact, the standard proof of the prime number theorem basically proceeds by making all of the above formal arguments precise and rigorous.
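(The informal statement {\sum_{n \leq x} \Lambda(n) \approx \sum_{n \leq x} 1} is easy to test numerically; the following sketch, illustrative only, sums the von Mangoldt function using a simple sieve.)

```python
# Comparing psi(x) = sum_{n <= x} Lambda(n) with x; the prime number
# theorem asserts that psi(x)/x -> 1.
import numpy as np

def psi(x):
    sieve = np.ones(x + 1, dtype=bool)   # sieve of Eratosthenes
    sieve[:2] = False
    for p in range(2, int(x**0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = False
    total = 0.0
    for p in np.flatnonzero(sieve):      # Lambda(p^k) = log p for p^k <= x
        pk = int(p)
        while pk <= x:
            total += np.log(p)
            pk *= int(p)
    return total

for x in [10**3, 10**4, 10**5, 10**6]:
    print(x, psi(x) / x)                 # tends to 1
```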

Unfortunately, we don’t know as much about the zeroes {\rho} of the zeta function (and hence, about the {\zeta} function itself) as we would like. The Riemann hypothesis (RH) asserts that all the zeroes (except for the “trivial” zeroes at the negative even numbers) lie on the critical line {\hbox{Re}(s)=1/2}; this hypothesis would make the error terms in the above proof of the prime number theorem significantly more accurate. Furthermore, the stronger GUE hypothesis asserts in addition to RH that the local distribution of these zeroes on the critical line should behave like the local distribution of the eigenvalues of a random matrix drawn from the gaussian unitary ensemble (GUE). I will not give a precise formulation of this hypothesis here, except to say that the adjective “local” in the context of distribution of zeroes {\rho} means something like “at scale {O(1/\log T)} when {\hbox{Im}(s) = O(T)}“.

Nevertheless, we do know some reasonably non-trivial facts about the zeroes {\rho} and the zeta function {\zeta}, either unconditionally, or assuming RH (or GUE). Firstly, there are no zeroes for {\hbox{Re}(s)>1} (as one can already see from the convergence of the Euler product (2) in this case) or for {\hbox{Re}(s)=1} (this is trickier, relying on (6) and the elementary observation that

\displaystyle  \hbox{Re}( 3\frac{\Lambda(n)}{n^{\sigma}} + 4\frac{\Lambda(n)}{n^{\sigma+it}} + \frac{\Lambda(n)}{n^{\sigma+2it}} ) = 2\frac{\Lambda(n)}{n^\sigma} (1+\cos(t \log n))^2

is non-negative for {\sigma > 1} and {t \in {\mathbb R}}); from the functional equation

\displaystyle  \pi^{-s/2} \Gamma(s/2) \zeta(s) = \pi^{-(1-s)/2} \Gamma((1-s)/2) \zeta(1-s)

(which can be viewed as a consequence of the Poisson summation formula, see e.g. my blog post on this topic) we know that there are no zeroes for {\hbox{Re}(s) \leq 0} either (except for the trivial zeroes at negative even integers, corresponding to the poles of the Gamma function). Thus all the non-trivial zeroes lie in the critical strip {0 < \hbox{Re}(s) < 1}.

We also know that there are infinitely many non-trivial zeroes, and can approximately count how many zeroes there are in any large bounded region of the critical strip. For instance, for large {T}, the number of zeroes {\rho} in this strip with {\hbox{Im}(\rho) = T+O(1)} is {O(\log T)}. This can be seen by applying (6) to {s = 2+iT} (say); the trivial zeroes at the negative integers end up giving a contribution of {O(\log T)} to this sum (this is a heavily disguised variant of Stirling’s formula, as one can view the trivial zeroes as essentially being poles of the Gamma function), while the {\frac{1}{s-1}} and {\ldots} terms end up being negligible (of size {O(1)}), while each non-trivial zero {\rho} contributes a term which has a non-negative real part, and furthermore has size comparable to {1} if {\hbox{Im}(\rho) = T+O(1)}. (Here I am glossing over a technical renormalisation needed to make the infinite series in (6) converge properly.) Meanwhile, the left-hand side of (6) is absolutely convergent for {s=2+iT} and of size {O(1)}, and the claim follows. A more refined version of this argument shows that the number of non-trivial zeroes with {0 \leq \hbox{Im}(\rho) \leq T} is {\frac{T}{2\pi} \log \frac{T}{2\pi} - \frac{T}{2\pi} + O(\log T)}, but we will not need this more precise formula here. (A fair fraction – at least 40%, in fact – of these zeroes are known to lie on the critical line; see this earlier blog post of mine for more discussion.)
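(This asymptotic count is easy to check against actual zeroes; the sketch below, illustrative only, uses mpmath, whose zetazero(k) routine returns the k-th non-trivial zero on the critical line, so that {N(T) = k} at {T = \hbox{Im}(\rho_k)}.)

```python
# Comparing the actual zero count N(T) = k at T = Im(rho_k) with the
# main term (T/2pi) log(T/2pi) - T/2pi of the counting formula above.
import mpmath

for k in [10, 100, 1000]:
    T = mpmath.zetazero(k).imag          # height of the k-th zero
    main = T / (2 * mpmath.pi) * mpmath.log(T / (2 * mpmath.pi)) - T / (2 * mpmath.pi)
    print(k, float(T), float(main))      # k - main stays O(log T)
```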

Another thing that we happen to know is how the magnitude {|\zeta(1/2+it)|} of the zeta function is distributed as {t \rightarrow \infty}; it turns out to be log-normally distributed with log-variance about {\frac{1}{2} \log \log t}. More precisely, we have the following result of Selberg:

Theorem 1 Let {T} be a large number, and let {t} be chosen uniformly at random from between {T} and {2T} (say). Then the distribution of {\frac{1}{\sqrt{\frac{1}{2} \log \log T}} \log |\zeta(1/2+it)|} converges (in distribution) to the normal distribution {N(0,1)}.

To put it more informally, {\log |\zeta(1/2+it)|} behaves like {\sqrt{\frac{1}{2} \log \log t} \times N(0,1)} plus lower order terms for “typical” large values of {t}. (Zeroes {\rho} of {\zeta} are, of course, certainly not typical, but one can show that one can usually stay away from these zeroes.) In fact, Selberg showed a slightly more precise result, namely that for any fixed {k \geq 1}, the {k^{th}} moment of {\frac{1}{\sqrt{\frac{1}{2} \log \log T}} \log |\zeta(1/2+it)|} converges to the {k^{th}} moment of {N(0,1)}.
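(Selberg's law can be probed numerically, although the doubly-logarithmic variance means that any computationally accessible height {T} is far too small to see sharp convergence; the following slow and purely qualitative sketch uses mpmath's zeta routine, which accepts complex arguments on the critical line.)

```python
# Qualitative check of Selberg's theorem: sample t uniformly in [T, 2T],
# normalise log|zeta(1/2+it)| by sqrt((1/2) log log T), and look at the
# first two moments. T here is tiny by analytic number theory standards.
import numpy as np
import mpmath

rng = np.random.default_rng(4)
T, samples = 1e6, 200
vals = [float(mpmath.log(abs(mpmath.zeta(mpmath.mpc(0.5, float(t))))))
        for t in rng.uniform(T, 2 * T, size=samples)]
z = np.array(vals) / np.sqrt(0.5 * np.log(np.log(T)))
print(z.mean(), z.var())  # should drift toward 0 and 1 as T grows
```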

Remarkably, Selberg’s result does not need RH or GUE, though it is certainly consistent with such hypotheses. (For instance, the determinant of a GUE matrix asymptotically obeys a remarkably similar log-normal law to that given by Selberg’s theorem.) Indeed, the net effect of these hypotheses is only to modify some error terms in {\log |\zeta(1/2+it)|} of magnitude {O(1)}, which are thus asymptotically negligible compared to the main term, which has magnitude about {\sqrt{\log \log T}}. So Selberg’s result, while very pretty, manages to finesse the question of what the zeroes {\rho} of {\zeta} are actually doing – he makes the primes do most of the work, rather than the zeroes.

Selberg never actually published the above result, but it is reproduced in a number of places (e.g. in this book by Joyner, or this book by Laurincikas). As with many other results in analytic number theory, the actual details of the proof can get somewhat technical; but I would like to record here (partly for my own benefit) an informal sketch of some of the main ideas in the argument.

