You are currently browsing the category archive for the ‘paper’ category.

Ben Green and I have just uploaded to the arXiv our paper “New bounds for Szemeredi’s theorem, Ia: Progressions of length 4 in finite field geometries revisited“, submitted to Proc. Lond. Math. Soc.. This is both an erratum to, and a replacement for, our previous paper “New bounds for Szemeredi’s theorem. I. Progressions of length 4 in finite field geometries“. The main objective in both papers is to bound the quantity ${r_4(F^n)}$ for a vector space ${F^n}$ over a finite field ${F}$ of characteristic greater than ${4}$, where ${r_4(F^n)}$ is defined as the cardinality of the largest subset of ${F^n}$ that does not contain an arithmetic progression of length ${4}$. In our earlier paper, we gave two arguments that bounded ${r_4(F^n)}$ in the regime when the field ${F}$ was fixed and ${n}$ was large. The first “cheap” argument gave the bound

$\displaystyle r_4(F^n) \ll |F|^n \exp( - c \sqrt{\log n} )$

and the more complicated “expensive” argument gave the improvement

$\displaystyle r_4(F^n) \ll |F|^n n^{-c} \ \ \ \ \ (1)$

for some constant ${c>0}$ depending only on ${F}$.

Unfortunately, while the cheap argument is correct, we discovered a subtle but serious gap in our expensive argument in the original paper. Roughly speaking, the strategy in that argument is to employ the density increment method: one begins with a large subset ${A}$ of ${F^n}$ that has no arithmetic progressions of length ${4}$, and seeks to locate a subspace on which ${A}$ has a significantly increased density. Then, by using a “Koopman-von Neumann theorem”, ultimately based on an iteration of the inverse ${U^3}$ theorem of Ben and myself (and also independently by Samorodnitsky), one approximates ${A}$ by a “quadratically structured” function ${f}$, which is (locally) a combination of a bounded number of quadratic phase functions, which one can prepare to be in a certain “locally equidistributed” or “locally high rank” form. (It is this reduction to the high rank case that distinguishes the “expensive” argument from the “cheap” one.) Because ${A}$ has no progressions of length ${4}$, the count of progressions of length ${4}$ weighted by ${f}$ will also be small; by combining this with the theory of equidistribution of quadratic phase functions, one can then conclude that there will be a subspace on which ${f}$ has increased density.

The error in the paper was to conclude from this that the original function ${1_A}$ also had increased density on the same subspace; it turns out that the manner in which ${f}$ approximates ${1_A}$ is not strong enough to deduce this latter conclusion from the former. (One can strengthen the nature of approximation until one restores such a conclusion, but only at the price of deteriorating the quantitative bounds on ${r_4(F^n)}$ one gets at the end of the day to be worse than the cheap argument.)

After trying unsuccessfully to repair this error, we eventually found an alternate argument, based on earlier papers of ourselves and of Bergelson-Host-Kra, that avoided the density increment method entirely and ended up giving a simpler proof of a stronger result than (1), and also gives the explicit value of ${c = 2^{-22}}$ for the exponent ${c}$ in (1). In fact, it gives the following stronger result:

Theorem 1 Let ${A}$ be a subset of ${F^n}$ of density at least ${\alpha}$, and let ${\epsilon>0}$. Then there is a subspace ${W}$ of ${F^n}$ of codimension ${O( \epsilon^{-2^{20}})}$ such that the number of (possibly degenerate) progressions ${a, a+r, a+2r, a+3r}$ in ${A \cap W}$ is at least ${(\alpha^4-\epsilon)|W|^2}$.

The bound (1) is an easy consequence of this theorem after choosing ${\epsilon := \alpha^4/2}$ and removing the degenerate progressions from the conclusion of the theorem.

The main new idea is to work with a local Koopman-von Neumann theorem rather than a global one, trading a relatively weak global approximation to ${1_A}$ with a significantly stronger local approximation to ${1_A}$ on a subspace ${W}$. This is somewhat analogous to how sometimes in graph theory it is more efficient (from the point of view of quantative estimates) to work with a local version of the Szemerédi regularity lemma which gives just a single regular pair of cells, rather than attempting to regularise almost all of the cells. This local approach is well adapted to the inverse ${U^3}$ theorem we use (which also has this local aspect), and also makes the reduction to the high rank case much cleaner. At the end of the day, one ends up with a fairly large subspace ${W}$ on which ${A}$ is quite dense (of density ${\alpha-O(\epsilon)}$) and which can be well approximated by a “pure quadratic” object, namely a function of a small number of quadratic phases obeying a high rank condition. One can then exploit a special positivity property of the count of length four progressions weighted by pure quadratic objects, essentially due to Bergelson-Host-Kra, which then gives the required lower bound.

Just a quick update on my previous post on gamifying the problem-solving process in high school algebra. I had a little time to spare (on an airplane flight, of all things), so I decided to rework the mockup version of the algebra game into something a bit more structured, namely as 12 progressively difficult levels of solving a linear equation in one unknown.  (Requires either Java or Flash.)   Somewhat to my surprise, I found that one could create fairly challenging puzzles out of this simple algebra problem by carefully restricting the moves available at each level. Here is a screenshot of a typical level:

After completing each level, an icon appears which one can click on to proceed to the next level.  (There is no particular rationale, by the way, behind my choice of icons; these are basically selected arbitrarily from the default collection of icons (or more precisely, “costumes”) available in Scratch.)

The restriction of moves made the puzzles significantly more artificial in nature, but I think that this may end up ultimately being a good thing, as to solve some of the harder puzzles one is forced to really start thinking about how the process of solving for an unknown actually works. (One could imagine that if one decided to make a fully fledged game out of this, one could have several modes of play, ranging from a puzzle mode in which one solves some carefully constructed, but artificial, puzzles, to a free-form mode in which one can solve arbitrary equations (including ones that you input yourself) using the full set of available algebraic moves.)

One advantage to gamifying linear algebra, as opposed to other types of algebra, is that there is no need for disjunction (i.e. splitting into cases). In contrast, if one has to solve a problem which involves at least one quadratic equation, then at some point one may be forced to divide the analysis into two disjoint cases, depending on which branch of a square root one is taking. I am not sure how to gamify this sort of branching in a civilised manner, and would be interested to hear of any suggestions in this regard. (A similar problem also arises in proving propositions in Euclidean geometry, which I had thought would be another good test case for gamification, because of the need to branch depending on the order of various points on a line, or rays through a point, or whether two lines are parallel or intersect.)

I recently finished the first draft of the the first of my books, entitled “Hilbert’s fifth problem and related topics“, based on the lecture notes for my graduate course of the same name.    The PDF of this draft is available here.  As always, comments and corrections are welcome.

I’ve just uploaded to the arXiv my paper The asymptotic distribution of a single eigenvalue gap of a Wigner matrix, submitted to Probability Theory and Related Fields. This paper (like several of my previous papers) is concerned with the asymptotic distribution of the eigenvalues ${\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)}$ of a random Wigner matrix ${M_n}$ in the limit ${n \rightarrow \infty}$, with a particular focus on matrices drawn from the Gaussian Unitary Ensemble (GUE). This paper is focused on the bulk of the spectrum, i.e. to eigenvalues ${\lambda_i(M_n)}$ with ${\delta n \leq i \leq (1-\delta) i n}$ for some fixed ${\delta>0}$.

The location of an individual eigenvalue ${\lambda_i(M_n)}$ is by now quite well understood. If we normalise the entries of the matrix ${M_n}$ to have mean zero and variance ${1}$, then in the asymptotic limit ${n \rightarrow \infty}$, the Wigner semicircle law tells us that with probability ${1-o(1)}$ one has

$\displaystyle \lambda_i(M_n) =\sqrt{n} u + o(\sqrt{n})$

where the classical location ${u = u_{i/n} \in [-2,2]}$ of the eigenvalue is given by the formula

$\displaystyle \int_{-2}^{u} \rho_{sc}(x)\ dx = \frac{i}{n}$

and the semicircular distribution ${\rho_{sc}(x)\ dx}$ is given by the formula

$\displaystyle \rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.$

Actually, one can improve the error term here from ${o(\sqrt{n})}$ to ${O( \log^{1/2+\epsilon} n)}$ for any ${\epsilon>0}$ (see this previous recent paper of Van and myself for more discussion of these sorts of estimates, sometimes known as eigenvalue rigidity estimates).

From the semicircle law (and the fundamental theorem of calculus), one expects the ${i^{th}}$ eigenvalue spacing ${\lambda_{i+1}(M_n)-\lambda_i(M_n)}$ to have an average size of ${\frac{1}{\sqrt{n} \rho_{sc}(u)}}$. It is thus natural to introduce the normalised eigenvalue spacing

$\displaystyle X_i := \frac{\lambda_{i+1}(M_n) - \lambda_i(M_n)}{1/\sqrt{n} \rho_{sc}(u)}$

and ask what the distribution of ${X_i}$ is.

As mentioned previously, we will focus on the bulk case ${\delta n \leq i\leq (1-\delta)n}$, and begin with the model case when ${M_n}$ is drawn from GUE. (In the edge case when ${i}$ is close to ${1}$ or to ${n}$, the distribution is given by the famous Tracy-Widom law.) Here, the distribution was almost (but as we shall see, not quite) worked out by Gaudin and Mehta. By using the theory of determinantal processes, they were able to compute a quantity closely related to ${X_i}$, namely the probability

$\displaystyle {\bf P}( N_{[\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)}, \sqrt{n} u + \frac{y}{\sqrt{n} \rho_{sc}(u)}]} = 0) \ \ \ \ \ (1)$

that an interval ${[\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)}, \sqrt{n} u + \frac{y}{\sqrt{n} \rho_{sc}(u)}]}$ near ${\sqrt{n} u}$ of length comparable to the expected eigenvalue spacing ${1/\sqrt{n} \rho_{sc}(u)}$ is devoid of eigenvalues. For ${u}$ in the bulk and fixed ${x,y}$, they showed that this probability is equal to

$\displaystyle \det( 1 - 1_{[x,y]} P 1_{[x,y]} ) + o(1),$

where ${P}$ is the Dyson projection

$\displaystyle P f(x) = \int_{\bf R} \frac{\sin(\pi(x-y))}{\pi(x-y)} f(y)\ dy$

to Fourier modes in ${[-1/2,1/2]}$, and ${\det}$ is the Fredholm determinant. As shown by Jimbo, Miwa, Tetsuji, Mori, and Sato, this determinant can also be expressed in terms of a solution to a Painleve V ODE, though we will not need this fact here. In view of this asymptotic and some standard integration by parts manipulations, it becomes plausible to propose that ${X_i}$ will be asymptotically distributed according to the Gaudin-Mehta distribution ${p(x)\ dx}$, where

$\displaystyle p(x) := \frac{d^2}{dx^2} \det( 1 - 1_{[0,x]} P 1_{[0,x]} ).$

A reasonably accurate approximation for ${p}$ is given by the Wigner surmise ${p(x) \approx \frac{1}{2} \pi x e^{-\pi x^2/4}}$, which was presciently proposed by Wigner as early as 1957; it is exact for ${n=2}$ but not in the asymptotic limit ${n \rightarrow \infty}$.

Unfortunately, when one tries to make this argument rigorous, one finds that the asymptotic for (1) does not control a single gap ${X_i}$, but rather an ensemble of gaps ${X_i}$, where ${i}$ is drawn from an interval ${[i_0 - L, i_0 + L]}$ of some moderate size ${L}$ (e.g. ${L = \log n}$); see for instance this paper of Deift, Kriecherbauer, McLaughlin, Venakides, and Zhou for a more precise formalisation of this statement (which is phrased slightly differently, in which one samples all gaps inside a fixed window of spectrum, rather than inside a fixed range of eigenvalue indices ${i}$). (This result is stated for GUE, but can be extended to other Wigner ensembles by the Four Moment Theorem, at least if one assumes a moment matching condition; see this previous paper with Van Vu for details. The moment condition can in fact be removed, as was done in this subsequent paper with Erdos, Ramirez, Schlein, Vu, and Yau.)

The problem is that when one specifies a given window of spectrum such as ${[\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)}, \sqrt{n} u + \frac{y}{\sqrt{n} \rho_{sc}(u)}]}$, one cannot quite pin down in advance which eigenvalues ${\lambda_i(M_n)}$ are going to lie to the left or right of this window; even with the strongest eigenvalue rigidity results available, there is a natural uncertainty of ${\sqrt{\log n}}$ or so in the ${i}$ index (as can be quantified quite precisely by this central limit theorem of Gustavsson).

The main difficulty here is that there could potentially be some strange coupling between the event (1) of an interval being devoid of eigenvalues, and the number ${N_{(-\infty,\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)})}(M_n)}$ of eigenvalues to the left of that interval. For instance, one could conceive of a possible scenario in which the interval in (1) tends to have many eigenvalues when ${N_{(-\infty,\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)})}(M_n)}$ is even, but very few when ${N_{(-\infty,\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)})}(M_n)}$ is odd. In this sort of situation, the gaps ${X_i}$ may have different behaviour for even ${i}$ than for odd ${i}$, and such anomalies would not be picked up in the averaged statistics in which ${i}$ is allowed to range over some moderately large interval.

The main result of the current paper is that these anomalies do not actually occur, and that all of the eigenvalue gaps ${X_i}$ in the bulk are asymptotically governed by the Gaudin-Mehta law without the need for averaging in the ${i}$ parameter. Again, this is shown first for GUE, and then extended to other Wigner matrices obeying a matching moment condition using the Four Moment Theorem. (It is likely that the moment matching condition can be removed here, but I was unable to achieve this, despite all the recent advances in establishing universality of local spectral statistics for Wigner matrices, mainly because the universality results in the literature are more focused on specific energy levels ${u}$ than on specific eigenvalue indices ${i}$. To make matters worse, in some cases universality is currently known only after an additional averaging in the energy parameter.)

The main task in the proof is to show that the random variable ${N_{(-\infty,\sqrt{n} u + \frac{x}{\sqrt{n} \rho_{sc}(u)})}(M_n)}$ is largely decoupled from the event in (1) when ${M_n}$ is drawn from GUE. To do this we use some of the theory of determinantal processes, and in particular the nice fact that when one conditions a determinantal process to the event that a certain spatial region (such as an interval) contains no points of the process, then one obtains a new determinantal process (with a kernel that is closely related to the original kernel). The main task is then to obtain a sufficiently good control on the distance between the new determinantal kernel and the old one, which we do by some functional-analytic considerations involving the manipulation of norms of operators (and specifically, the operator norm, Hilbert-Schmidt norm, and nuclear norm). Amusingly, the Fredholm alternative makes a key appearance, as I end up having to invert a compact perturbation of the identity at one point (specifically, I need to invert ${1 - 1_{[x,y]}P1_{[x,y]}}$, where ${P}$ is the Dyson projection and ${[x,y]}$ is an interval). As such, the bounds in my paper become ineffective, though I am sure that with more work one can invert this particular perturbation of the identity by hand, without the need to invoke the Fredholm alternative.

Van Vu and I have just uploaded to the arXiv our paper “Random matrices: The Universality phenomenon for Wigner ensembles“. This survey is a longer version (58 pages) of a previous short survey we wrote up a few months ago. The survey focuses on recent progress in understanding the universality phenomenon for Hermitian Wigner ensembles, of which the Gaussian Unitary Ensemble (GUE) is the most well known. The one-sentence summary of this progress is that many of the asymptotic spectral statistics (e.g. correlation functions, eigenvalue gaps, determinants, etc.) that were previously known for GUE matrices, are now known for very large classes of Wigner ensembles as well. There are however a wide variety of results of this type, due to the large number of interesting spectral statistics, the varying hypotheses placed on the ensemble, and the different modes of convergence studied, and it is difficult to isolate a single such result currently as the definitive universality result. (In particular, there is at present a tradeoff between generality of ensemble and strength of convergence; the universality results that are available for the most general classes of ensemble are only presently able to demonstrate a rather weak sense of convergence to the universal distribution (involving an additional averaging in the energy parameter), which limits the applicability of such results to a number of interesting questions in which energy averaging is not permissible, such as the study of the least singular value of a Wigner matrix, or of related quantities such as the condition number or determinant. But it is conceivable that this tradeoff is a temporary phenomenon and may be eliminated by future work in this area; in the case of Hermitian matrices whose entries have the same second moments as that of the GUE ensemble, for instance, the need for energy averaging has already been removed.)

Nevertheless, throughout the family of results that have been obtained recently, there are two main methods which have been fundamental to almost all of the recent progress in extending from special ensembles such as GUE to general ensembles. The first method, developed extensively by Erdos, Schlein, Yau, Yin, and others (and building on an initial breakthrough by Johansson), is the heat flow method, which exploits the rapid convergence to equilibrium of the spectral statistics of matrices undergoing Dyson-type flows towards GUE. (An important aspect to this method is the ability to accelerate the convergence to equilibrium by localising the Hamiltonian, in order to eliminate the slowest modes of the flow; this refinement of the method is known as the “local relaxation flow” method. Unfortunately, the translation mode is not accelerated by this process, which is the principal reason why results obtained by pure heat flow methods still require an energy averaging in the final conclusion; it would of interest to find a way around this difficulty.) The other method, which goes all the way back to Lindeberg in his classical proof of the central limit theorem, and which was introduced to random matrix theory by Chatterjee and then developed for the universality problem by Van Vu and myself, is the swapping method, which is based on the observation that spectral statistics of Wigner matrices tend to be stable if one replaces just one or two entries of the matrix with another distribution, with the stability of the swapping process becoming stronger if one assumes that the old and new entries have many matching moments. The main formalisations of this observation are known as four moment theorems, because they require four matching moments between the entries, although there are some variant three moment theorems and two moment theorems in the literature as well. Our initial four moment theorems were focused on individual eigenvalues (and later also to eigenvectors), but it was later observed by Erdos, Yau, and Yin that simpler four moment theorems could also be established for aggregate spectral statistics, such as the coefficients of the Greens function, and Knowles and Yin also subsequently observed that these latter theorems could be used to recover a four moment theorem for eigenvalues and eigenvectors, giving an alternate approach to proving such theorems.

Interestingly, it seems that the heat flow and swapping methods are complementary to each other; the heat flow methods are good at removing moment hypotheses on the coefficients, while the swapping methods are good at removing regularity hypotheses. To handle general ensembles with minimal moment or regularity hypotheses, it is thus necessary to combine the two methods (though perhaps in the future a third method, or a unification of the two existing methods, might emerge).

Besides the heat flow and swapping methods, there are also a number of other basic tools that are also needed in these results, such as local semicircle laws and eigenvalue rigidity, which are also discussed in the survey. We also survey how universality has been established for wide variety of spectral statistics; the ${k}$-point correlation functions are the most well known of these statistics, but they do not tell the whole story (particularly if one can only control these functions after an averaging in the energy), and there are a number of other statistics, such as eigenvalue counting functions, determinants, or spectral gaps, for which the above methods can be applied.

In order to prevent the survey from becoming too enormous, we decided to restrict attention to Hermitian matrix ensembles, whose entries off the diagonal are identically distributed, as this is the case in which the strongest results are available. There are several results that are applicable to more general ensembles than these which are briefly mentioned in the survey, but they are not covered in detail.

We plan to submit this survey eventually to the proceedings of a workshop on random matrix theory, and will continue to update the references on the arXiv version until the time comes to actually submit the paper.

Finally, in the survey we issue some errata for previous papers of Van and myself in this area, mostly centering around the three moment theorem (a variant of the more widely used four moment theorem), for which the original proof of Van and myself was incomplete. (Fortunately, as the three moment theorem had many fewer applications than the four moment theorem, and most of the applications that it did have ended up being superseded by subsequent papers, the actual impact of this issue was limited, but still an erratum is in order.)

I’ve just uploaded to the arXiv my paper “Every odd number greater than 1 is the sum of at most five primes“, submitted to Mathematics of Computation. The main result of the paper is as stated in the title, and is in the spirit of (though significantly weaker than) the even Goldbach conjecture (every even natural number is the sum of at most two primes) and odd Goldbach conjecture (every odd natural number greater than 1 is the sum of at most three primes). It also improves on a result of Ramaré that every even natural number is the sum of at most six primes. This result had previously also been established by Kaniecki under the additional assumption of the Riemann hypothesis, so one can view the main result here as an unconditional version of Kaniecki’s result.

The method used is the Hardy-Littlewood circle method, which was for instance also used to prove Vinogradov’s theorem that every sufficiently large odd number is the sum of three primes. Let’s quickly recall how this argument works. It is convenient to use a proxy for the primes, such as the von Mangoldt function ${\Lambda}$, which is mostly supported on the primes. To represent a large number ${x}$ as the sum of three primes, it suffices to obtain a good lower bound for the sum

$\displaystyle \sum_{n_1,n_2,n_3: n_1+n_2+n_3=x} \Lambda(n_1) \Lambda(n_2) \Lambda(n_3).$

By Fourier analysis, one can rewrite this sum as an integral

$\displaystyle \int_{{\bf R}/{\bf Z}} S(x,\alpha)^3 e(-x\alpha)\ d\alpha$

where

$\displaystyle S(x,\alpha) := \sum_{n \leq x} \Lambda(n) e(n\alpha)$

and ${e(\theta) :=e^{2\pi i \theta}}$. To control this integral, one then needs good bounds on ${S(x,\alpha)}$ for various values of ${\alpha}$. To do this, one first approximates ${\alpha}$ by a rational ${a/q}$ with controlled denominator (using a tool such as the Dirichlet approximation theorem) ${q}$. The analysis then broadly bifurcates into the major arc case when ${q}$ is small, and the minor arc case when ${q}$ is large. In the major arc case, the problem more or less boils down to understanding sums such as

$\displaystyle \sum_{n\leq x} \Lambda(n) e(an/q),$

which in turn is almost equivalent to understanding the prime number theorem in arithmetic progressions modulo ${q}$. In the minor arc case, the prime number theorem is not strong enough to give good bounds (unless one is using some extremely strong hypotheses, such as the generalised Riemann hypothesis), so instead one uses a rather different method, using truncated versions of divisor sum identities such as ${\Lambda(n) =\sum_{d|n} \mu(d) \log\frac{n}{d}}$ to split ${S(x,\alpha)}$ into a collection of linear and bilinear sums that are more tractable to bound, typical examples of which (after using a particularly simple truncated divisor sum identity known as Vaughan’s identity) include the “Type I sum”

$\displaystyle \sum_{d \leq U} \mu(d) \sum_{n \leq x/d} \log(n) e(\alpha dn)$

and the “Type II sum”

$\displaystyle \sum_{d > U} \sum_{w > V} \mu(d) (\sum_{b|w: b > V} \Lambda(b)) e(\alpha dw) 1_{dw \leq x}.$

After using tools such as the triangle inequality or Cauchy-Schwarz inequality to eliminate arithmetic functions such as ${\mu(d)}$ or ${\sum_{b|w: b>V}\Lambda(b)}$, one ends up controlling plain exponential sums such as ${\sum_{V < w < x/d} e(\alpha dw)}$, which can be efficiently controlled in the minor arc case.

This argument works well when ${x}$ is extremely large, but starts running into problems for moderate sized ${x}$, e.g. ${x \sim 10^{30}}$. The first issue is that of logarithmic losses in the minor arc estimates. A typical minor arc estimate takes the shape

$\displaystyle |S(x,\alpha)| \ll (\frac{x}{\sqrt{q}}+\frac{x}{\sqrt{x/q}} + x^{4/5}) \log^3 x \ \ \ \ \ (1)$

when ${\alpha}$ is close to ${a/q}$ for some ${1\leq q\leq x}$. This only improves upon the trivial estimate ${|S(x,\alpha)| \ll x}$ from the prime number theorem when ${\log^6 x \ll q \ll x/\log^6 x}$. As a consequence, it becomes necessary to obtain an accurate prime number theorem in arithmetic progressions with modulus as large as ${\log^6 x}$. However, with current technology, the error term in such theorems are quite poor (terms such as ${O(\exp(-c\sqrt{\log x}) x)}$ for some small ${c>0}$ are typical, and there is also a notorious “Siegel zero” problem), and as a consequence, the method is generally only applicable for very large ${x}$. For instance, the best explicit result of Vinogradov type known currently is due to Liu and Wang, who established that all odd numbers larger than ${10^{1340}}$ are the sum of three odd primes. (However, on the assumption of the GRH, the full odd Goldbach conjecture is known to be true; this is a result of Deshouillers, Effinger, te Riele, and Zinoviev.)

In this paper, we make a number of refinements to the general scheme, each one of which is individually rather modest and not all that novel, but which when added together turn out to be enough to resolve the five primes problem (though many more ideas would still be needed to tackle the three primes problem, and as is well known the circle method is very unlikely to be the route to make progress on the two primes problem). The first refinement, which is only available in the five primes case, is to take advantage of the numerical verification of the even Goldbach conjecture up to some large ${N_0}$ (we take ${N_0=4\times 10^{14}}$, using a verification of Richstein, although there are now much larger values of ${N_0}$as high as ${2.6 \times 10^{18}}$ – for which the conjecture has been verified). As such, instead of trying to represent an odd number ${x}$ as the sum of five primes, we can represent it as the sum of three odd primes and a natural number between ${2}$ and ${N_0}$. This effectively brings us back to the three primes problem, but with the significant additional boost that one can essentially restrict the frequency variable ${\alpha}$ to be of size ${O(1/N_0)}$. In practice, this eliminates all of the major arcs except for the principal arc around ${0}$. This is a significant simplification, in particular avoiding the need to deal with the prime number theorem in arithmetic progressions (and all the attendant theory of L-functions, Siegel zeroes, etc.).

In a similar spirit, by taking advantage of the numerical verification of the Riemann hypothesis up to some height ${T_0}$, and using the explicit formula relating the von Mangoldt function with the zeroes of the zeta function, one can safely deal with the principal major arc ${\{ \alpha = O( T_0 / x ) \}}$. For our specific application, we use the value ${T_0= 3.29 \times 10^9}$, arising from the verification of the Riemann hypothesis of the first ${10^{10}}$ zeroes by van de Lune (unpublished) and Wedeniswki. (Such verifications have since been extended further, the latest being that the first ${10^{13}}$ zeroes lie on the line.)

To make the contribution of the major arc as efficient as possible, we borrow an idea from a paper of Bourgain, and restrict one of the three primes in the three-primes problem to a somewhat shorter range than the other two (of size ${O(x/K)}$ instead of ${O(x)}$, where we take ${K}$ to be something like ${10^3}$), as this largely eliminates the “Archimedean” losses coming from trying to use Fourier methods to control convolutions on ${{\bf R}}$. In our paper, we set the scale parameter ${K}$ to be ${10^3}$ (basically, anything that is much larger than ${1}$ but much less than ${T_0}$ will work), but we found that an additional gain (which we ended up not using) could be obtained by averaging ${K}$ over a range of scales, say between ${10^3}$ and ${10^6}$. This sort of averaging could be a useful trick in future work on Goldbach-type problems.

It remains to treat the contribution of the “minor arc” ${T_0/x \ll |\alpha| \ll 1/N_0}$. To do this, one needs good ${L^2}$ and ${L^\infty}$ type estimates on the exponential sum ${S(x,\alpha)}$. Plancherel’s theorem gives an ${L^2}$ estimate which loses a logarithmic factor, but it turns out that on this particular minor arc one can use tools from the theory of the large sieve (such as Montgomery’s uncertainty principle) to eliminate this logarithmic loss almost completely; it turns out that the most efficient way to do this is use an effective upper bound of Siebert on the number of prime pairs ${(p,p+h)}$ less than ${x}$ to obtain an ${L^2}$ bound that only loses a factor of ${8}$ (or of ${7}$, once one cuts out the major arc).

For ${L^\infty}$ estimates, it turns out that existing effective versions of (1) (in particular, the bound given by Chen and Wang) are insufficient, due to the three logarithmic factors of ${\log x}$ in the bound. By using a smoothed out version ${S_\eta(x,\alpha) :=\sum_{n}\Lambda(n) e(n\alpha) \eta(n/x)}$ of the sum ${S(\alpha,x)}$, for some suitable cutoff function ${\eta}$, one can save one factor of a logarithm, obtaining a bound of the form

$\displaystyle |S_\eta(x,\alpha)| \ll (\frac{x}{\sqrt{q}}+\frac{x}{\sqrt{x/q}} + x^{4/5}) \log^2 x$

with effective constants. One can improve the constants further by restricting all summations to odd integers (which barely affects ${S_\eta(x,\alpha)}$, since ${\Lambda}$ was mostly supported on odd numbers anyway), which in practice reduces the effective constants by a factor of two or so. One can also make further improvements in the constants by using the very sharp large sieve inequality to control the “Type II” sums that arise from Vaughan’s identity, and by using integration by parts to improve the bounds on the “Type I” sums. A final gain can then be extracted by optimising the cutoff parameters ${U, V}$ appearing in Vaughan’s identity to minimise the contribution of the Type II sums (which, in practice, are the dominant term). Combining all these improvements, one ends up with bounds of the shape

$\displaystyle |S_\eta(x,\alpha)| \ll \frac{x}{q} \log^2 x + \frac{x}{\sqrt{q}} \log^2 q$

when ${q}$ is small (say ${1 < q < x^{1/3}}$) and

$\displaystyle |S_\eta(x,\alpha)| \ll \frac{x}{(x/q)^2} \log^2 x + \frac{x}{\sqrt{x/q}} \log^2(x/q)$

when ${q}$ is large (say ${x^{2/3} < q < x}$). (See the paper for more explicit versions of these estimates.) The point here is that the ${\log x}$ factors have been partially replaced by smaller logarithmic factors such as ${\log q}$ or ${\log x/q}$. Putting together all of these improvements, one can finally obtain a satisfactory bound on the minor arc. (There are still some terms with a ${\log x}$ factor in them, but we use the effective Vinogradov theorem of Liu and Wang to upper bound ${\log x}$ by ${3100}$, which ends up making the remaining terms involving ${\log x}$ manageable.)

Van Vu and I have just uploaded to the arXiv our paper Random matrices: Sharp concentration of eigenvalues, submitted to the Electronic Journal of Probability. As with many of our previous papers, this paper is concerned with the distribution of the eigenvalues ${\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)}$ of a random Wigner matrix ${M_n}$ (such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)). To simplify the discussion we shall mostly restrict attention to the bulk of the spectrum, i.e. to eigenvalues ${\lambda_i(M_n)}$ with ${\delta n \leq i \leq (1-\delta) i n}$ for some fixed ${\delta>0}$, although analogues of most of the results below have also been obtained at the edge of the spectrum.

If we normalise the entries of the matrix ${M_n}$ to have mean zero and variance ${1/n}$, then in the asymptotic limit ${n \rightarrow \infty}$, we have the Wigner semicircle law, which asserts that the eigenvalues are asymptotically distributed according to the semicircular distribution ${\rho_{sc}(x)\ dx}$, where

$\displaystyle \rho_{sc}(x) := \frac{1}{2\pi} (4-x^2)_+^{1/2}.$

An essentially equivalent way of saying this is that for large ${n}$, we expect the ${i^{th}}$ eigenvalue ${\lambda_i(M_n)}$ of ${M_n}$ to stay close to the classical location ${\gamma_i \in [-2,2]}$, defined by the formula

$\displaystyle \int_{-2}^{\gamma_i} \rho_{sc}(x)\ dx = \frac{i}{n}.$

In particular, from the Wigner semicircle law it can be shown that asymptotically almost surely, one has

$\displaystyle \lambda_i(M_n) = \gamma_i + o(1) \ \ \ \ \ (1)$

for all ${1 \leq i \leq n}$.

In the modern study of the spectrum of Wigner matrices (and in particular as a key tool in establishing universality results), it has become of interest to improve the error term in (1) as much as possible. A typical early result in this direction was by Bai, who used the Stieltjes transform method to obtain polynomial convergence rates of the shape ${O(n^{-c})}$ for some absolute constant ${c>0}$; see also the subsequent papers of Alon-Krivelevich-Vu and of of Meckes, who were able to obtain such convergence rates (with exponentially high probability) by using concentration of measure tools, such as Talagrand’s inequality. On the other hand, in the case of the GUE ensemble it is known (by this paper of Gustavsson) that ${\lambda_i(M_n)}$ has variance comparable to ${\frac{\log n}{n^2}}$ in the bulk, so that the optimal error term in (1) should be about ${O(\log^{1/2} n/n)}$. (One may think that if one wanted bounds on (1) that were uniform in ${i}$, one would need to enlarge the error term further, but this does not appear to be the case, due to strong correlations between the ${\lambda_i}$; note for instance this recent result of Ben Arous and Bourgarde that the largest gap between eigenvalues in the bulk is typically of order ${O(\log^{1/2} n/n)}$.)

A significant advance in this direction was achieved by Erdos, Schlein, and Yau in a series of papers where they used a combination of Stieltjes transform and concentration of measure methods to obtain local semicircle laws which showed, among other things, that one had asymptotics of the form

$\displaystyle N(I) = (1+o(1)) \int_I \rho_{sc}(x)\ dx$

with exponentially high probability for intervals ${I}$ in the bulk that were as short as ${n^{-1+\epsilon}}$ for some ${\epsilon>0}$, where ${N(I)}$ is the number of eigenvalues. These asymptotics are consistent with a good error term in (1), and are already sufficient for many applications, but do not quite imply a strong concentration result for individual eigenvalues ${\lambda_i}$ (basically because they do not preclude long-range or “secular” shifts in the spectrum that involve large blocks of eigenvalues at mesoscopic scales). Nevertheless, this was rectified in a subsequent paper of Erdos, Yau, and Yin, which roughly speaking obtained a bound of the form

$\displaystyle \lambda_i(M_n) = \gamma_i + O( \frac{\log^{O(\log\log n)} n}{n} )$

in the bulk with exponentially high probability, for Wigner matrices obeying some exponential decay conditions on the entries. This was achieved by a rather delicate high moment calculation, in which the contribution of the diagonal entries of the resolvent (whose average forms the Stieltjes transform) was shown to mostly cancel each other out.

As the GUE computations show, this concentration result is sharp up to the quasilogarithmic factor ${\log^{O(\log\log n)} n}$. The main result of this paper is to improve the concentration result to one more in line with the GUE case, namely

$\displaystyle \lambda_i(M_n) = \gamma_i + O( \frac{\log^{O(1)} n}{n} )$

with exponentially high probability (see the paper for a more precise statement of results). The one catch is that an additional hypothesis is required, namely that the entries of the Wigner matrix have vanishing third moment. We also obtain similar results for the edge of the spectrum (but with a different scaling).

Our arguments are rather different from those of Erdos, Yau, and Yin, and thus provide an alternate approach to establishing eigenvalue concentration. The main tool is the Lindeberg exchange strategy, which is also used to prove the Four Moment Theorem (although we do not directly invoke the Four Moment Theorem in our analysis). The main novelty is that this exchange strategy is now used to establish large deviation estimates (i.e. exponentially small tail probabilities) rather than universality of the limiting distribution. Roughly speaking, the basic point is as follows. The Lindeberg exchange strategy seeks to compare a function ${F(X_1,\ldots,X_n)}$ of many independent random variables ${X_1,\ldots,X_n}$ with the same function ${F(Y_1,\ldots,Y_n)}$ of a different set of random variables (which match moments with the original set of variables to some order, such as to second or fourth order) by exchanging the random variables one at a time. Typically, one tries to upper bound expressions such as

$\displaystyle {\bf E} \phi(F(X_1,\ldots,X_n)) - \phi(F(X_1,\ldots,X_{n-1},Y_n))$

for various smooth test functions ${\phi}$, by performing a Taylor expansion in the variable being swapped and taking advantage of the matching moment hypotheses. In previous implementations of this strategy, ${\phi}$ was a bounded test function, which allowed one to get control of the bulk of the distribution of ${F(X_1,\ldots,X_n)}$, and in particular in controlling probabilities such as

$\displaystyle {\bf P}( a \leq F(X_1,\ldots,X_n) \leq b )$

for various thresholds ${a}$ and ${b}$, but did not give good control on the tail as the error terms tended to be polynomially decaying in ${n}$ rather than exponentially decaying. However, it turns out that one can modify the exchange strategy to deal with moments such as

$\displaystyle {\bf E} (1 + F(X_1,\ldots,X_n)^2)^k$

for various moderately large ${k}$ (e.g. of size comparable to ${\log n}$), obtaining results such as

$\displaystyle {\bf E} (1 + F(Y_1,\ldots,Y_n)^2)^k = (1+o(1)) {\bf E} (1 + F(X_1,\ldots,X_n)^2)^k$

after performing all the relevant exchanges. As such, one can then use large deviation estimates on ${F(X_1,\ldots,X_n)}$ to deduce large deviation estimates on ${F(Y_1,\ldots,Y_n)}$.

In this paper we also take advantage of a simplification, first noted by Erdos, Yau, and Yin, that Four Moment Theorems become somewhat easier to prove if one works with resolvents ${(M_n-z)^{-1}}$ (and the closely related Stieltjes transform ${s(z) := \frac{1}{n} \hbox{tr}( (M_n-z)^{-1} )}$) rather than with individual eigenvalues, as the Taylor expansion of resolvents are very simple (essentially being a Neumann series). The relationship between the Stieltjes transform and the location of individual eigenvalues can be seen by taking advantage of the identity

$\displaystyle \frac{\pi}{2} - \frac{\pi}{n} N((-\infty,E)) = \int_0^\infty \hbox{Re} s(E + i \eta)\ d\eta$

for any energy level ${E \in {\bf R}}$, which can be verified from elementary calculus. (In practice, we would truncate ${\eta}$ near zero and near infinity to avoid some divergences, but this is a minor technicality.) As such, a concentration result for the Stieltjes transform can be used to establish an analogous concentration result for the eigenvalue counting functions ${N((-\infty,E))}$, which in turn can be used to deduce concentration results for individual eigenvalues ${\lambda_i(M_n)}$ by some basic combinatorial manipulations.

Emmanuel Breuillard, Ben Green and I have just uploaded to the arXiv the short paper “A nilpotent Freiman dimension lemma“, submitted to the special volume of the European Journal of Combinatorics in honour of Yahya Ould Hamidoune.  This paper is a nonabelian (or more precisely, nilpotent) variant of the following additive combinatorics lemma of Freiman:

Freiman’s lemma.  Let A be a finite subset of a Euclidean space with $|A+A| \leq K|A|$.  Then A is contained in an affine subspace of dimension at most ${}\lfloor K-1 \rfloor$.

This can be viewed as a “cheap” version of the more well known theorem of Freiman that places sets of small doubling in a torsion-free abelian group inside a generalised arithmetic progression.  The advantage here is that the bound on the dimension is extremely explicit.

Our main result is

Theorem.  Let A be a finite subset of a simply-connected nilpotent Lie group G which is a K-approximate group (i.e. A is symmetric, contains the identity, and $A \cdot A$ can be covered by up to K left translates of A.  Then A can be covered by at most $K^{2+29K^9}$ left-translates of a closed connected Lie subgroup of dimension at most $K^9$.

We remark that our previous paper established a similar result, in which the dimension bound was improved to $6\log_2 K$, but at the cost of worsening the covering number to $O_K(1)$, and with a much more complicated proof  (91 pages instead of 8). Furthermore, the bound on $O_K(1)$ is ineffective, due to the use of ultraproducts in the argument (though it is likely that some extremely lousy explicit bound could eventually be squeezed out of the argument by finitising everything).  Note that the step of the ambient nilpotent group G does not influence the final bounds in the theorem, although we do of course need this step to be finite.  A simple quotienting argument allows one to deduce a corollary of the above theorem in which the ambient group is assumed to be residually torsion-free nilpotent instead of being a simply connected nilpotent Lie group, but we omit the statement of this corollary here.

To motivate the proof of this theorem, let us first show a simple case of an argument of Gleason, which is very much in the spirit of Freiman’s lemma:

Gleason Lemma (special case).  Let $A$ be a finite symmetric subset of a Euclidean space, and let $0 = H_0 \subset H_1 \subset \ldots \subset H_k$ be a sequence of subspaces in this space, such that the sets $2A \cap H_i$ are strictly increasing in i for $i=0,\ldots,k$.  Then $|5A| \geq k|A|$, where $5A = A+A+A+A+A$.

Proof.    By hypothesis, for each $i=1,\ldots,k$, the projection $B_i$ of $2A \cap H_i$ to $H_i / H_{i-1}$ is non-trivial, finite, and symmetric.   In particular, since the vector space $H_i/H_{i-1}$ is torsion-free, $B_i+B_i$ is strictly larger than $B_i$.  Equivalently, one can find $a_i$ in $(2A \cap H_i) + (2A \cap H_i)$ that does not lie in $(2A \cap H_i) + H_{i-1}$; in particular, $a_i \in 4A \cap H_i$ and $a_i+A$ is disjoint from $H_{i-1}+A$.  As a consequence, the $a_i+A$ are disjoint and lie in 5A, whence the claim. $\Box$

Note that by combining the contrapositive of this lemma with a greedy algorithm, one can show that any K-approximate group in a Euclidean space is contained in a subspace of dimension at most $K^4$, which is a weak version of Freiman’s lemma.

To extend the argument to the nilpotent setting we use the following idea.  Observe that any non-trivial genuine subgroup H of a nilpotent group G will contain at least one non-trivial central element; indeed, by intersecting H with the lower central series $G = G_1 \geq G_2 \geq G_3 \geq \ldots$ of G, and considering the last intersection $H \cap G_k$ which is non-trivial, one obtains the claim.  It turns out that one can adapt this argument to approximate groups, so that any sufficiently large K-approximate subgroup A of G will contain a non-trivial element that centralises a large fraction of A.  Passing to this large fraction and quotienting out the central element, we obtain a new approximate group.    If, after a bounded number of steps, this procedure gives an approximate group of bounded size, we are basically done.  If, however, the process continues, then by using some Lie group theory, one can find a long sequence $H_0 \leq H_1 \leq \ldots \leq H_k$ of connected Lie subgroups of G, such that the sets $A^2 \cap H_i$ are strictly increasing in i.   Using some Lie group theory and the hypotheses on G, one can deduce that the group $\langle A^2 \cap H_{i+1}\rangle$ generated by $A^2 \cap H_{i+1}$ is much larger than $\langle A^2 \cap H_i \rangle$, in the sense that the latter group has infinite index in the former.   It then turns out that the Gleason argument mentioned above can be adapted to this setting.

Van Vu and I have just uploaded to the arXiv our short survey article, “Random matrices: The Four Moment Theorem for Wigner ensembles“, submitted to the MSRI book series, as part of the proceedings on the MSRI semester program on random matrix theory from last year.  This is a highly condensed version (at 17 pages) of a much longer survey (currently at about 48 pages, though not completely finished) that we are currently working on, devoted to the recent advances in understanding the universality phenomenon for spectral statistics of Wigner matrices.  In this abridged version of the survey, we focus on a key tool in the subject, namely the Four Moment Theorem which roughly speaking asserts that the statistics of a Wigner matrix depend only on the first four moments of the entries.  We give a sketch of proof of this theorem, and two sample applications: a central limit theorem for individual eigenvalues of a Wigner matrix (extending a result of Gustavsson in the case of GUE), and the verification of a conjecture of Wigner, Dyson, and Mehta on the universality of the asymptotic k-point correlation functions even for discrete ensembles (provided that we interpret convergence in the vague topology sense).

For reasons of space, this paper is very far from an exhaustive survey even of the narrow topic of universality for Wigner matrices, but should hopefully be an accessible entry point into the subject nevertheless.

Van Vu and I have just uploaded to the arXiv our paper A central limit theorem for the determinant of a Wigner matrix, submitted to Adv. Math.. It studies the asymptotic distribution of the determinant ${\det M_n}$ of a random Wigner matrix (such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)).

Before we get to these results, let us first discuss the simpler problem of studying the determinant ${\det A_n}$ of a random iid matrix ${A_n = (\zeta_{ij})_{1 \leq i,j \leq n}}$, such as a real gaussian matrix (where all entries are independently and identically distributed using the standard real normal distribution ${\zeta_{ij} \equiv N(0,1)_{\bf R}}$), a complex gaussian matrix (where all entries are independently and identically distributed using the standard complex normal distribution ${\zeta_{ij} \equiv N(0,1)_{\bf C}}$, thus the real and imaginary parts are independent with law ${N(0,1/2)_{\bf R}}$), or the random sign matrix (in which all entries are independently and identically distributed according to the Bernoulli distribution ${\zeta_{ij} \equiv \pm 1}$ (with a ${1/2}$ chance of either sign). More generally, one can consider a matrix ${A_n}$ in which all the entries ${\zeta_{ij}}$ are independently and identically distributed with mean zero and variance ${1}$.

We can expand ${\det A_n}$ using the Leibniz expansion

$\displaystyle \det A_n = \sum_{\sigma \in S_n} I_\sigma, \ \ \ \ \ (1)$

where ${\sigma: \{1,\ldots,n\} \rightarrow \{1,\ldots,n\}}$ ranges over the permutations of ${\{1,\ldots,n\}}$, and ${I_\sigma}$ is the product

$\displaystyle I_\sigma := \hbox{sgn}(\sigma) \prod_{i=1}^n \zeta_{i\sigma(i)}.$

From the iid nature of the ${\zeta_{ij}}$, we easily see that each ${I_\sigma}$ has mean zero and variance one, and are pairwise uncorrelated as ${\sigma}$ varies. We conclude that ${\det A_n}$ has mean zero and variance ${n!}$ (an observation first made by Turán). In particular, from Chebyshev’s inequality we see that ${\det A_n}$ is typically of size ${O(\sqrt{n!})}$.

It turns out, though, that this is not quite best possible. This is easiest to explain in the real gaussian case, by performing a computation first made by Goodman. In this case, ${\det A_n}$ is clearly symmetrical, so we can focus attention on the magnitude ${|\det A_n|}$. We can interpret this quantity geometrically as the volume of an ${n}$-dimensional parallelopiped whose generating vectors ${X_1,\ldots,X_n}$ are independent real gaussian vectors in ${{\bf R}^n}$ (i.e. their coefficients are iid with law ${N(0,1)_{\bf R}}$). Using the classical base-times-height formula, we thus have

$\displaystyle |\det A_n| = \prod_{i=1}^n \hbox{dist}(X_i, V_i) \ \ \ \ \ (2)$

where ${V_i}$ is the ${i-1}$-dimensional linear subspace of ${{\bf R}^n}$ spanned by ${X_1,\ldots,X_{i-1}}$ (note that ${X_1,\ldots,X_n}$, having an absolutely continuous joint distribution, are almost surely linearly independent). Taking logarithms, we conclude

$\displaystyle \log |\det A_n| = \sum_{i=1}^n \log \hbox{dist}(X_i, V_i).$

Now, we take advantage of a fundamental symmetry property of the Gaussian vector distribution, namely its invariance with respect to the orthogonal group ${O(n)}$. Because of this, we see that if we fix ${X_1,\ldots,X_{i-1}}$ (and thus ${V_i}$, the random variable ${\hbox{dist}(X_i,V_i)}$ has the same distribution as ${\hbox{dist}(X_i,{\bf R}^{i-1})}$, or equivalently the ${\chi}$ distribution

$\displaystyle \chi_{n-i+1} := (\sum_{j=1}^{n-i+1} \xi_{n-i+1,j}^2)^{1/2}$

where ${\xi_{n-i+1,1},\ldots,\xi_{n-i+1,n-i+1}}$ are iid copies of ${N(0,1)_{\bf R}}$. As this distribution does not depend on the ${X_1,\ldots,X_{i-1}}$, we conclude that the law of ${\log |\det A_n|}$ is given by the sum of ${n}$ independent ${\chi}$-variables:

$\displaystyle \log |\det A_n| \equiv \sum_{j=1}^{n} \log \chi_j.$

A standard computation shows that each ${\chi_j^2}$ has mean ${j}$ and variance ${2j}$, and then a Taylor series (or Ito calculus) computation (using concentration of measure tools to control tails) shows that ${\log \chi_j}$ has mean ${\frac{1}{2} \log j - \frac{1}{2j} + O(1/j^{3/2})}$ and variance ${\frac{1}{2j}+O(1/j^{3/2})}$. As such, ${\log |\det A_n|}$ has mean ${\frac{1}{2} \log n! - \frac{1}{2} \log n + O(1)}$ and variance ${\frac{1}{2} \log n + O(1)}$. Applying a suitable version of the central limit theorem, one obtains the asymptotic law

$\displaystyle \frac{\log |\det A_n| - \frac{1}{2} \log n! + \frac{1}{2} \log n}{\sqrt{\frac{1}{2}\log n}} \rightarrow N(0,1)_{\bf R}, \ \ \ \ \ (3)$

where ${\rightarrow}$ denotes convergence in distribution. A bit more informally, we have

$\displaystyle |\det A_n| \approx n^{-1/2} \sqrt{n!} \exp( N( 0, \log n / 2 )_{\bf R} ) \ \ \ \ \ (4)$

when ${A_n}$ is a real gaussian matrix; thus, for instance, the median value of ${|\det A_n|}$ is ${n^{-1/2+o(1)} \sqrt{n!}}$. At first glance, this appears to conflict with the second moment bound ${\mathop{\bf E} |\det A_n|^2 = n!}$ of Turán mentioned earlier, but once one recalls that ${\exp(N(0,t)_{\bf R})}$ has a second moment of ${\exp(2t)}$, we see that the two facts are in fact perfectly consistent; the upper tail of the normal distribution in the exponent in (4) ends up dominating the second moment.

It turns out that the central limit theorem (3) is valid for any real iid matrix with mean zero, variance one, and an exponential decay condition on the entries; this was first claimed by Girko, though the arguments in that paper appear to be incomplete. Another proof of this result, with more quantitative bounds on the convergence rate has been recently obtained by Hoi Nguyen and Van Vu. The basic idea in these arguments is to express the sum in (2) in terms of a martingale and apply the martingale central limit theorem.

If one works with complex gaussian random matrices instead of real gaussian random matrices, the above computations change slightly (one has to replace the real ${\chi}$ distribution with the complex ${\chi}$ distribution, in which the ${\xi_{i,j}}$ are distributed according to the complex gaussian ${N(0,1)_{\bf C}}$ instead of the real one). At the end of the day, one ends up with the law

$\displaystyle \frac{\log |\det A_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\frac{1}{4}\log n}} \rightarrow N(0,1)_{\bf R}, \ \ \ \ \ (5)$

$\displaystyle |\det A_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n / 4 )_{\bf R} ) \ \ \ \ \ (6)$

(but note that this new asymptotic is still consistent with Turán’s second moment calculation).

We can now turn to the results of our paper. Here, we replace the iid matrices ${A_n}$ by Wigner matrices ${M_n = (\zeta_{ij})_{1 \leq i,j \leq n}}$, which are defined similarly but are constrained to be Hermitian (or real symmetric), thus ${\zeta_{ij} = \overline{\zeta_{ji}}}$ for all ${i,j}$. Model examples here include the Gaussian Unitary Ensemble (GUE), in which ${\zeta_{ij} \equiv N(0,1)_{\bf C}}$ for ${1 \leq i < j \leq n}$ and ${\zeta_{ij} \equiv N(0,1)_{\bf R}}$ for ${1 \leq i=j \leq n}$, the Gaussian Orthogonal Ensemble (GOE), in which ${\zeta_{ij} \equiv N(0,1)_{\bf R}}$ for ${1 \leq i < j \leq n}$ and ${\zeta_{ij} \equiv N(0,2)_{\bf R}}$ for ${1 \leq i=j \leq n}$, and the symmetric Bernoulli ensemble, in which ${\zeta_{ij} \equiv \pm 1}$ for ${1 \leq i \leq j \leq n}$ (with probability ${1/2}$ of either sign). In all cases, the upper triangular entries of the matrix are assumed to be jointly independent. For a more precise definition of the Wigner matrix ensembles we are considering, see the introduction to our paper.

The determinants ${\det M_n}$ of these matrices still have a Leibniz expansion. However, in the Wigner case, the mean and variance of the ${I_\sigma}$ are slightly different, and what is worse, they are not all pairwise uncorrelated any more. For instance, the mean of ${I_\sigma}$ is still usually zero, but equals ${(-1)^{n/2}}$ in the exceptional case when ${\sigma}$ is a perfect matching (i.e. the union of exactly ${n/2}$ ${2}$-cycles, a possibility that can of course only happen when ${n}$ is even). As such, the mean ${\mathop{\bf E} \det M_n}$ still vanishes when ${n}$ is odd, but for even ${n}$ it is equal to

$\displaystyle (-1)^{n/2} \frac{n!}{(n/2)!2^{n/2}}$

(the fraction here simply being the number of perfect matchings on ${n}$ vertices). Using Stirling’s formula, one then computes that ${|\mathop{\bf E} \det A_n|}$ is comparable to ${n^{-1/4} \sqrt{n!}}$ when ${n}$ is large and even. The second moment calculation is more complicated (and uses facts about the distribution of cycles in random permutations, mentioned in this previous post), but one can compute that ${\mathop{\bf E} |\det A_n|^2}$ is comparable to ${n^{1/2} n!}$ for GUE and ${n^{3/2} n!}$ for GOE. (The discrepancy here comes from the fact that in the GOE case, ${I_\sigma}$ and ${I_\rho}$ can correlate when ${\rho}$ contains reversals of ${k}$-cycles of ${\sigma}$ for ${k \geq 3}$, but this does not happen in the GUE case.) For GUE, much more precise asymptotics for the moments of the determinant are known, starting from the work of Brezin and Hikami, though we do not need these more sophisticated computations here.

Our main results are then as follows.

Theorem 1 Let ${M_n}$ be a Wigner matrix.

• If ${M_n}$ is drawn from GUE, then

$\displaystyle \frac{\log |\det M_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\frac{1}{2}\log n}} \rightarrow N(0,1)_{\bf R}.$

• If ${M_n}$ is drawn from GOE, then

$\displaystyle \frac{\log |\det M_n| - \frac{1}{2} \log n! + \frac{1}{4} \log n}{\sqrt{\log n}} \rightarrow N(0,1)_{\bf R}.$

• The previous two results also hold for more general Wigner matrices, assuming that the real and imaginary parts are independent, a finite moment condition is satisfied, and the entries match moments with those of GOE or GUE to fourth order. (See the paper for a more precise formulation of the result.)

Thus, we informally have

$\displaystyle |\det M_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n / 2 )_{\bf R} )$

when ${M_n}$ is drawn from GUE, or from another Wigner ensemble matching GUE to fourth order (and obeying some additional minor technical hypotheses); and

$\displaystyle |\det M_n| \approx n^{-1/4} \sqrt{n!} \exp( N( 0, \log n )_{\bf R} )$

when ${M_n}$ is drawn from GOE, or from another Wigner ensemble matching GOE to fourth order. Again, these asymptotic limiting distributions are consistent with the asymptotic behaviour for the second moments.

The extension from the GUE or GOE case to more general Wigner ensembles is a fairly routine application of the four moment theorem for Wigner matrices, although for various technical reasons we do not quite use the existing four moment theorems in the literature, but adapt them to the log determinant. The main idea is to express the log-determinant as an integral

$\displaystyle \log|\det M_n| = \frac{1}{2} n \log n - n \hbox{Im} \int_0^\infty s(\sqrt{-1}\eta)\ d\eta \ \ \ \ \ (7)$

of the Stieltjes transform

$\displaystyle s(z) := \frac{1}{n} \hbox{tr}( \frac{1}{\sqrt{n}} M_n - z )^{-1}$

of ${M_n}$. Strictly speaking, the integral in (7) is divergent at infinity (and also can be ill-behaved near zero), but this can be addressed by standard truncation and renormalisation arguments (combined with known facts about the least singular value of Wigner matrices), which we omit here. We then use a variant of the four moment theorem for the Stieltjes transform, as used by Erdos, Yau, and Yin (based on a previous four moment theorem for individual eigenvalues introduced by Van Vu and myself). The four moment theorem is proven by the now-standard Lindeberg exchange method, combined with the usual resolvent identities to control the behaviour of the resolvent (and hence the Stieltjes transform) with respect to modifying one or two entries, together with the delocalisation of eigenvector property (which in turn arises from local semicircle laws) to control the error terms.

Somewhat surprisingly (to us, at least), it turned out that it was the first part of the theorem (namely, the verification of the limiting law for the invariant ensembles GUE and GOE) that was more difficult than the extension to the Wigner case. Even in an ensemble as highly symmetric as GUE, the rows are no longer independent, and the formula (2) is basically useless for getting any non-trivial control on the log determinant. There is an explicit formula for the joint distribution of the eigenvalues of GUE (or GOE), which does eventually give the distribution of the cumulants of the log determinant, which then gives the required central limit theorem; but this is a lengthy computation, first performed by Delannay and Le Caer.

Following a suggestion of my colleague, Rowan Killip, we give an alternate proof of this central limit theorem in the GUE and GOE cases, by using a beautiful observation of Trotter, namely that the GUE or GOE ensemble can be conjugated into a tractable tridiagonal form. Let me state it just for GUE:

Proposition 2 (Tridiagonal form of GUE) \cite{trotter} Let ${M'_n}$ be the random tridiagonal real symmetric matrix

$\displaystyle M'_n = \begin{pmatrix} a_1 & b_1 & 0 & \ldots & 0 & 0 \\ b_1 & a_2 & b_2 & \ldots & 0 & 0 \\ 0 & b_2 & a_3 & \ldots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \ldots & a_{n-1} & b_{n-1} \\ 0 & 0 & 0 & \ldots & b_{n-1} & a_n \end{pmatrix}$

where the ${a_1,\ldots,a_n, b_1,\ldots,b_{n-1}}$ are jointly independent real random variables, with ${a_1,\ldots,a_n \equiv N(0,1)_{\bf R}}$ being standard real Gaussians, and each ${b_i}$ having a ${\chi}$-distribution:

$\displaystyle b_i = (\sum_{j=1}^i |z_{i,j}|^2)^{1/2}$

where ${z_{i,j} \equiv N(0,1)_{\bf C}}$ are iid complex gaussians. Let ${M_n}$ be drawn from GUE. Then the joint eigenvalue distribution of ${M_n}$ is identical to the joint eigenvalue distribution of ${M'_n}$.

Proof: Let ${M_n}$ be drawn from GUE. We can write

$\displaystyle M_n = \begin{pmatrix} M_{n-1} & X_n \\ X_n^* & a_n \end{pmatrix}$

where ${M_{n-1}}$ is drawn from the ${n-1\times n-1}$ GUE, ${a_n \equiv N(0,1)_{\bf R}}$, and ${X_n \in {\bf C}^{n-1}}$ is a random gaussian vector with all entries iid with distribution ${N(0,1)_{\bf C}}$. Furthermore, ${M_{n-1}, X_n, a_n}$ are jointly independent.

We now apply the tridiagonal matrix algorithm. Let ${b_{n-1} := |X_n|}$, then ${b_n}$ has the ${\chi}$-distribution indicated in the proposition. We then conjugate ${M_n}$ by a unitary matrix ${U}$ that preserves the final basis vector ${e_n}$, and maps ${X}$ to ${b_{n-1} e_{n-1}}$. Then we have

$\displaystyle U M_n U^* = \begin{pmatrix} \tilde M_{n-1} & b_{n-1} e_{n-1} \\ b_{n-1} e_{n-1}^* & a_n \end{pmatrix}$

where ${\tilde M_{n-1}}$ is conjugate to ${M_{n-1}}$. Now we make the crucial observation: because ${M_{n-1}}$ is distributed according to GUE (which is a unitarily invariant ensemble), and ${U}$ is a unitary matrix independent of ${M_{n-1}}$, ${\tilde M_{n-1}}$ is also distributed according to GUE, and remains independent of both ${b_{n-1}}$ and ${a_n}$.

We continue this process, expanding ${U M_n U^*}$ as

$\displaystyle \begin{pmatrix} M_{n-2} & X_{n-1} & 0 \\ X_{n-1}^* & a_{n-1} & b_{n-1} \\ 0 & b_{n-1} & a_n. \end{pmatrix}$

Applying a further unitary conjugation that fixes ${e_{n-1}, e_n}$ but maps ${X_{n-1}}$ to ${b_{n-2} e_{n-2}}$, we may replace ${X_{n-1}}$ by ${b_{n-2} e_{n-2}}$ while transforming ${M_{n-2}}$ to another GUE matrix ${\tilde M_{n-2}}$ independent of ${a_n, b_{n-1}, a_{n-1}, b_{n-2}}$. Iterating this process, we eventually obtain a coupling of ${M_n}$ to ${M'_n}$ by unitary conjugations, and the claim follows. $\Box$

The determinant of a tridiagonal matrix is not quite as simple as the determinant of a triangular matrix (in which it is simply the product of the diagonal entries), but it is pretty close: the determinant ${D_n}$ of the above matrix is given by solving the recursion

$\displaystyle D_i = a_i D_{i-1} + b_{i-1}^2 D_{i-2}$

with ${D_0=1}$ and ${D_{-1} = 0}$. Thus, instead of the product of a sequence of independent scalar ${\chi}$ distributions as in the gaussian matrix case, the determinant of GUE ends up being controlled by the product of a sequence of independent ${2\times 2}$ matrices whose entries are given by gaussians and ${\chi}$ distributions. In this case, one cannot immediately take logarithms and hope to get something for which the martingale central limit theorem can be applied, but some ad hoc manipulation of these ${2 \times 2}$ matrix products eventually does make this strategy work. (Roughly speaking, one has to work with the logarithm of the Frobenius norm of the matrix first.)