You are currently browsing the category archive for the ‘paper’ category.
Ben Green and I have just uploaded to the arXiv our paper “New bounds for Szemeredi’s theorem, Ia: Progressions of length 4 in finite field geometries revisited“, submitted to Proc. Lond. Math. Soc.. This is both an erratum to, and a replacement for, our previous paper “New bounds for Szemeredi’s theorem. I. Progressions of length 4 in finite field geometries“. The main objective in both papers is to bound the quantity for a vector space
over a finite field
of characteristic greater than
, where
is defined as the cardinality of the largest subset of
that does not contain an arithmetic progression of length
. In our earlier paper, we gave two arguments that bounded
in the regime when the field
was fixed and
was large. The first “cheap” argument gave the bound
and the more complicated “expensive” argument gave the improvement
depending only on
.
Unfortunately, while the cheap argument is correct, we discovered a subtle but serious gap in our expensive argument in the original paper. Roughly speaking, the strategy in that argument is to employ the density increment method: one begins with a large subset of
that has no arithmetic progressions of length
, and seeks to locate a subspace on which
has a significantly increased density. Then, by using a “Koopman-von Neumann theorem”, ultimately based on an iteration of the inverse
theorem of Ben and myself (and also independently by Samorodnitsky), one approximates
by a “quadratically structured” function
, which is (locally) a combination of a bounded number of quadratic phase functions, which one can prepare to be in a certain “locally equidistributed” or “locally high rank” form. (It is this reduction to the high rank case that distinguishes the “expensive” argument from the “cheap” one.) Because
has no progressions of length
, the count of progressions of length
weighted by
will also be small; by combining this with the theory of equidistribution of quadratic phase functions, one can then conclude that there will be a subspace on which
has increased density.
The error in the paper was to conclude from this that the original function also had increased density on the same subspace; it turns out that the manner in which
approximates
is not strong enough to deduce this latter conclusion from the former. (One can strengthen the nature of approximation until one restores such a conclusion, but only at the price of deteriorating the quantitative bounds on
one gets at the end of the day to be worse than the cheap argument.)
After trying unsuccessfully to repair this error, we eventually found an alternate argument, based on earlier papers of ourselves and of Bergelson-Host-Kra, that avoided the density increment method entirely and ended up giving a simpler proof of a stronger result than (1), and also gives the explicit value of for the exponent
in (1). In fact, it gives the following stronger result:
Theorem 1 Let
be a subset of
of density at least
, and let
. Then there is a subspace
of
of codimension
such that the number of (possibly degenerate) progressions
in
is at least
.
The bound (1) is an easy consequence of this theorem after choosing and removing the degenerate progressions from the conclusion of the theorem.
The main new idea is to work with a local Koopman-von Neumann theorem rather than a global one, trading a relatively weak global approximation to with a significantly stronger local approximation to
on a subspace
. This is somewhat analogous to how sometimes in graph theory it is more efficient (from the point of view of quantative estimates) to work with a local version of the Szemerédi regularity lemma which gives just a single regular pair of cells, rather than attempting to regularise almost all of the cells. This local approach is well adapted to the inverse
theorem we use (which also has this local aspect), and also makes the reduction to the high rank case much cleaner. At the end of the day, one ends up with a fairly large subspace
on which
is quite dense (of density
) and which can be well approximated by a “pure quadratic” object, namely a function of a small number of quadratic phases obeying a high rank condition. One can then exploit a special positivity property of the count of length four progressions weighted by pure quadratic objects, essentially due to Bergelson-Host-Kra, which then gives the required lower bound.
Just a quick update on my previous post on gamifying the problem-solving process in high school algebra. I had a little time to spare (on an airplane flight, of all things), so I decided to rework the mockup version of the algebra game into something a bit more structured, namely as 12 progressively difficult levels of solving a linear equation in one unknown. (Requires either Java or Flash.) Somewhat to my surprise, I found that one could create fairly challenging puzzles out of this simple algebra problem by carefully restricting the moves available at each level. Here is a screenshot of a typical level:

After completing each level, an icon appears which one can click on to proceed to the next level. (There is no particular rationale, by the way, behind my choice of icons; these are basically selected arbitrarily from the default collection of icons (or more precisely, “costumes”) available in Scratch.)
The restriction of moves made the puzzles significantly more artificial in nature, but I think that this may end up ultimately being a good thing, as to solve some of the harder puzzles one is forced to really start thinking about how the process of solving for an unknown actually works. (One could imagine that if one decided to make a fully fledged game out of this, one could have several modes of play, ranging from a puzzle mode in which one solves some carefully constructed, but artificial, puzzles, to a free-form mode in which one can solve arbitrary equations (including ones that you input yourself) using the full set of available algebraic moves.)
One advantage to gamifying linear algebra, as opposed to other types of algebra, is that there is no need for disjunction (i.e. splitting into cases). In contrast, if one has to solve a problem which involves at least one quadratic equation, then at some point one may be forced to divide the analysis into two disjoint cases, depending on which branch of a square root one is taking. I am not sure how to gamify this sort of branching in a civilised manner, and would be interested to hear of any suggestions in this regard. (A similar problem also arises in proving propositions in Euclidean geometry, which I had thought would be another good test case for gamification, because of the need to branch depending on the order of various points on a line, or rays through a point, or whether two lines are parallel or intersect.)
I recently finished the first draft of the the first of my books, entitled “Hilbert’s fifth problem and related topics“, based on the lecture notes for my graduate course of the same name. The PDF of this draft is available here. As always, comments and corrections are welcome.
I’ve just uploaded to the arXiv my paper The asymptotic distribution of a single eigenvalue gap of a Wigner matrix, submitted to Probability Theory and Related Fields. This paper (like several of my previous papers) is concerned with the asymptotic distribution of the eigenvalues of a random Wigner matrix
in the limit
, with a particular focus on matrices drawn from the Gaussian Unitary Ensemble (GUE). This paper is focused on the bulk of the spectrum, i.e. to eigenvalues
with
for some fixed
.
The location of an individual eigenvalue is by now quite well understood. If we normalise the entries of the matrix
to have mean zero and variance
, then in the asymptotic limit
, the Wigner semicircle law tells us that with probability
one has
where the classical location of the eigenvalue is given by the formula
and the semicircular distribution is given by the formula
Actually, one can improve the error term here from to
for any
(see this previous recent paper of Van and myself for more discussion of these sorts of estimates, sometimes known as eigenvalue rigidity estimates).
From the semicircle law (and the fundamental theorem of calculus), one expects the eigenvalue spacing
to have an average size of
. It is thus natural to introduce the normalised eigenvalue spacing
and ask what the distribution of is.
As mentioned previously, we will focus on the bulk case , and begin with the model case when
is drawn from GUE. (In the edge case when
is close to
or to
, the distribution is given by the famous Tracy-Widom law.) Here, the distribution was almost (but as we shall see, not quite) worked out by Gaudin and Mehta. By using the theory of determinantal processes, they were able to compute a quantity closely related to
, namely the probability
near
of length comparable to the expected eigenvalue spacing
is devoid of eigenvalues. For
in the bulk and fixed
, they showed that this probability is equal to
where is the Dyson projection
to Fourier modes in , and
is the Fredholm determinant. As shown by Jimbo, Miwa, Tetsuji, Mori, and Sato, this determinant can also be expressed in terms of a solution to a Painleve V ODE, though we will not need this fact here. In view of this asymptotic and some standard integration by parts manipulations, it becomes plausible to propose that
will be asymptotically distributed according to the Gaudin-Mehta distribution
, where
A reasonably accurate approximation for is given by the Wigner surmise
, which was presciently proposed by Wigner as early as 1957; it is exact for
but not in the asymptotic limit
.
Unfortunately, when one tries to make this argument rigorous, one finds that the asymptotic for (1) does not control a single gap , but rather an ensemble of gaps
, where
is drawn from an interval
of some moderate size
(e.g.
); see for instance this paper of Deift, Kriecherbauer, McLaughlin, Venakides, and Zhou for a more precise formalisation of this statement (which is phrased slightly differently, in which one samples all gaps inside a fixed window of spectrum, rather than inside a fixed range of eigenvalue indices
). (This result is stated for GUE, but can be extended to other Wigner ensembles by the Four Moment Theorem, at least if one assumes a moment matching condition; see this previous paper with Van Vu for details. The moment condition can in fact be removed, as was done in this subsequent paper with Erdos, Ramirez, Schlein, Vu, and Yau.)
The problem is that when one specifies a given window of spectrum such as , one cannot quite pin down in advance which eigenvalues
are going to lie to the left or right of this window; even with the strongest eigenvalue rigidity results available, there is a natural uncertainty of
or so in the
index (as can be quantified quite precisely by this central limit theorem of Gustavsson).
The main difficulty here is that there could potentially be some strange coupling between the event (1) of an interval being devoid of eigenvalues, and the number of eigenvalues to the left of that interval. For instance, one could conceive of a possible scenario in which the interval in (1) tends to have many eigenvalues when
is even, but very few when
is odd. In this sort of situation, the gaps
may have different behaviour for even
than for odd
, and such anomalies would not be picked up in the averaged statistics in which
is allowed to range over some moderately large interval.
The main result of the current paper is that these anomalies do not actually occur, and that all of the eigenvalue gaps in the bulk are asymptotically governed by the Gaudin-Mehta law without the need for averaging in the
parameter. Again, this is shown first for GUE, and then extended to other Wigner matrices obeying a matching moment condition using the Four Moment Theorem. (It is likely that the moment matching condition can be removed here, but I was unable to achieve this, despite all the recent advances in establishing universality of local spectral statistics for Wigner matrices, mainly because the universality results in the literature are more focused on specific energy levels
than on specific eigenvalue indices
. To make matters worse, in some cases universality is currently known only after an additional averaging in the energy parameter.)
The main task in the proof is to show that the random variable is largely decoupled from the event in (1) when
is drawn from GUE. To do this we use some of the theory of determinantal processes, and in particular the nice fact that when one conditions a determinantal process to the event that a certain spatial region (such as an interval) contains no points of the process, then one obtains a new determinantal process (with a kernel that is closely related to the original kernel). The main task is then to obtain a sufficiently good control on the distance between the new determinantal kernel and the old one, which we do by some functional-analytic considerations involving the manipulation of norms of operators (and specifically, the operator norm, Hilbert-Schmidt norm, and nuclear norm). Amusingly, the Fredholm alternative makes a key appearance, as I end up having to invert a compact perturbation of the identity at one point (specifically, I need to invert
, where
is the Dyson projection and
is an interval). As such, the bounds in my paper become ineffective, though I am sure that with more work one can invert this particular perturbation of the identity by hand, without the need to invoke the Fredholm alternative.
Van Vu and I have just uploaded to the arXiv our paper “Random matrices: The Universality phenomenon for Wigner ensembles“. This survey is a longer version (58 pages) of a previous short survey we wrote up a few months ago. The survey focuses on recent progress in understanding the universality phenomenon for Hermitian Wigner ensembles, of which the Gaussian Unitary Ensemble (GUE) is the most well known. The one-sentence summary of this progress is that many of the asymptotic spectral statistics (e.g. correlation functions, eigenvalue gaps, determinants, etc.) that were previously known for GUE matrices, are now known for very large classes of Wigner ensembles as well. There are however a wide variety of results of this type, due to the large number of interesting spectral statistics, the varying hypotheses placed on the ensemble, and the different modes of convergence studied, and it is difficult to isolate a single such result currently as the definitive universality result. (In particular, there is at present a tradeoff between generality of ensemble and strength of convergence; the universality results that are available for the most general classes of ensemble are only presently able to demonstrate a rather weak sense of convergence to the universal distribution (involving an additional averaging in the energy parameter), which limits the applicability of such results to a number of interesting questions in which energy averaging is not permissible, such as the study of the least singular value of a Wigner matrix, or of related quantities such as the condition number or determinant. But it is conceivable that this tradeoff is a temporary phenomenon and may be eliminated by future work in this area; in the case of Hermitian matrices whose entries have the same second moments as that of the GUE ensemble, for instance, the need for energy averaging has already been removed.)
Nevertheless, throughout the family of results that have been obtained recently, there are two main methods which have been fundamental to almost all of the recent progress in extending from special ensembles such as GUE to general ensembles. The first method, developed extensively by Erdos, Schlein, Yau, Yin, and others (and building on an initial breakthrough by Johansson), is the heat flow method, which exploits the rapid convergence to equilibrium of the spectral statistics of matrices undergoing Dyson-type flows towards GUE. (An important aspect to this method is the ability to accelerate the convergence to equilibrium by localising the Hamiltonian, in order to eliminate the slowest modes of the flow; this refinement of the method is known as the “local relaxation flow” method. Unfortunately, the translation mode is not accelerated by this process, which is the principal reason why results obtained by pure heat flow methods still require an energy averaging in the final conclusion; it would of interest to find a way around this difficulty.) The other method, which goes all the way back to Lindeberg in his classical proof of the central limit theorem, and which was introduced to random matrix theory by Chatterjee and then developed for the universality problem by Van Vu and myself, is the swapping method, which is based on the observation that spectral statistics of Wigner matrices tend to be stable if one replaces just one or two entries of the matrix with another distribution, with the stability of the swapping process becoming stronger if one assumes that the old and new entries have many matching moments. The main formalisations of this observation are known as four moment theorems, because they require four matching moments between the entries, although there are some variant three moment theorems and two moment theorems in the literature as well. Our initial four moment theorems were focused on individual eigenvalues (and later also to eigenvectors), but it was later observed by Erdos, Yau, and Yin that simpler four moment theorems could also be established for aggregate spectral statistics, such as the coefficients of the Greens function, and Knowles and Yin also subsequently observed that these latter theorems could be used to recover a four moment theorem for eigenvalues and eigenvectors, giving an alternate approach to proving such theorems.
Interestingly, it seems that the heat flow and swapping methods are complementary to each other; the heat flow methods are good at removing moment hypotheses on the coefficients, while the swapping methods are good at removing regularity hypotheses. To handle general ensembles with minimal moment or regularity hypotheses, it is thus necessary to combine the two methods (though perhaps in the future a third method, or a unification of the two existing methods, might emerge).
Besides the heat flow and swapping methods, there are also a number of other basic tools that are also needed in these results, such as local semicircle laws and eigenvalue rigidity, which are also discussed in the survey. We also survey how universality has been established for wide variety of spectral statistics; the -point correlation functions are the most well known of these statistics, but they do not tell the whole story (particularly if one can only control these functions after an averaging in the energy), and there are a number of other statistics, such as eigenvalue counting functions, determinants, or spectral gaps, for which the above methods can be applied.
In order to prevent the survey from becoming too enormous, we decided to restrict attention to Hermitian matrix ensembles, whose entries off the diagonal are identically distributed, as this is the case in which the strongest results are available. There are several results that are applicable to more general ensembles than these which are briefly mentioned in the survey, but they are not covered in detail.
We plan to submit this survey eventually to the proceedings of a workshop on random matrix theory, and will continue to update the references on the arXiv version until the time comes to actually submit the paper.
Finally, in the survey we issue some errata for previous papers of Van and myself in this area, mostly centering around the three moment theorem (a variant of the more widely used four moment theorem), for which the original proof of Van and myself was incomplete. (Fortunately, as the three moment theorem had many fewer applications than the four moment theorem, and most of the applications that it did have ended up being superseded by subsequent papers, the actual impact of this issue was limited, but still an erratum is in order.)
I’ve just uploaded to the arXiv my paper “Every odd number greater than 1 is the sum of at most five primes“, submitted to Mathematics of Computation. The main result of the paper is as stated in the title, and is in the spirit of (though significantly weaker than) the even Goldbach conjecture (every even natural number is the sum of at most two primes) and odd Goldbach conjecture (every odd natural number greater than 1 is the sum of at most three primes). It also improves on a result of Ramaré that every even natural number is the sum of at most six primes. This result had previously also been established by Kaniecki under the additional assumption of the Riemann hypothesis, so one can view the main result here as an unconditional version of Kaniecki’s result.
The method used is the Hardy-Littlewood circle method, which was for instance also used to prove Vinogradov’s theorem that every sufficiently large odd number is the sum of three primes. Let’s quickly recall how this argument works. It is convenient to use a proxy for the primes, such as the von Mangoldt function , which is mostly supported on the primes. To represent a large number
as the sum of three primes, it suffices to obtain a good lower bound for the sum
By Fourier analysis, one can rewrite this sum as an integral
where
and . To control this integral, one then needs good bounds on
for various values of
. To do this, one first approximates
by a rational
with controlled denominator (using a tool such as the Dirichlet approximation theorem)
. The analysis then broadly bifurcates into the major arc case when
is small, and the minor arc case when
is large. In the major arc case, the problem more or less boils down to understanding sums such as
which in turn is almost equivalent to understanding the prime number theorem in arithmetic progressions modulo . In the minor arc case, the prime number theorem is not strong enough to give good bounds (unless one is using some extremely strong hypotheses, such as the generalised Riemann hypothesis), so instead one uses a rather different method, using truncated versions of divisor sum identities such as
to split
into a collection of linear and bilinear sums that are more tractable to bound, typical examples of which (after using a particularly simple truncated divisor sum identity known as Vaughan’s identity) include the “Type I sum”
and the “Type II sum”
After using tools such as the triangle inequality or Cauchy-Schwarz inequality to eliminate arithmetic functions such as or
, one ends up controlling plain exponential sums such as
, which can be efficiently controlled in the minor arc case.
This argument works well when is extremely large, but starts running into problems for moderate sized
, e.g.
. The first issue is that of logarithmic losses in the minor arc estimates. A typical minor arc estimate takes the shape
is close to
for some
. This only improves upon the trivial estimate
from the prime number theorem when
. As a consequence, it becomes necessary to obtain an accurate prime number theorem in arithmetic progressions with modulus as large as
. However, with current technology, the error term in such theorems are quite poor (terms such as
for some small
are typical, and there is also a notorious “Siegel zero” problem), and as a consequence, the method is generally only applicable for very large
. For instance, the best explicit result of Vinogradov type known currently is due to Liu and Wang, who established that all odd numbers larger than
are the sum of three odd primes. (However, on the assumption of the GRH, the full odd Goldbach conjecture is known to be true; this is a result of Deshouillers, Effinger, te Riele, and Zinoviev.)
In this paper, we make a number of refinements to the general scheme, each one of which is individually rather modest and not all that novel, but which when added together turn out to be enough to resolve the five primes problem (though many more ideas would still be needed to tackle the three primes problem, and as is well known the circle method is very unlikely to be the route to make progress on the two primes problem). The first refinement, which is only available in the five primes case, is to take advantage of the numerical verification of the even Goldbach conjecture up to some large (we take
, using a verification of Richstein, although there are now much larger values of
– as high as
– for which the conjecture has been verified). As such, instead of trying to represent an odd number
as the sum of five primes, we can represent it as the sum of three odd primes and a natural number between
and
. This effectively brings us back to the three primes problem, but with the significant additional boost that one can essentially restrict the frequency variable
to be of size
. In practice, this eliminates all of the major arcs except for the principal arc around
. This is a significant simplification, in particular avoiding the need to deal with the prime number theorem in arithmetic progressions (and all the attendant theory of L-functions, Siegel zeroes, etc.).
In a similar spirit, by taking advantage of the numerical verification of the Riemann hypothesis up to some height , and using the explicit formula relating the von Mangoldt function with the zeroes of the zeta function, one can safely deal with the principal major arc
. For our specific application, we use the value
, arising from the verification of the Riemann hypothesis of the first
zeroes by van de Lune (unpublished) and Wedeniswki. (Such verifications have since been extended further, the latest being that the first
zeroes lie on the line.)
To make the contribution of the major arc as efficient as possible, we borrow an idea from a paper of Bourgain, and restrict one of the three primes in the three-primes problem to a somewhat shorter range than the other two (of size instead of
, where we take
to be something like
), as this largely eliminates the “Archimedean” losses coming from trying to use Fourier methods to control convolutions on
. In our paper, we set the scale parameter
to be
(basically, anything that is much larger than
but much less than
will work), but we found that an additional gain (which we ended up not using) could be obtained by averaging
over a range of scales, say between
and
. This sort of averaging could be a useful trick in future work on Goldbach-type problems.
It remains to treat the contribution of the “minor arc” . To do this, one needs good
and
type estimates on the exponential sum
. Plancherel’s theorem gives an
estimate which loses a logarithmic factor, but it turns out that on this particular minor arc one can use tools from the theory of the large sieve (such as Montgomery’s uncertainty principle) to eliminate this logarithmic loss almost completely; it turns out that the most efficient way to do this is use an effective upper bound of Siebert on the number of prime pairs
less than
to obtain an
bound that only loses a factor of
(or of
, once one cuts out the major arc).
For estimates, it turns out that existing effective versions of (1) (in particular, the bound given by Chen and Wang) are insufficient, due to the three logarithmic factors of
in the bound. By using a smoothed out version
of the sum
, for some suitable cutoff function
, one can save one factor of a logarithm, obtaining a bound of the form
with effective constants. One can improve the constants further by restricting all summations to odd integers (which barely affects , since
was mostly supported on odd numbers anyway), which in practice reduces the effective constants by a factor of two or so. One can also make further improvements in the constants by using the very sharp large sieve inequality to control the “Type II” sums that arise from Vaughan’s identity, and by using integration by parts to improve the bounds on the “Type I” sums. A final gain can then be extracted by optimising the cutoff parameters
appearing in Vaughan’s identity to minimise the contribution of the Type II sums (which, in practice, are the dominant term). Combining all these improvements, one ends up with bounds of the shape
when is small (say
) and
when is large (say
). (See the paper for more explicit versions of these estimates.) The point here is that the
factors have been partially replaced by smaller logarithmic factors such as
or
. Putting together all of these improvements, one can finally obtain a satisfactory bound on the minor arc. (There are still some terms with a
factor in them, but we use the effective Vinogradov theorem of Liu and Wang to upper bound
by
, which ends up making the remaining terms involving
manageable.)
Van Vu and I have just uploaded to the arXiv our paper Random matrices: Sharp concentration of eigenvalues, submitted to the Electronic Journal of Probability. As with many of our previous papers, this paper is concerned with the distribution of the eigenvalues of a random Wigner matrix
(such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)). To simplify the discussion we shall mostly restrict attention to the bulk of the spectrum, i.e. to eigenvalues
with
for some fixed
, although analogues of most of the results below have also been obtained at the edge of the spectrum.
If we normalise the entries of the matrix to have mean zero and variance
, then in the asymptotic limit
, we have the Wigner semicircle law, which asserts that the eigenvalues are asymptotically distributed according to the semicircular distribution
, where
An essentially equivalent way of saying this is that for large , we expect the
eigenvalue
of
to stay close to the classical location
, defined by the formula
In particular, from the Wigner semicircle law it can be shown that asymptotically almost surely, one has
.
In the modern study of the spectrum of Wigner matrices (and in particular as a key tool in establishing universality results), it has become of interest to improve the error term in (1) as much as possible. A typical early result in this direction was by Bai, who used the Stieltjes transform method to obtain polynomial convergence rates of the shape for some absolute constant
; see also the subsequent papers of Alon-Krivelevich-Vu and of of Meckes, who were able to obtain such convergence rates (with exponentially high probability) by using concentration of measure tools, such as Talagrand’s inequality. On the other hand, in the case of the GUE ensemble it is known (by this paper of Gustavsson) that
has variance comparable to
in the bulk, so that the optimal error term in (1) should be about
. (One may think that if one wanted bounds on (1) that were uniform in
, one would need to enlarge the error term further, but this does not appear to be the case, due to strong correlations between the
; note for instance this recent result of Ben Arous and Bourgarde that the largest gap between eigenvalues in the bulk is typically of order
.)
A significant advance in this direction was achieved by Erdos, Schlein, and Yau in a series of papers where they used a combination of Stieltjes transform and concentration of measure methods to obtain local semicircle laws which showed, among other things, that one had asymptotics of the form
with exponentially high probability for intervals in the bulk that were as short as
for some
, where
is the number of eigenvalues. These asymptotics are consistent with a good error term in (1), and are already sufficient for many applications, but do not quite imply a strong concentration result for individual eigenvalues
(basically because they do not preclude long-range or “secular” shifts in the spectrum that involve large blocks of eigenvalues at mesoscopic scales). Nevertheless, this was rectified in a subsequent paper of Erdos, Yau, and Yin, which roughly speaking obtained a bound of the form
in the bulk with exponentially high probability, for Wigner matrices obeying some exponential decay conditions on the entries. This was achieved by a rather delicate high moment calculation, in which the contribution of the diagonal entries of the resolvent (whose average forms the Stieltjes transform) was shown to mostly cancel each other out.
As the GUE computations show, this concentration result is sharp up to the quasilogarithmic factor . The main result of this paper is to improve the concentration result to one more in line with the GUE case, namely
with exponentially high probability (see the paper for a more precise statement of results). The one catch is that an additional hypothesis is required, namely that the entries of the Wigner matrix have vanishing third moment. We also obtain similar results for the edge of the spectrum (but with a different scaling).
Our arguments are rather different from those of Erdos, Yau, and Yin, and thus provide an alternate approach to establishing eigenvalue concentration. The main tool is the Lindeberg exchange strategy, which is also used to prove the Four Moment Theorem (although we do not directly invoke the Four Moment Theorem in our analysis). The main novelty is that this exchange strategy is now used to establish large deviation estimates (i.e. exponentially small tail probabilities) rather than universality of the limiting distribution. Roughly speaking, the basic point is as follows. The Lindeberg exchange strategy seeks to compare a function of many independent random variables
with the same function
of a different set of random variables (which match moments with the original set of variables to some order, such as to second or fourth order) by exchanging the random variables one at a time. Typically, one tries to upper bound expressions such as
for various smooth test functions , by performing a Taylor expansion in the variable being swapped and taking advantage of the matching moment hypotheses. In previous implementations of this strategy,
was a bounded test function, which allowed one to get control of the bulk of the distribution of
, and in particular in controlling probabilities such as
for various thresholds and
, but did not give good control on the tail as the error terms tended to be polynomially decaying in
rather than exponentially decaying. However, it turns out that one can modify the exchange strategy to deal with moments such as
for various moderately large (e.g. of size comparable to
), obtaining results such as
after performing all the relevant exchanges. As such, one can then use large deviation estimates on to deduce large deviation estimates on
.
In this paper we also take advantage of a simplification, first noted by Erdos, Yau, and Yin, that Four Moment Theorems become somewhat easier to prove if one works with resolvents (and the closely related Stieltjes transform
) rather than with individual eigenvalues, as the Taylor expansion of resolvents are very simple (essentially being a Neumann series). The relationship between the Stieltjes transform and the location of individual eigenvalues can be seen by taking advantage of the identity
for any energy level , which can be verified from elementary calculus. (In practice, we would truncate
near zero and near infinity to avoid some divergences, but this is a minor technicality.) As such, a concentration result for the Stieltjes transform can be used to establish an analogous concentration result for the eigenvalue counting functions
, which in turn can be used to deduce concentration results for individual eigenvalues
by some basic combinatorial manipulations.
Emmanuel Breuillard, Ben Green and I have just uploaded to the arXiv the short paper “A nilpotent Freiman dimension lemma“, submitted to the special volume of the European Journal of Combinatorics in honour of Yahya Ould Hamidoune. This paper is a nonabelian (or more precisely, nilpotent) variant of the following additive combinatorics lemma of Freiman:
Freiman’s lemma. Let A be a finite subset of a Euclidean space with
. Then A is contained in an affine subspace of dimension at most
.
This can be viewed as a “cheap” version of the more well known theorem of Freiman that places sets of small doubling in a torsion-free abelian group inside a generalised arithmetic progression. The advantage here is that the bound on the dimension is extremely explicit.
Our main result is
Theorem. Let A be a finite subset of a simply-connected nilpotent Lie group G which is a K-approximate group (i.e. A is symmetric, contains the identity, and
can be covered by up to K left translates of A. Then A can be covered by at most
left-translates of a closed connected Lie subgroup of dimension at most
.
We remark that our previous paper established a similar result, in which the dimension bound was improved to , but at the cost of worsening the covering number to
, and with a much more complicated proof (91 pages instead of 8). Furthermore, the bound on
is ineffective, due to the use of ultraproducts in the argument (though it is likely that some extremely lousy explicit bound could eventually be squeezed out of the argument by finitising everything). Note that the step of the ambient nilpotent group G does not influence the final bounds in the theorem, although we do of course need this step to be finite. A simple quotienting argument allows one to deduce a corollary of the above theorem in which the ambient group is assumed to be residually torsion-free nilpotent instead of being a simply connected nilpotent Lie group, but we omit the statement of this corollary here.
To motivate the proof of this theorem, let us first show a simple case of an argument of Gleason, which is very much in the spirit of Freiman’s lemma:
Gleason Lemma (special case). Let
be a finite symmetric subset of a Euclidean space, and let
be a sequence of subspaces in this space, such that the sets
are strictly increasing in i for
. Then
, where
.
Proof. By hypothesis, for each , the projection
of
to
is non-trivial, finite, and symmetric. In particular, since the vector space
is torsion-free,
is strictly larger than
. Equivalently, one can find
in
that does not lie in
; in particular,
and
is disjoint from
. As a consequence, the
are disjoint and lie in 5A, whence the claim.
Note that by combining the contrapositive of this lemma with a greedy algorithm, one can show that any K-approximate group in a Euclidean space is contained in a subspace of dimension at most , which is a weak version of Freiman’s lemma.
To extend the argument to the nilpotent setting we use the following idea. Observe that any non-trivial genuine subgroup H of a nilpotent group G will contain at least one non-trivial central element; indeed, by intersecting H with the lower central series of G, and considering the last intersection
which is non-trivial, one obtains the claim. It turns out that one can adapt this argument to approximate groups, so that any sufficiently large K-approximate subgroup A of G will contain a non-trivial element that centralises a large fraction of A. Passing to this large fraction and quotienting out the central element, we obtain a new approximate group. If, after a bounded number of steps, this procedure gives an approximate group of bounded size, we are basically done. If, however, the process continues, then by using some Lie group theory, one can find a long sequence
of connected Lie subgroups of G, such that the sets
are strictly increasing in i. Using some Lie group theory and the hypotheses on G, one can deduce that the group
generated by
is much larger than
, in the sense that the latter group has infinite index in the former. It then turns out that the Gleason argument mentioned above can be adapted to this setting.
Van Vu and I have just uploaded to the arXiv our short survey article, “Random matrices: The Four Moment Theorem for Wigner ensembles“, submitted to the MSRI book series, as part of the proceedings on the MSRI semester program on random matrix theory from last year. This is a highly condensed version (at 17 pages) of a much longer survey (currently at about 48 pages, though not completely finished) that we are currently working on, devoted to the recent advances in understanding the universality phenomenon for spectral statistics of Wigner matrices. In this abridged version of the survey, we focus on a key tool in the subject, namely the Four Moment Theorem which roughly speaking asserts that the statistics of a Wigner matrix depend only on the first four moments of the entries. We give a sketch of proof of this theorem, and two sample applications: a central limit theorem for individual eigenvalues of a Wigner matrix (extending a result of Gustavsson in the case of GUE), and the verification of a conjecture of Wigner, Dyson, and Mehta on the universality of the asymptotic k-point correlation functions even for discrete ensembles (provided that we interpret convergence in the vague topology sense).
For reasons of space, this paper is very far from an exhaustive survey even of the narrow topic of universality for Wigner matrices, but should hopefully be an accessible entry point into the subject nevertheless.
Van Vu and I have just uploaded to the arXiv our paper A central limit theorem for the determinant of a Wigner matrix, submitted to Adv. Math.. It studies the asymptotic distribution of the determinant of a random Wigner matrix (such as a matrix drawn from the Gaussian Unitary Ensemble (GUE) or Gaussian Orthogonal Ensemble (GOE)).
Before we get to these results, let us first discuss the simpler problem of studying the determinant of a random iid matrix
, such as a real gaussian matrix (where all entries are independently and identically distributed using the standard real normal distribution
), a complex gaussian matrix (where all entries are independently and identically distributed using the standard complex normal distribution
, thus the real and imaginary parts are independent with law
), or the random sign matrix (in which all entries are independently and identically distributed according to the Bernoulli distribution
(with a
chance of either sign). More generally, one can consider a matrix
in which all the entries
are independently and identically distributed with mean zero and variance
.
We can expand using the Leibniz expansion
ranges over the permutations of
, and
is the product
From the iid nature of the , we easily see that each
has mean zero and variance one, and are pairwise uncorrelated as
varies. We conclude that
has mean zero and variance
(an observation first made by Turán). In particular, from Chebyshev’s inequality we see that
is typically of size
.
It turns out, though, that this is not quite best possible. This is easiest to explain in the real gaussian case, by performing a computation first made by Goodman. In this case, is clearly symmetrical, so we can focus attention on the magnitude
. We can interpret this quantity geometrically as the volume of an
-dimensional parallelopiped whose generating vectors
are independent real gaussian vectors in
(i.e. their coefficients are iid with law
). Using the classical base-times-height formula, we thus have
is the
-dimensional linear subspace of
spanned by
(note that
, having an absolutely continuous joint distribution, are almost surely linearly independent). Taking logarithms, we conclude
Now, we take advantage of a fundamental symmetry property of the Gaussian vector distribution, namely its invariance with respect to the orthogonal group . Because of this, we see that if we fix
(and thus
, the random variable
has the same distribution as
, or equivalently the
distribution
where are iid copies of
. As this distribution does not depend on the
, we conclude that the law of
is given by the sum of
independent
-variables:
A standard computation shows that each has mean
and variance
, and then a Taylor series (or Ito calculus) computation (using concentration of measure tools to control tails) shows that
has mean
and variance
. As such,
has mean
and variance
. Applying a suitable version of the central limit theorem, one obtains the asymptotic law
denotes convergence in distribution. A bit more informally, we have
is a real gaussian matrix; thus, for instance, the median value of
is
. At first glance, this appears to conflict with the second moment bound
of Turán mentioned earlier, but once one recalls that
has a second moment of
, we see that the two facts are in fact perfectly consistent; the upper tail of the normal distribution in the exponent in (4) ends up dominating the second moment.
It turns out that the central limit theorem (3) is valid for any real iid matrix with mean zero, variance one, and an exponential decay condition on the entries; this was first claimed by Girko, though the arguments in that paper appear to be incomplete. Another proof of this result, with more quantitative bounds on the convergence rate has been recently obtained by Hoi Nguyen and Van Vu. The basic idea in these arguments is to express the sum in (2) in terms of a martingale and apply the martingale central limit theorem.
If one works with complex gaussian random matrices instead of real gaussian random matrices, the above computations change slightly (one has to replace the real distribution with the complex
distribution, in which the
are distributed according to the complex gaussian
instead of the real one). At the end of the day, one ends up with the law
We can now turn to the results of our paper. Here, we replace the iid matrices by Wigner matrices
, which are defined similarly but are constrained to be Hermitian (or real symmetric), thus
for all
. Model examples here include the Gaussian Unitary Ensemble (GUE), in which
for
and
for
, the Gaussian Orthogonal Ensemble (GOE), in which
for
and
for
, and the symmetric Bernoulli ensemble, in which
for
(with probability
of either sign). In all cases, the upper triangular entries of the matrix are assumed to be jointly independent. For a more precise definition of the Wigner matrix ensembles we are considering, see the introduction to our paper.
The determinants of these matrices still have a Leibniz expansion. However, in the Wigner case, the mean and variance of the
are slightly different, and what is worse, they are not all pairwise uncorrelated any more. For instance, the mean of
is still usually zero, but equals
in the exceptional case when
is a perfect matching (i.e. the union of exactly
-cycles, a possibility that can of course only happen when
is even). As such, the mean
still vanishes when
is odd, but for even
it is equal to
(the fraction here simply being the number of perfect matchings on vertices). Using Stirling’s formula, one then computes that
is comparable to
when
is large and even. The second moment calculation is more complicated (and uses facts about the distribution of cycles in random permutations, mentioned in this previous post), but one can compute that
is comparable to
for GUE and
for GOE. (The discrepancy here comes from the fact that in the GOE case,
and
can correlate when
contains reversals of
-cycles of
for
, but this does not happen in the GUE case.) For GUE, much more precise asymptotics for the moments of the determinant are known, starting from the work of Brezin and Hikami, though we do not need these more sophisticated computations here.
Our main results are then as follows.
Theorem 1 Let
be a Wigner matrix.
- If
is drawn from GUE, then
- If
is drawn from GOE, then
- The previous two results also hold for more general Wigner matrices, assuming that the real and imaginary parts are independent, a finite moment condition is satisfied, and the entries match moments with those of GOE or GUE to fourth order. (See the paper for a more precise formulation of the result.)
Thus, we informally have
when is drawn from GUE, or from another Wigner ensemble matching GUE to fourth order (and obeying some additional minor technical hypotheses); and
when is drawn from GOE, or from another Wigner ensemble matching GOE to fourth order. Again, these asymptotic limiting distributions are consistent with the asymptotic behaviour for the second moments.
The extension from the GUE or GOE case to more general Wigner ensembles is a fairly routine application of the four moment theorem for Wigner matrices, although for various technical reasons we do not quite use the existing four moment theorems in the literature, but adapt them to the log determinant. The main idea is to express the log-determinant as an integral
of . Strictly speaking, the integral in (7) is divergent at infinity (and also can be ill-behaved near zero), but this can be addressed by standard truncation and renormalisation arguments (combined with known facts about the least singular value of Wigner matrices), which we omit here. We then use a variant of the four moment theorem for the Stieltjes transform, as used by Erdos, Yau, and Yin (based on a previous four moment theorem for individual eigenvalues introduced by Van Vu and myself). The four moment theorem is proven by the now-standard Lindeberg exchange method, combined with the usual resolvent identities to control the behaviour of the resolvent (and hence the Stieltjes transform) with respect to modifying one or two entries, together with the delocalisation of eigenvector property (which in turn arises from local semicircle laws) to control the error terms.
Somewhat surprisingly (to us, at least), it turned out that it was the first part of the theorem (namely, the verification of the limiting law for the invariant ensembles GUE and GOE) that was more difficult than the extension to the Wigner case. Even in an ensemble as highly symmetric as GUE, the rows are no longer independent, and the formula (2) is basically useless for getting any non-trivial control on the log determinant. There is an explicit formula for the joint distribution of the eigenvalues of GUE (or GOE), which does eventually give the distribution of the cumulants of the log determinant, which then gives the required central limit theorem; but this is a lengthy computation, first performed by Delannay and Le Caer.
Following a suggestion of my colleague, Rowan Killip, we give an alternate proof of this central limit theorem in the GUE and GOE cases, by using a beautiful observation of Trotter, namely that the GUE or GOE ensemble can be conjugated into a tractable tridiagonal form. Let me state it just for GUE:
Proposition 2 (Tridiagonal form of GUE) \cite{trotter} Let
be the random tridiagonal real symmetric matrix
where the
are jointly independent real random variables, with
being standard real Gaussians, and each
having a
-distribution:
where
are iid complex gaussians. Let
be drawn from GUE. Then the joint eigenvalue distribution of
is identical to the joint eigenvalue distribution of
.
Proof: Let be drawn from GUE. We can write
where is drawn from the
GUE,
, and
is a random gaussian vector with all entries iid with distribution
. Furthermore,
are jointly independent.
We now apply the tridiagonal matrix algorithm. Let , then
has the
-distribution indicated in the proposition. We then conjugate
by a unitary matrix
that preserves the final basis vector
, and maps
to
. Then we have
where is conjugate to
. Now we make the crucial observation: because
is distributed according to GUE (which is a unitarily invariant ensemble), and
is a unitary matrix independent of
,
is also distributed according to GUE, and remains independent of both
and
.
We continue this process, expanding as
Applying a further unitary conjugation that fixes but maps
to
, we may replace
by
while transforming
to another GUE matrix
independent of
. Iterating this process, we eventually obtain a coupling of
to
by unitary conjugations, and the claim follows.
The determinant of a tridiagonal matrix is not quite as simple as the determinant of a triangular matrix (in which it is simply the product of the diagonal entries), but it is pretty close: the determinant of the above matrix is given by solving the recursion
with and
. Thus, instead of the product of a sequence of independent scalar
distributions as in the gaussian matrix case, the determinant of GUE ends up being controlled by the product of a sequence of independent
matrices whose entries are given by gaussians and
distributions. In this case, one cannot immediately take logarithms and hope to get something for which the martingale central limit theorem can be applied, but some ad hoc manipulation of these
matrix products eventually does make this strategy work. (Roughly speaking, one has to work with the logarithm of the Frobenius norm of the matrix first.)

Recent Comments