You are currently browsing the monthly archive for October 2008.

I was asked recently (in relation to my recent work with Van Vu on the spectral theory of random matrices) to explain some standard heuristics regarding how the eigenvalues $\lambda_1,\ldots,\lambda_n$ of an $n \times n$ matrix A behave under small perturbations.  These heuristics can be summarised as follows:

1. For normal matrices (and in particular, unitary or self-adjoint matrices), eigenvalues are very stable under small perturbations.  For more general matrices, eigenvalues can become unstable if there is pseudospectrum present.
2. For self-adjoint (Hermitian) matrices, eigenvalues that are too close together tend to repel quickly from each other under such perturbations.  For more general matrices, eigenvalues can either be repelled or be attracted to each other, depending on the type of perturbation.

In this post I would like to briefly explain why these heuristics are plausible.
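Both heuristics are easy to observe numerically.  Here is a small sketch (Python with NumPy; the matrix size $n = 10$ and perturbation size $\epsilon = 10^{-8}$ are of course arbitrary choices) contrasting a self-adjoint matrix, whose eigenvalues move by at most the operator norm of the perturbation (by the Weyl inequalities), with a single Jordan block, whose large pseudospectrum magnifies a perturbation of size $\epsilon$ into eigenvalue movement of size $\epsilon^{1/n}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 10, 1e-8

# Self-adjoint case: by Weyl's inequality, each eigenvalue moves by at
# most the operator norm of the (self-adjoint) perturbation E.
H = rng.standard_normal((n, n)); H = (H + H.T) / 2
E = rng.standard_normal((n, n)); E = (E + E.T) / 2
E *= eps / np.linalg.norm(E, 2)        # normalise ||E||_op = eps
shift = np.max(np.abs(np.linalg.eigvalsh(H + E) - np.linalg.eigvalsh(H)))
print(shift)  # at most eps = 1e-8

# Non-normal case: a nilpotent Jordan block (all eigenvalues zero) has a
# large pseudospectrum; perturbing a single corner entry by eps moves the
# eigenvalues out to the circle of radius eps^(1/n).
J = np.diag(np.ones(n - 1), 1)
Jp = J.copy()
Jp[n - 1, 0] = eps
print(np.max(np.abs(np.linalg.eigvals(Jp))))  # about eps**(1/n) ~ 0.16
```

The perturbed Jordan block has characteristic polynomial $\lambda^n - \epsilon$, so its eigenvalues are exactly the $n$-th roots of $\epsilon$; this is the standard example of pseudospectral instability.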

In the third of the Distinguished Lecture Series given by Eli Stein here at UCLA, Eli presented a slightly different topic, which is work in preparation with Alex Nagel, Fulvio Ricci, and Steve Wainger, on algebras of singular integral operators which are sensitive to multiple different geometries in a nilpotent Lie group.

In the second of the Distinguished Lecture Series given by Eli Stein here at UCLA, Eli expanded on the themes in the first lecture, in particular providing more details as to the recent (not yet published) results of Lanzani and Stein on the boundedness of the Cauchy integral on domains in several complex variables.

The first Distinguished Lecture Series at UCLA for this academic year is given by Elias Stein (who, incidentally, was my graduate advisor), who is lecturing on “Singular Integrals and Several Complex Variables: Some New Perspectives”.  The first lecture was a historical (and non-technical) survey of modern harmonic analysis (which, amazingly, was compressed into half an hour), followed by an introduction as to how this theory is currently being adapted to handle the basic analytical issues in several complex variables, a topic which in many ways is still only now being developed.  The second and third lectures will focus on these issues in greater depth.

As usual, any errors here are due to my transcription and interpretation of the lecture.

[Update, Oct 27: The slides from the talk are now available here.]

The level and quality of discourse in this U.S. presidential campaign has not been particularly high, especially in recent weeks.  So I found former Gen. Powell’s recent analysis of the current state of affairs, as part of his widely publicised endorsement of Sen. Obama, to be a welcome and refreshing improvement in this regard:

It’s a shame that much of the rhetoric and commentary surrounding this campaign – from all sides – was not more like this.  [In keeping with this, I would like to remind commenters to keep the discussion constructive, polite, and on-topic.]

[Update, Oct 22: Unfortunately, some of the more recent comments have not been as constructive, polite, and on-topic as I would have hoped.  I am therefore closing this post to further comments, though anyone who wishes to discuss these issues on their own blog is welcome to leave a pingback to this post here.]

Van Vu and I have just uploaded to the arXiv our survey paper “From the Littlewood-Offord problem to the Circular Law: universality of the spectral distribution of random matrices“, submitted to Bull. Amer. Math. Soc.  This survey recaps (avoiding most of the technical details) the recent work of ourselves and others that exploits the inverse theory for the Littlewood-Offord problem (which, roughly speaking, amounts to figuring out what types of random walks exhibit concentration at any given point), and how this leads to bounds on condition numbers, least singular values, and resolvents of random matrices; and then how the latter leads to universality of the empirical spectral distributions (ESDs) of random matrices, and in particular to the circular law for the ESDs for iid random matrices with zero mean and unit variance (see my previous blog post on this topic, or my Lewis lectures).  We conclude by mentioning a few open problems in the subject.

While this subject does unfortunately contain a large amount of technical theory and detail, every so often we find a very elementary observation that simplifies the work required significantly.  One such observation is an identity which we call the negative second moment identity, which I would like to discuss here.  Let A be an $n \times n$ matrix; for simplicity we assume that the entries are real-valued.  Denote the n rows of A by $X_1,\ldots,X_n$, which we view as vectors in ${\Bbb R}^n$.  Let $\sigma_1(A) \geq \ldots \geq \sigma_n(A) \geq 0$ be the singular values of A. In our applications, the vectors $X_j$ are easily described (e.g. they might be randomly distributed on the discrete cube $\{-1,1\}^n$), but the distribution of the singular values $\sigma_j(A)$ is much more mysterious, and understanding this distribution is a key objective in this entire theory.
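The identity in question asserts that $\sum_{j=1}^n \sigma_j(A)^{-2} = \sum_{j=1}^n \hbox{dist}(X_j, V_j)^{-2}$, where $V_j$ is the span of the other $n-1$ rows; it follows from computing the diagonal entries of $(AA^T)^{-1}$ in two different ways.  Here is a quick numerical sanity check (Python with NumPy; the Gaussian test matrix is just an arbitrary almost-surely-invertible example):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = rng.standard_normal((n, n))  # almost surely invertible

# Left-hand side: the negative second moment of the singular values.
lhs = np.sum(np.linalg.svd(A, compute_uv=False) ** -2.0)

# Right-hand side: for each row X_j, dist(X_j, V_j) is the least-squares
# residual of X_j against the span V_j of the remaining n-1 rows.
rhs = 0.0
for j in range(n):
    others = np.delete(A, j, axis=0).T          # columns span V_j
    coeffs, *_ = np.linalg.lstsq(others, A[j], rcond=None)
    rhs += np.linalg.norm(A[j] - others @ coeffs) ** -2.0

print(lhs, rhs)  # the two sides agree up to rounding error
```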

Last year on this blog, I sketched out a non-rigorous probabilistic argument justifying the following well-known theorem:

Theorem 1. (Non-measurable sets exist) There exists a subset $E$ of the unit interval ${}[0,1]$ which is not Lebesgue measurable.

The idea was to let E be a “random” subset of ${}[0,1]$.  If one (non-rigorously) applies the law of large numbers, one expects E to have “density” 1/2 with respect to every subinterval of ${}[0,1]$, which would contradict the Lebesgue differentiation theorem.

I was recently asked whether I could in fact make the above argument rigorous. This turned out to be more difficult than I had anticipated, due to some technicalities in trying to make the concept of a random subset of ${}[0,1]$ (which requires an uncountable number of “coin flips” to generate) both rigorous and useful.  However, there is a simpler variant of the above argument which can be made rigorous.  Instead of letting E be a “random” subset of ${}[0,1]$, one takes E to be an “alternating” set that contains “every other” real number in ${}[0,1]$; this again should have density 1/2 in every subinterval and thus again contradict the Lebesgue differentiation theorem.

Of course, in the standard model of the real numbers, it makes no sense to talk about “every other” or “every second” real number, as the real numbers are not discrete.  If however one employs the language of non-standard analysis, then it is possible to make the above argument rigorous, and this is the purpose of my post today. I will assume some basic familiarity with non-standard analysis, for instance as discussed in this earlier post of mine.

The U.S. presidential election is now only a few weeks away.  The politics of this election are of course interesting and important, but I do not want to discuss these topics here (there is not exactly a shortage of other venues for such a discussion), and would request that readers refrain from doing so in the comments to this post.  However, I thought it would be apropos to talk about some of the basic mathematics underlying electoral polling, and specifically to explain the fact, which can be highly unintuitive to those not well versed in statistics, that polls can be accurate even when sampling only a tiny fraction of the entire population.

Take for instance a nationwide poll of U.S. voters on which presidential candidate they intend to vote for.  A typical poll will ask a number $n$ of randomly selected voters for their opinion; a typical value here is $n = 1000$.  In contrast, the total voting-eligible population of the U.S. – let’s call this set $X$ – is about 200 million.  (The actual turnout in the election is likely to be closer to 100 million, but let’s ignore this fact for the sake of discussion.)  Thus, such a poll would sample about 0.0005% of the total population $X$ – an incredibly tiny fraction.  Nevertheless, the margin of error (at the 95% confidence level) for such a poll, if conducted under idealised conditions (see below), is about 3%.  In other words, if we let $p$ denote the proportion of the entire population $X$ that will vote for a given candidate $A$, and let $\overline{p}$ denote the proportion of the polled voters that will vote for $A$, then the event $\overline{p}-0.03 \leq p \leq \overline{p}+0.03$ will occur with probability at least 0.95.  Thus, for instance (and oversimplifying a little – see below), if the poll reports that 55% of respondents would vote for A, then the true percentage of the electorate that would vote for A has at least a 95% chance of lying between 52% and 58%.  Larger polls will of course give a smaller margin of error; for instance the margin of error for an (idealised) poll of 2,000 voters is about 2%.

I’ll give a rigorous proof of a weaker version of the above statement (giving a margin of error of about 7%, rather than 3%) in an appendix at the end of this post.  But the main point of my post here is a little different, namely to address the common misconception that the accuracy of a poll is a function of the relative sample size rather than the absolute sample size, which would suggest that a poll involving only 0.0005% of the population could not possibly have a margin of error as low as 3%.  I also want to point out some limitations of the mathematical analysis; depending on the methodology and the context, some polls involving 1000 respondents may have a much higher margin of error than the idealised rate of 3%.
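The dependence on absolute rather than relative sample size is easy to see in simulation.  The following sketch (Python with NumPy; the true proportion $p = 0.55$, the random seed, and the number of trials are of course hypothetical choices) polls $n = 1000$ voters many times over and measures how often the sample proportion lands within 3% of the truth:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, trials = 0.55, 1000, 20000

# Sampling 1000 voters from a population of ~200 million is essentially
# sampling with replacement, so each poll count is Binomial(n, p); note
# that the population size never enters the computation at all.
p_bar = rng.binomial(n, p, size=trials) / n

coverage = np.mean(np.abs(p_bar - p) <= 0.03)
print(coverage)  # close to 0.95

# Theoretical 95% margins of error, 1.96 * sqrt(p(1-p)/n) at worst case
# p = 1/2: about 3% for n = 1000 and about 2% for n = 2000.
print(1.96 * np.sqrt(0.25 / 1000), 1.96 * np.sqrt(0.25 / 2000))
```

The only quantity that matters in the simulation is $n$; replacing the implicit 200-million-strong population with one of 2 billion would change nothing, which is precisely the unintuitive point.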

Additive combinatorics is largely focused on the additive properties of finite subsets A of an additive group $G = (G,+)$.  This group can be finite or infinite, but there is a very convenient trick, the Ruzsa projection trick, which allows one to reduce the latter case to the former.  For instance, consider the set $A = \{1,\ldots,N\}$ inside the integers ${\Bbb Z}$.  The integers of course form an infinite group, but if we are only interested in sums of at most two elements of A at a time, we can embed A inside the finite cyclic group ${\Bbb Z}/2N{\Bbb Z}$ without losing any combinatorial information.  More precisely, there is a Freiman isomorphism of order 2 between the set $\{1,\ldots,N\}$ in ${\Bbb Z}$ and the set $\{1,\ldots,N\}$ in ${\Bbb Z}/2N{\Bbb Z}$.  One can view the latter version of $\{1,\ldots,N\}$ as a model for the former version of $\{1,\ldots,N\}$. More generally, it turns out that any finite set A in an additive group can be modeled in the above sense by an equivalent set in a finite group, and in fact one can ensure that this ambient modeling group is not much larger than A itself if A has some additive structure; see this paper of Ruzsa (or Lemma 5.26 of my book with Van Vu) for a precise statement.  This projection trick has a number of important uses in additive combinatorics, most notably in Ruzsa’s simplified proof of Freiman’s theorem.
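The order-2 Freiman isomorphism here can be checked directly: since all pairwise sums from $\{1,\ldots,N\}$ lie in the interval ${}[2,2N]$, two such sums agree modulo $2N$ if and only if they agree in ${\Bbb Z}$.  A brute-force verification of this (Python; the choice $N = 8$ is arbitrary):

```python
from itertools import product

N = 8
A = range(1, N + 1)

# x -> x mod 2N is a Freiman isomorphism of order 2 on {1, ..., N}: the
# additive relations a + b = c + d are exactly preserved, since all
# pairwise sums lie in [2, 2N] and so cannot collide modulo 2N.
ok = all((a + b == c + d) == ((a + b) % (2 * N) == (c + d) % (2 * N))
         for a, b, c, d in product(A, repeat=4))
print(ok)  # True
```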

Given the interest in non-commutative analogues of Freiman’s theorem, it is natural to ask whether one can similarly model finite sets A in multiplicative (and non-commutative) groups $G = (G,\times)$ using finite models.  Unfortunately (as I learned recently from Akshay Venkatesh, via Ben Green), this turns out to be impossible in general, due to an old example of Higman.  More precisely, Higman shows:

Theorem 1. There exists an infinite group G generated by four distinct elements a,b,c,d that obey the relations

$ab = ba^2; bc = cb^2; cd = dc^2; da = ad^2$; (1)

in fact, a and c generate a free group in G.  On the other hand, if G’ is a finite group containing four elements a,b,c,d obeying (1), then a,b,c,d are all trivial.

As a consequence, the finite set $A := \{ 1, a, b, c, d, ab, bc, cd, da \}$ in G has no model (in the sense of Freiman isomorphisms) in a finite group.

Theorem 1 is proven by a small amount of elementary group theory and number theory, and it was neat enough that I thought I would reproduce it here.
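One can at least witness the finite half of Theorem 1 by brute force in a small group.  The following sketch (Python; the choice of the symmetric group $S_4$ as a test case is mine, not Higman's) confirms that the only quadruple in $S_4$ obeying the relations (1) is the trivial one:

```python
from itertools import permutations

# Elements of S_4 as tuples; mul(p, q) is the composition p(q(i)).
G = list(permutations(range(4)))
identity = tuple(range(4))

def mul(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

# R = all pairs (x, y) in S_4 x S_4 obeying the relation xy = yx^2.
R = {(x, y) for x in G for y in G if mul(x, y) == mul(y, mul(x, x))}

# Quadruples (a, b, c, d) obeying all four relations in (1):
# ab = ba^2, bc = cb^2, cd = dc^2, da = ad^2.
solutions = [(a, b, c, d)
             for (a, b) in R for (c, d) in R
             if (b, c) in R and (d, a) in R]

print(solutions == [(identity,) * 4])  # only the trivial solution survives
```

Of course this checks only one small group; the content of the theorem is that the same collapse occurs in every finite group.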

In the last few weeks, the Great Internet Mersenne Prime Search (GIMPS) announced the discovery of two new Mersenne primes, both over ten million digits in length, including one discovered by the computing team right here at UCLA (see this page for more information).  [I was not involved in this computing effort.]  As for the question “Why do we want to find such big primes anyway?”, see this page, though this is not the focus of my post today.

The GIMPS approach to finding Mersenne primes relies of course on modern computing power, parallelisation, and efficient programming, but the number-theoretic heart of it – aside from some basic optimisation tricks such as fast multiplication and preliminary sieving to eliminate some obviously non-prime Mersenne number candidates – is the Lucas-Lehmer primality test for Mersenne numbers, which is much faster for this special type of number than any known general-purpose (deterministic) primality test (such as, say, the AKS test).  This test is easy enough to describe, and I will do so later in this post, and also has some short elementary proofs of correctness; but the proofs are sometimes presented in a way that involves pulling a lot of rabbits out of hats, giving the argument a magical feel rather than a natural one.  In this post, I will try to explain the basic ideas that make the primality test work, seeking a proof which is perhaps less elementary and a little longer than some of the proofs in the literature, but is perhaps a bit better motivated.
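For reference, the test itself takes only a few lines: to decide whether $M_p = 2^p - 1$ is prime for an odd prime $p$, one iterates the map $s \mapsto s^2 - 2$ starting from $s = 4$, working modulo $M_p$, and checks whether one lands on zero after $p-2$ steps.  A minimal sketch (Python; without the FFT-based multiplication and other optimisations that make GIMPS-scale computations feasible):

```python
def lucas_lehmer(p):
    """Return True iff the Mersenne number M_p = 2^p - 1 is prime,
    for an odd prime exponent p (the case p = 2 is handled separately)."""
    M = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % M
    return s == 0

# The exponents p <= 23 with M_p prime (note M_11 = 23 * 89 is composite):
mersenne = [p for p in [3, 5, 7, 11, 13, 17, 19, 23] if lucas_lehmer(p)]
print(mersenne)  # → [3, 5, 7, 13, 17, 19]
```

Each of the $p-2$ steps is a single squaring modulo $M_p$, which is what makes the test so much faster for Mersenne numbers than general-purpose primality tests on numbers of comparable size.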