You are currently browsing the category archive for the ‘math.OA’ category.

One of the basic tools in modern combinatorics is the probabilistic method, introduced by Erdos, in which a deterministic solution to a given problem is shown to exist by constructing a *random* candidate for a solution, and showing that this candidate solves all the requirements of the problem with positive probability. When the problem requires a real-valued statistic to be suitably large or suitably small, the following trivial observation is often employed:

Proposition 1 (Comparison with mean)Let be a random real-valued variable, whose mean (orfirst moment) is finite. Thenwith positive probability, and

with positive probability.

This proposition is usually applied in conjunction with a computation of the first moment , in which case this version of the probabilistic method becomes an instance of the *first moment method*. (For comparison with other moment methods, such as the second moment method, exponential moment method, and zeroth moment method, see Chapter 1 of my book with Van Vu. For a general discussion of the probabilistic method, see the book by Alon and Spencer of the same name.)

As a typical example in random matrix theory, if one wanted to understand how small or how large the operator norm of a random matrix could be, one might first try to compute the expected operator norm and then apply Proposition 1; see this previous blog post for examples of this strategy (and related strategies, based on comparing with more tractable expressions such as the moments ). (In this blog post, all matrices are complex-valued.)

Recently, in their proof of the Kadison-Singer conjecture (and also in their earlier paper on Ramanujan graphs), Marcus, Spielman, and Srivastava introduced an striking new variant of the first moment method, suited in particular for controlling the operator norm of a Hermitian positive semi-definite matrix . Such matrices have non-negative real eigenvalues, and so in this case is just the largest eigenvalue of . Traditionally, one tries to control the eigenvalues through averaged statistics such as moments or Stieltjes transforms ; again, see this previous blog post. Here we use as short-hand for , where is the identity matrix. Marcus, Spielman, and Srivastava instead rely on the interpretation of the eigenvalues of as the roots of the characteristic polynomial of , thus

where is the largest real root of a non-zero polynomial . (In our applications, we will only ever apply to polynomials that have at least one real root, but for sake of completeness let us set if has no real roots.)

Prior to the work of Marcus, Spielman, and Srivastava, I think it is safe to say that the conventional wisdom in random matrix theory was that the representation (1) of the operator norm was not particularly useful, due to the highly non-linear nature of both the characteristic polynomial map and the maximum root map . (Although, as pointed out to me by Adam Marcus, some related ideas have occurred in graph theory rather than random matrix theory, for instance in the theory of the matching polynomial of a graph.) For instance, a fact as basic as the triangle inequality is extremely difficult to establish through (1). Nevertheless, it turns out that for certain special types of random matrices (particularly those in which a typical instance of this ensemble has a simple relationship to “adjacent” matrices in this ensemble), the polynomials enjoy an extremely rich structure (in particular, they lie in families of real stable polynomials, and hence enjoy good combinatorial interlacing properties) that can be surprisingly useful. In particular, Marcus, Spielman, and Srivastava established the following nonlinear variant of Proposition 1:

Proposition 2 (Comparison with mean)Let . Let be a random matrix, which is the sum of independent Hermitian rank one matrices , each taking a finite number of values. Thenwith positive probability, and

with positive probability.

We prove this proposition below the fold. The hypothesis that each only takes finitely many values is technical and can likely be relaxed substantially, but we will not need to do so here. Despite the superficial similarity with Proposition 1, the proof of Proposition 2 is quite nonlinear; in particular, one needs the interlacing properties of real stable polynomials to proceed. Another key ingredient in the proof is the observation that while the determinant of a matrix generally behaves in a nonlinar fashion on the underlying matrix , it becomes (affine-)linear when one considers rank one perturbations, and so depends in an affine-multilinear fashion on the . More precisely, we have the following deterministic formula, also proven below the fold:

Proposition 3 (Deterministic multilinearisation formula)Let be the sum of deterministic rank one matrices . Then we havefor all , where the

mixed characteristic polynomialof any matrices (not necessarily rank one) is given by the formula

Among other things, this formula gives a useful representation of the mean characteristic polynomial :

Corollary 4 (Random multilinearisation formula)Let be the sum of jointly independent rank one matrices . Then we have

*Proof:* For fixed , the expression is a polynomial combination of the , while the differential operator is a linear combination of differential operators for . As a consequence, we may expand (3) as a linear combination of terms, each of which is a multilinear combination of for some . Taking expectations of both sides of (2) and using the joint independence of the , we obtain the claim.

In view of Proposition 2, we can now hope to control the operator norm of certain special types of random matrices (and specifically, the sum of independent Hermitian positive semi-definite rank one matrices) by first controlling the mean of the random characteristic polynomial . Pursuing this philosophy, Marcus, Spielman, and Srivastava establish the following result, which they then use to prove the Kadison-Singer conjecture:

Theorem 5 (Marcus-Spielman-Srivastava theorem)Let . Let be jointly independent random vectors in , with each taking a finite number of values. Suppose that we have the normalisationwhere we are using the convention that is the identity matrix whenever necessary. Suppose also that we have the smallness condition

for some and all . Then one has

Note that the upper bound in (5) must be at least (by taking to be deterministic) and also must be at least (by taking the to always have magnitude at least ). Thus the bound in (5) is asymptotically tight both in the regime and in the regime ; the latter regime will be particularly useful for applications to Kadison-Singer. It should also be noted that if one uses more traditional random matrix theory methods (based on tools such as Proposition 1, as well as more sophisticated variants of these tools, such as the concentration of measure results of Rudelson and Ahlswede-Winter), one obtains a bound of with high probability, which is insufficient for the application to the Kadison-Singer problem; see this article of Tropp. Thus, Theorem 5 obtains a sharper bound, at the cost of trading in “high probability” for “positive probability”.

In the paper of Marcus, Spielman and Srivastava, Theorem 5 is used to deduce a conjecture of Weaver, which was already known to imply the Kadison-Singer conjecture; actually, a slight modification of their argument gives the paving conjecture of Kadison and Singer, from which the original Kadison-Singer conjecture may be readily deduced. We give these implications below the fold. (See also this survey article for some background on the Kadison-Singer problem.)

Let us now summarise how Theorem 5 is proven. In the spirit of semi-definite programming, we rephrase the above theorem in terms of the rank one Hermitian positive semi-definite matrices :

Theorem 6 (Marcus-Spielman-Srivastava theorem again)Let be jointly independent random rank one Hermitian positive semi-definite matrices such that the sum has meanand such that

for some and all . Then one has

with positive probability.

In view of (1) and Proposition 2, this theorem follows from the following control on the mean characteristic polynomial:

Theorem 7 (Control of mean characteristic polynomial)Let be jointly independent random rank one Hermitian positive semi-definite matrices such that the sum has meanand such that

for some and all . Then one has

This result is proven using the multilinearisation formula (Corollary 4) and some convexity properties of real stable polynomials; we give the proof below the fold.

Thanks to Adam Marcus, Assaf Naor and Sorin Popa for many useful explanations on various aspects of the Kadison-Singer problem.

A finite group is said to be a Frobenius group if there is a non-trivial subgroup of (known as the *Frobenius complement* of ) such that the conjugates of are “disjoint as possible” in the sense that whenever . This gives a decomposition

where the *Frobenius kernel* of is defined as the identity element together with all the non-identity elements that are not conjugate to any element of . Taking cardinalities, we conclude that

A remarkable theorem of Frobenius gives an unexpected amount of structure on and hence on :

Theorem 1 (Frobenius’ theorem)Let be a Frobenius group with Frobenius complement and Frobenius kernel . Then is a normal subgroup of , and hence (by (2) and the disjointness of and outside the identity) is the semidirect product of and .

I discussed Frobenius’ theorem and its proof in this recent blog post. This proof uses the theory of characters on a finite group , in particular relying on the fact that a character on a subgroup can induce a character on , which can then be decomposed into irreducible characters with *natural number* coefficients. Remarkably, even though a century has passed since Frobenius’ original argument, there is no proof known of this theorem which avoids character theory entirely; there are elementary proofs known when the complement has even order or when is solvable (we review both of these cases below the fold), which by the Feit-Thompson theorem does cover all the cases, but the proof of the Feit-Thompson theorem involves plenty of character theory (and also relies on Theorem 1). (The answers to this MathOverflow question give a good overview of the current state of affairs.)

I have been playing around recently with the problem of finding a character-free proof of Frobenius’ theorem. I didn’t succeed in obtaining a completely elementary proof, but I did find an argument which replaces character theory (which can be viewed as coming from the representation theory of the non-commutative group algebra ) with the Fourier analysis of class functions (i.e. the representation theory of the centre of the group algebra), thus replacing non-commutative representation theory by commutative representation theory. This is not a particularly radical depature from the existing proofs of Frobenius’ theorem, but it did seem to be a new proof which was technically “character-free” (even if it was not all that far from character-based in spirit), so I thought I would record it here.

The main ideas are as follows. The space of class functions can be viewed as a commutative algebra with respect to the convolution operation ; as the regular representation is unitary and faithful, this algebra contains no nilpotent elements. As such, (Gelfand-style) Fourier analysis suggests that one can analyse this algebra through the idempotents: class functions such that . In terms of characters, idempotents are nothing more than sums of the form for various collections of characters, but we can perform a fair amount of analysis on idempotents directly without recourse to characters. In particular, it turns out that idempotents enjoy some important integrality properties that can be established without invoking characters: for instance, by taking traces one can check that is a natural number, and more generally we will show that is a natural number whenever is a subgroup of (see Corollary 4 below). For instance, the quantity

is a natural number which we will call the *rank* of (as it is also the linear rank of the transformation on ).

In the case that is a Frobenius group with kernel , the above integrality properties can be used after some elementary manipulations to establish that for any idempotent , the quantity

is an integer. On the other hand, one can also show by elementary means that this quantity lies between and . These two facts are not strong enough on their own to impose much further structure on , unless one restricts attention to *minimal* idempotents . In this case spectral theory (or Gelfand theory, or the fundamental theorem of algebra) tells us that has rank one, and then the *integrality gap* comes into play and forces the quantity (3) to always be either zero or one. This can be used to imply that the convolution action of every minimal idempotent either preserves or annihilates it, which makes itself an idempotent, which makes normal.

One of the basic problems in the field of operator algebras is to develop a functional calculus for either a single operator , or a collection of operators. These operators could in principle act on any function space, but typically one either considers complex matrices (which act on a complex finite dimensional space), or operators (either bounded or unbounded) on a complex Hilbert space. (One can of course also obtain analogous results for real operators, but we will work throughout with complex operators in this post.)

Roughly speaking, a functional calculus is a way to assign an operator or to any function in a suitable function space, which is linear over the complex numbers, preserve the scalars (i.e. when ), and should be either an exact or approximate homomorphism in the sense that

should hold either exactly or approximately. In the case when the are self-adjoint operators acting on a Hilbert space (or Hermitian matrices), one often also desires the identity

to also hold either exactly or approximately. (Note that one cannot reasonably expect (1) and (2) to hold exactly for all if the and their adjoints do not commute with each other, so in those cases one has to be willing to allow some error terms in the above wish list of properties of the calculus.) Ideally, one should also be able to relate the operator norm of or with something like the uniform norm on . In principle, the existence of a good functional calculus allows one to manipulate operators as if they were scalars (or at least approximately as if they were scalars), which is very helpful for a number of applications, such as partial differential equations, spectral theory, noncommutative probability, and semiclassical mechanics. A functional calculus for multiple operators can be particularly valuable as it allows one to treat as being exact or approximate scalars *simultaneously*. For instance, if one is trying to solve a linear differential equation that can (formally at least) be expressed in the form

for some data , unknown function , some differential operators , and some nice function , then if one’s functional calculus is good enough (and is suitably “elliptic” in the sense that it does not vanish or otherwise degenerate too often), one should be able to solve this equation either exactly or approximately by the formula

which is of course how one would solve this equation if one pretended that the operators were in fact scalars. Formalising this calculus rigorously leads to the theory of pseudodifferential operators, which allows one to (approximately) solve or at least simplify a much wider range of differential equations than one what can achieve with more elementary algebraic transformations (e.g. integrating factors, change of variables, variation of parameters, etc.). In quantum mechanics, a functional calculus that allows one to treat operators as if they were approximately scalar can be used to rigorously justify the correspondence principle in physics, namely that the predictions of quantum mechanics approximate that of classical mechanics in the *semiclassical limit* .

There is no universal functional calculus that works in all situations; the strongest functional calculi, which are close to being an exact *-homomorphisms on very large class of functions, tend to only work for under very restrictive hypotheses on or (in particular, when , one needs the to commute either exactly, or very close to exactly), while there are weaker functional calculi which have fewer nice properties and only work for a very small class of functions, but can be applied to quite general operators or . In some cases the functional calculus is only formal, in the sense that or has to be interpreted as an infinite formal series that does not converge in a traditional sense. Also, when one wishes to select a functional calculus on non-commuting operators , there is a certain amount of non-uniqueness: one generally has a number of slightly different functional calculi to choose from, which generally have the same properties but differ in some minor technical details (particularly with regards to the behaviour of “lower order” components of the calculus). This is a similar to how one has a variety of slightly different coordinate systems available to parameterise a Riemannian manifold or Lie group. This is on contrast to the case when the underlying operator is (essentially) normal (so that commutes with ); in this special case (which includes the important subcases when is unitary or (essentially) self-adjoint), spectral theory gives us a canonical and very powerful functional calculus which can be used without further modification in applications.

Despite this lack of uniqueness, there is one standard choice for a functional calculus available for general operators , namely the Weyl functional calculus; it is analogous in some ways to normal coordinates for Riemannian manifolds, or *exponential coordinates of the first kind* for Lie groups, in that it treats lower order terms in a reasonably nice fashion. (But it is important to keep in mind that, like its analogues in Riemannian geometry or Lie theory, there will be some instances in which the Weyl calculus is not the optimal calculus to use for the application at hand.)

I decided to write some notes on the Weyl functional calculus (also known as Weyl quantisation), and to sketch the applications of this calculus both to the theory of pseudodifferential operators. They are mostly for my own benefit (so that I won’t have to redo these particular calculations again), but perhaps they will also be of interest to some readers here. (Of course, this material is also covered in many other places. e.g. Folland’s “harmonic analysis in phase space“.)

Let be two Hermitian matrices. When and commute, we have the identity

When and do not commute, the situation is more complicated; we have the *Baker-Campbell-Hausdorff formula*

where the infinite product here is explicit but very messy. On the other hand, taking determinants we still have the identity

Recently I learned (from Emmanuel Candes, who in turn learned it from David Gross) that there is another very nice relationship between and , namely the Golden-Thompson inequality

The remarkable thing about this inequality is that no commutativity hypotheses whatsoever on the matrices are required. Note that the right-hand side can be rearranged using the cyclic property of trace as ; the expression inside the trace is positive definite so the right-hand side is positive. (On the other hand, there is no reason why expressions such as need to be positive or even real, so the obvious extension of the Golden-Thompson inequality to three or more Hermitian matrices fails.) I am told that this inequality is quite useful in statistical mechanics, although I do not know the details of this.

To get a sense of how delicate the Golden-Thompson inequality is, let us expand both sides to fourth order in . The left-hand side expands as

while the right-hand side expands as

Using the cyclic property of trace , one can verify that all terms up to third order agree. Turning to the fourth order terms, one sees after expanding out and using the cyclic property of trace as much as possible, we see that the fourth order terms *almost* agree, but the left-hand side contains a term whose counterpart on the right-hand side is . The difference between the two can be factorised (again using the cyclic property of trace) as . Since is skew-Hermitian, is positive definite, and so we have proven the Golden-Thompson inequality to fourth order. (One could also have used the Cauchy-Schwarz inequality for the Frobenius norm to establish this; see below.)

Intuitively, the Golden-Thompson inequality is asserting that interactions between a pair of non-commuting Hermitian matrices are strongest when cross-interactions are kept to a minimum, so that all the factors lie on one side of a product and all the factors lie on the other. Indeed, this theme will be running through the proof of this inequality, to which we now turn.

In this final set of lecture notes for this course, we leave the realm of self-adjoint matrix ensembles, such as Wigner random matrices, and consider instead the simplest examples of non-self-adjoint ensembles, namely the iid matrix ensembles. (I had also hoped to discuss recent progress in eigenvalue spacing distributions of Wigner matrices, but have run out of time. For readers interested in this topic, I can recommend the recent Bourbaki exposé of Alice Guionnet.)

The basic result in this area is

Theorem 1 (Circular law)Let be an iid matrix, whose entries , are iid with a fixed (complex) distribution of mean zero and variance one. Then the spectral measure converges both in probability and almost surely to the circular law , where are the real and imaginary coordinates of the complex plane.

This theorem has a long history; it is analogous to the semi-circular law, but the non-Hermitian nature of the matrices makes the spectrum so unstable that key techniques that are used in the semi-circular case, such as truncation and the moment method, no longer work; significant new ideas are required. In the case of random gaussian matrices, this result was established by Mehta (in the complex case) and by Edelman (in the real case), as was sketched out in Notes. In 1984, Girko laid out a general strategy for establishing the result for non-gaussian matrices, which formed the base of all future work on the subject; however, a key ingredient in the argument, namely a bound on the least singular value of shifts , was not fully justified at the time. A rigorous proof of the circular law was then established by Bai, assuming additional moment and boundedness conditions on the individual entries. These additional conditions were then slowly removed in a sequence of papers by Gotze-Tikhimirov, Girko, Pan-Zhou, and Tao-Vu, with the last moment condition being removed in a paper of myself, Van Vu, and Manjunath Krishnapur.

At present, the known methods used to establish the circular law for general ensembles rely very heavily on the joint independence of all the entries. It is a key challenge to see how to weaken this joint independence assumption.

In the foundations of modern probability, as laid out by Kolmogorov, the basic objects of study are constructed in the following order:

- Firstly, one selects a sample space , whose elements represent all the possible states that one’s stochastic system could be in.
- Then, one selects a -algebra of events (modeled by subsets of ), and assigns each of these events a probability in a countably additive manner, so that the entire sample space has probability .
- Finally, one builds (commutative) algebras of random variables (such as complex-valued random variables, modeled by measurable functions from to ), and (assuming suitable integrability or moment conditions) one can assign expectations to each such random variable.

In measure theory, the underlying measure space plays a prominent foundational role, with the measurable sets and measurable functions (the analogues of the events and the random variables) always being viewed as somehow being attached to that space. In probability theory, in contrast, it is the events and their probabilities that are viewed as being fundamental, with the sample space being abstracted away as much as possible, and with the random variables and expectations being viewed as derived concepts. See Notes 0 for further discussion of this philosophy.

However, it is possible to take the abstraction process one step further, and view the *algebra of random variables and their expectations* as being the foundational concept, and ignoring both the presence of the original sample space, the algebra of events, or the probability measure.

There are two reasons for wanting to shed (or abstract away) these previously foundational structures. Firstly, it allows one to more easily take certain types of limits, such as the large limit when considering random matrices, because quantities built from the algebra of random variables and their expectations, such as the normalised moments of random matrices tend to be quite stable in the large limit (as we have seen in previous notes), even as the sample space and event space varies with . (This theme of using abstraction to facilitate the taking of the large limit also shows up in the application of ergodic theory to combinatorics via the correspondence principle; see this previous blog post for further discussion.)

Secondly, this abstract formalism allows one to generalise the classical, commutative theory of probability to the more general theory of *non-commutative probability theory*, which does not have a classical underlying sample space or event space, but is instead built upon a (possibly) *non-commutative* algebra of random variables (or “observables”) and their expectations (or “traces”). This more general formalism not only encompasses classical probability, but also spectral theory (with matrices or operators taking the role of random variables, and the trace taking the role of expectation), random matrix theory (which can be viewed as a natural blend of classical probability and spectral theory), and quantum mechanics (with physical observables taking the role of random variables, and their expected value on a given quantum state being the expectation). It is also part of a more general “non-commutative way of thinking” (of which non-commutative geometry is the most prominent example), in which a space is understood primarily in terms of the ring or algebra of functions (or function-like objects, such as sections of bundles) placed on top of that space, and then the space itself is largely abstracted away in order to allow the algebraic structures to become less commutative. In short, the idea is to make *algebra* the foundation of the theory, as opposed to other possible choices of foundations such as sets, measures, categories, etc..

[Note that this foundational preference is to some extent a metamathematical one rather than a mathematical one; in many cases it is possible to rewrite the theory in a mathematically equivalent form so that some other mathematical structure becomes designated as the foundational one, much as probability theory can be equivalently formulated as the measure theory of probability measures. However, this does not negate the fact that a different choice of foundations can lead to a different way of thinking about the subject, and thus to ask a different set of questions and to discover a different set of proofs and solutions. Thus it is often of value to understand multiple foundational perspectives at once, to get a truly stereoscopic view of the subject.]

It turns out that non-commutative probability can be modeled using operator algebras such as -algebras, von Neumann algebras, or algebras of bounded operators on a Hilbert space, with the latter being accomplished via the Gelfand-Naimark-Segal construction. We will discuss some of these models here, but just as probability theory seeks to abstract away its measure-theoretic models, the philosophy of non-commutative probability is also to downplay these operator algebraic models once some foundational issues are settled.

When one generalises the set of structures in one’s theory, for instance from the commutative setting to the non-commutative setting, the notion of what it means for a structure to be “universal”, “free”, or “independent” can change. The most familiar example of this comes from group theory. If one restricts attention to the category of abelian groups, then the “freest” object one can generate from two generators is the free abelian group of commutative words with , which is isomorphic to the group . If however one generalises to the non-commutative setting of arbitrary groups, then the “freest” object that can now be generated from two generators is the free group of non-commutative words with , which is a significantly larger extension of the free abelian group .

Similarly, when generalising classical probability theory to non-commutative probability theory, the notion of what it means for two or more random variables to be independent changes. In the classical (commutative) setting, two (bounded, real-valued) random variables are independent if one has

whenever are well-behaved functions (such as polynomials) such that all of , vanishes. In the non-commutative setting, one can generalise the above definition to two *commuting* bounded self-adjoint variables; this concept is useful for instance in quantum probability, which is an abstraction of the theory of observables in quantum mechanics. But for two (bounded, self-adjoint) *non-commutative* random variables , the notion of classical independence no longer applies. As a substitute, one can instead consider the notion of being freely independent (or *free* for short), which means that

whenever are well-behaved functions such that all of vanish.

The concept of free independence was introduced by Voiculescu, and its study is now known as the subject of free probability. We will not attempt a systematic survey of this subject here; for this, we refer the reader to the surveys of Speicher and of Biane. Instead, we shall just discuss a small number of topics in this area to give the flavour of the subject only.

The significance of free probability to random matrix theory lies in the fundamental observation that random matrices which are independent in the classical sense, also tend to be independent in the free probability sense, in the large limit . (This is only possible because of the highly non-commutative nature of these matrices; as we shall see, it is not possible for non-trivial commuting independent random variables to be freely independent.) Because of this, many tedious computations in random matrix theory, particularly those of an algebraic or enumerative combinatorial nature, can be done more quickly and systematically by using the framework of free probability, which by design is optimised for algebraic tasks rather than analytical ones.

Much as free groups are in some sense “maximally non-commutative”, freely independent random variables are about as far from being commuting as possible. For instance, if are freely independent and of expectation zero, then vanishes, but instead factors as . As a consequence, the behaviour of freely independent random variables can be quite different from the behaviour of their classically independent commuting counterparts. Nevertheless there is a remarkably strong *analogy* between the two types of independence, in that results which are true in the classically independent case often have an interesting analogue in the freely independent setting. For instance, the central limit theorem (Notes 2) for averages of classically independent random variables, which roughly speaking asserts that such averages become gaussian in the large limit, has an analogue for averages of freely independent variables, the *free central limit theorem*, which roughly speaking asserts that such averages become *semicircular* in the large limit. One can then use this theorem to provide yet another proof of Wigner’s semicircle law (Notes 4).

Another important (and closely related) analogy is that while the distribution of sums of independent commutative random variables can be quickly computed via the characteristic function (i.e. the Fourier transform of the distribution), the distribution of sums of freely independent non-commutative random variables can be quickly computed using the Stieltjes transform instead (or with closely related objects, such as the *-transform* of Voiculescu). This is strongly reminiscent of the appearance of the Stieltjes transform in random matrix theory, and indeed we will see many parallels between the use of the Stieltjes transform here and in Notes 4.

As mentioned earlier, free probability is an excellent tool for computing various expressions of interest in random matrix theory, such as asymptotic values of normalised moments in the large limit . Nevertheless, as it only covers the asymptotic regime in which is sent to infinity while holding all other parameters fixed, there are some aspects of random matrix theory to which the tools of free probability are not sufficient by themselves to resolve (although it can be possible to combine free probability theory with other tools to then answer these questions). For instance, questions regarding the *rate* of convergence of normalised moments as are not directly answered by free probability, though if free probability is combined with tools such as concentration of measure (Notes 1) then such rate information can often be recovered. For similar reasons, free probability lets one understand the behaviour of moments as for *fixed* , but has more difficulty dealing with the situation in which is allowed to grow slowly in (e.g. ). Because of this, free probability methods are effective at controlling the *bulk* of the spectrum of a random matrix, but have more difficulty with the *edges* of that spectrum (as well as with related concepts such as the operator norm, Notes 3) as well as with fine-scale structure of the spectrum. Finally, free probability methods are most effective when dealing with matrices that are Hermitian with bounded operator norm, largely because the spectral theory of bounded self-adjoint operators in the infinite-dimensional setting of the large limit is non-pathological. (This is ultimately due to the stable nature of eigenvalues in the self-adjoint setting; see this previous blog post for discussion.) For non-self-adjoint operators, free probability needs to be augmented with additional tools, most notably by bounds on least singular values, in order to recover the required stability for the various spectral data of random matrices to behave continuously with respect to the large limit. We will discuss this latter point in a later set of notes.

Tim Austin, Tanja Eisner, and I have just uploaded to the arXiv our joint paper Nonconventional ergodic averages and multiple recurrence for von Neumann dynamical systems, submitted to Pacific Journal of Mathematics. This project started with the observation that the multiple recurrence theorem of Furstenberg (and the related multiple convergence theorem of Host and Kra) could be interpreted in the language of dynamical systems of commutative finite von Neumann algebras, which naturally raised the question of the extent to which the results hold in the noncommutative setting. The short answer is “yes for small averages, but not for long ones”.

The Furstenberg multiple recurrence theorem can be phrased as follows: if is a probability space with a measure-preserving shift (which naturally induces an isomorphism by setting ), is non-negative with positive trace , and is an integer, then one has

In particular, for all in a set of positive upper density. This result is famously equivalent to Szemerédi’s theorem on arithmetic progressions.

The Host-Kra multiple convergence theorem makes the related assertion that if , then the scalar averages

converge to a limit as ; *a fortiori*, the function averages

converge in (say) norm.

The space is a commutative example of a von Neumann algebra: an algebra of bounded linear operators on a complex Hilbert space which is closed under the weak operator topology, and under taking adjoints. Indeed, one can take to be , and identify each element of with the multiplier operator . The operation is then a *finite trace* for this algebra, i.e. a linear map from the algebra to the scalars such that , , and , with equality iff . The shift is then an automorphism of this algebra (preserving shift and conjugation).

We can generalise this situation to the noncommutative setting. Define a *von Neumann dynamical system* to be a von Neumann algebra with a finite trace and an automorphism . In addition to the commutative examples generated by measure-preserving systems, we give three other examples here:

- (Matrices) is the algebra of complex matrices, with trace and shift , where is a fixed unitary matrix.
- (Group algebras) is the closure of the
*group algebra*of a discrete group (i.e. the algebra of finite formal complex combinations of group elements), which acts on the Hilbert space by convolution (identifying each group element with its Kronecker delta function). A trace is given by , where is the Kronecker delta at the identity. Any automorphism of the group induces a shift . - (Noncommutative torus) is the von Neumann algebra acting on generated by the multiplier operator and the shifted multiplier operator , where is fixed. A trace is given by , where is the constant function.

Inspired by noncommutative generalisations of other results in commutative analysis, one can then ask the following questions, for a fixed and for a fixed von Neumann dynamical system :

- (Recurrence on average) Whenever is non-negative with positive trace, is it true that
- (Recurrence on a dense set) Whenever is non-negative with positive trace, is it true thatfor all in a set of positive upper density?
- (Weak convergence) With , is it true thatconverges?
- (Strong convergence) With , is it true thatconverges in using the Hilbert-Schmidt norm ?

Note that strong convergence automatically implies weak convergence, and recurrence on average automatically implies recurrence on a dense set.

For , all four questions can trivially be answered “yes”. For , the answer to the above four questions is also “yes”, thanks to the von Neumann ergodic theorem for unitary operators. For , we were able to establish a positive answer to the “recurrence on a dense set”, “weak convergence”, and “strong convergence” results assuming that is ergodic. For general , we have a positive answer to all four questions under the assumption that is *asymptotically abelian*, which roughly speaking means that the commutators converges to zero (in an appropriate weak sense) as . Both of these proofs adapt the usual ergodic theory arguments; the latter result generalises some earlier work of Niculescu-Stroh-Zsido, Duvenhage, and Beyers-Duvenhage-Stroh. For the result, a key observation is that the van der Corput lemma can be used to control triple averages without requiring any commutativity; the “generalised von Neumann” trick of using multiple applications of the van der Corput trick to control higher averages, however, relies much more strongly on commutativity.

In most other situations we have counterexamples to all of these questions. In particular:

- For , recurrence on average can fail on an ergodic system; indeed, one can even make the average
*negative*. This example is ultimately based on a Behrend example construction and a von Neumann algebra construction known as the*crossed product*. - For , recurrence on a dense set can also fail if the ergodicity hypothesis is dropped. This also uses the Behrend example and the crossed product construction.
- For , weak and strong convergence can fail even assuming ergodicity. This uses a group theoretic construction, which amusingly was inspired by Grothendieck’s interpretation of a group as a sheaf of flat connections, which I blogged about recently, and which I will discuss below the fold.
- For , recurrence on a dense set fails even with the ergodicity hypothesis. This uses a fancier version of the Behrend example due to Ruzsa in this paper of Bergelson, Host, and Kra. This example only applies for ; we do not know for whether recurrence on a dense set holds for ergodic systems.

In these notes we lay out the basic theory of the Fourier transform, which is of course the most fundamental tool in harmonic analysis and also of major importance in related fields (functional analysis, complex analysis, PDE, number theory, additive combinatorics, representation theory, signal processing, etc.). The Fourier transform, in conjunction with the Fourier inversion formula, allows one to take essentially arbitrary (complex-valued) functions on a group (or more generally, a space that acts on, e.g. a homogeneous space ), and decompose them as a (discrete or continuous) superposition of much more symmetric functions on the domain, such as characters ; the precise superposition is given by *Fourier coefficients* , which take values in some dual object such as the Pontryagin dual of . Characters behave in a very simple manner with respect to translation (indeed, they are eigenfunctions of the translation action), and so the Fourier transform tends to simplify any mathematical problem which enjoys a translation invariance symmetry (or an approximation to such a symmetry), and is somehow “linear” (i.e. it interacts nicely with superpositions). In particular, Fourier analytic methods are particularly useful for studying operations such as convolution and set-theoretic addition , or the closely related problem of counting solutions to additive problems such as or , where are constrained to lie in specific sets . The Fourier transform is also a particularly powerful tool for solving constant-coefficient linear ODE and PDE (because of the translation invariance), and can also approximately solve some variable-coefficient (or slightly non-linear) equations if the coefficients vary smoothly enough and the nonlinear terms are sufficiently tame.

The Fourier transform also provides an important new way of looking at a function , as it highlights the distribution of in *frequency space* (the domain of the frequency variable ) rather than *physical space* (the domain of the physical variable ). A given property of in the physical domain may be transformed to a rather different-looking property of in the frequency domain. For instance:

- Smoothness of in the physical domain corresponds to decay of in the Fourier domain, and conversely. (More generally, fine scale properties of tend to manifest themselves as coarse scale properties of , and conversely.)
- Convolution in the physical domain corresponds to pointwise multiplication in the Fourier domain, and conversely.
- Constant coefficient differential operators such as in the physical domain corresponds to multiplication by polynomials such as in the Fourier domain, and conversely.
- More generally, translation invariant operators in the physical domain correspond to multiplication by symbols in the Fourier domain, and conversely.
- Rescaling in the physical domain by an invertible linear transformation corresponds to an inverse (adjoint) rescaling in the Fourier domain.
- Restriction to a subspace (or subgroup) in the physical domain corresponds to projection to the dual quotient space (or quotient group) in the Fourier domain, and conversely.
- Frequency modulation in the physical domain corresponds to translation in the frequency domain, and conversely.

(We will make these statements more precise below.)

On the other hand, some operations in the physical domain remain essentially unchanged in the Fourier domain. Most importantly, the norm (or *energy*) of a function is the same as that of its Fourier transform, and more generally the inner product of two functions is the same as that of their Fourier transforms. Indeed, the Fourier transform is a unitary operator on (a fact which is variously known as the *Plancherel theorem* or the *Parseval identity*). This makes it easier to pass back and forth between the physical domain and frequency domain, so that one can combine techniques that are easy to execute in the physical domain with other techniques that are easy to execute in the frequency domain. (In fact, one can combine the physical and frequency domains together into a product domain known as phase space, and there are entire fields of mathematics (e.g. microlocal analysis, geometric quantisation, time-frequency analysis) devoted to performing analysis on these sorts of spaces directly, but this is beyond the scope of this course.)

In these notes, we briefly discuss the general theory of the Fourier transform, but will mainly focus on the two classical domains for Fourier analysis: the torus , and the Euclidean space . For these domains one has the advantage of being able to perform very explicit algebraic calculations, involving concrete functions such as plane waves or Gaussians .

One way to study a general class of mathematical objects is to embed them into a more structured class of mathematical objects; for instance, one could study manifolds by embedding them into Euclidean spaces. In these (optional) notes we study two (related) embedding theorems for topological spaces:

- The Stone-Čech compactification, which embeds locally compact Hausdorff spaces into compact Hausdorff spaces in a “universal” fashion; and
- The Urysohn metrization theorem, that shows that every second-countable normal Hausdorff space is metrizable.

## Recent Comments