You are currently browsing the category archive for the ‘math.OA’ category.

The (classical) Möbius function is the unique function that obeys the classical Möbius inversion formula:

Proposition 1 (Classical Möbius inversion)Let be functions from the natural numbers to an additive group . Then the following two claims are equivalent:

- (i) for all .
- (ii) for all .

There is a generalisation of this formula to (finite) posets, due to Hall, in which one sums over chains in the poset:

Proposition 2 (Poset Möbius inversion)Let be a finite poset, and let be functions from that poset to an additive group . Then the following two claims are equivalent:(Note from the finite nature of that the inner sum in (ii) is vacuous for all but finitely many .)

- (i) for all , where is understood to range in .
- (ii) for all , where in the inner sum are understood to range in with the indicated ordering.

Comparing Proposition 2 with Proposition 1, it is natural to refer to the function as the Möbius function of the poset; the condition (ii) can then be written as

*Proof:*If (i) holds, then we have for any . Iterating this we obtain (ii). Conversely, from (ii) and separating out the term, and grouping all the other terms based on the value of , we obtain (1), and hence (i).

In fact it is not completely necessary that the poset be finite; an inspection of the proof shows that it suffices that every element of the poset has only finitely many predecessors .

It is not difficult to see that Proposition 2 includes Proposition 1 as a special case, after verifying the combinatorial fact that the quantity

is equal to when divides , and vanishes otherwise.I recently discovered that Proposition 2 can also lead to a useful variant of the inclusion-exclusion principle. The classical version of this principle can be phrased in terms of indicator functions: if are subsets of some set , then

In particular, if there is a finite measure on for which are all measurable, we haveOne drawback of this formula is that there are exponentially many terms on the right-hand side: of them, in fact. However, in many cases of interest there are “collisions” between the intersections (for instance, perhaps many of the pairwise intersections agree), in which case there is an opportunity to collect terms and hopefully achieve some cancellation. It turns out that it is possible to use Proposition 2 to do this, in which one only needs to sum over chains in the resulting poset of intersections:

Proposition 3 (Hall-type inclusion-exclusion principle)Let be subsets of some set , and let be the finite poset formed by intersections of some of the (with the convention that is the empty intersection), ordered by set inclusion. Then for any , one has where are understood to range in . In particular (setting to be the empty intersection) if the are all proper subsets of then we have In particular, if there is a finite measure on for which are all measurable, we have

Using the Möbius function on the poset , one can write these formulae as

and
*Proof:* It suffices to establish (2) (to derive (3) from (2) observe that all the are contained in one of the , so the effect of may be absorbed into ). Applying Proposition 2, this is equivalent to the assertion that

Example 4If with , and are all distinct, then we have for any finite measure on that makes measurable that due to the four chains , , , of length one, and the three chains , , of length two. Note that this expansion just has six terms in it, as opposed to the given by the usual inclusion-exclusion formula, though of course one can reduce the number of terms by combining the factors. This may not seem particularly impressive, especially if one views the term as really being three terms instead of one, but if we add a fourth set with for all , the formula now becomes and we begin to see more cancellation as we now have just seven terms (or ten if we count as four terms) instead of terms.

Example 5 (Variant of Legendre sieve)If are natural numbers, and is some sequence of complex numbers with only finitely many terms non-zero, then by applying the above proposition to the sets and with equal to counting measure weighted by the we obtain a variant of the Legendre sieve where range over the set formed by taking least common multiples of the (with the understanding that the empty least common multiple is ), and denotes the assertion that divides but is strictly less than . I am curious to know of this version of the Legendre sieve already appears in the literature (and similarly for the other applications of Proposition 2 given here).

If the poset has bounded depth then the number of terms in Proposition 3 can end up being just polynomially large in rather than exponentially large. Indeed, if all chains in have length at most then the number of terms here is at most . (The examples (4), (5) are ones in which the depth is equal to two.) I hope to report in a later post on how this version of inclusion-exclusion with polynomially many terms can be useful in an application.

Actually in our application we need an abstraction of the above formula, in which the indicator functions are replaced by more abstract idempotents:

Proposition 6 (Hall-type inclusion-exclusion principle for idempotents)Let be pairwise commuting elements of some ring with identity, which are all idempotent (thus for ). Let be the finite poset formed by products of the (with the convention that is the empty product), ordered by declaring when (note that all the elements of are idempotent so this is a partial ordering). Then for any , one has where are understood to range in . In particular (setting ) if all the are not equal to then we have

Morally speaking this proposition is equivalent to the previous one after applying a “spectral theorem” to simultaneously diagonalise all of the , but it is quicker to just adapt the previous proof to establish this proposition directly. Using the Möbius function for , we can rewrite these formulae as

and
*Proof:* Again it suffices to verify (6). Using Proposition 2 as before, it suffices to show that

Asgar Jamneshan and I have just uploaded to the arXiv our paper “Foundational aspects of uncountable measure theory: Gelfand duality, Riesz representation, canonical models, and canonical disintegration“. This paper arose from our longer-term project to systematically develop “uncountable” ergodic theory – ergodic theory in which the groups acting are not required to be countable, the probability spaces one acts on are not required to be standard Borel, or Polish, and the compact groups that arise in the structural theory (e.g., the theory of group extensions) are not required to be separable. One of the motivations of doing this is to allow ergodic theory results to be applied to ultraproducts of finite dynamical systems, which can then hopefully be transferred to establish combinatorial results with good uniformity properties. An instance of this is the uncountable Mackey-Zimmer theorem, discussed in this companion blog post.

In the course of this project, we ran into the obstacle that many foundational results, such as the Riesz representation theorem, often require one or more of these countability hypotheses when encountered in textbooks. Other technical issues also arise in the uncountable setting, such as the need to distinguish the Borel -algebra from the (two different types of) Baire -algebra. As such we needed to spend some time reviewing and synthesizing the known literature on some foundational results of “uncountable” measure theory, which led to this paper. As such, most of the results of this paper are already in the literature, either explicitly or implicitly, in one form or another (with perhaps the exception of the canonical disintegration, which we discuss below); we view the main contribution of this paper as presenting the results in a coherent and unified fashion. In particular we found that the language of category theory was invaluable in clarifying and organizing all the different results. In subsequent work we (and some other authors) will use the results in this paper for various applications in uncountable ergodic theory.

The foundational results covered in this paper can be divided into a number of subtopics (Gelfand duality, Baire -algebras and Riesz representation, canonical models, and canonical disintegration), which we discuss further below the fold.

Dimitri Shlyakhtenko and I have uploaded to the arXiv our paper Fractional free convolution powers. For me, this project (which we started during the 2018 IPAM program on quantitative linear algebra) was motivated by a desire to understand the behavior of the *minor process* applied to a large random Hermitian matrix , in which one takes the successive upper left minors of and computes their eigenvalues in non-decreasing order. These eigenvalues are related to each other by the Cauchy interlacing inequalities

*Gelfand-Tsetlin pattern*, as discussed in these previous blog posts.

When is large and the matrix is a random matrix with empirical spectral distribution converging to some compactly supported probability measure on the real line, then under suitable hypotheses (e.g., unitary conjugation invariance of the random matrix ensemble ), a “concentration of measure” effect occurs, with the spectral distribution of the minors for for any fixed converging to a specific measure that depends only on and . The reason for this notation is that there is a surprising description of this measure when is a natural number, namely it is the free convolution of copies of , pushed forward by the dilation map . For instance, if is the Wigner semicircular measure , then . At the random matrix level, this reflects the fact that the minor of a GUE matrix is again a GUE matrix (up to a renormalizing constant).

As first observed by Bercovici and Voiculescu and developed further by Nica and Speicher, among other authors, the notion of a free convolution power of can be extended to non-integer , thus giving the notion of a “fractional free convolution power”. This notion can be defined in several different ways. One of them proceeds via the Cauchy transform

of the measure , and can be defined by solving the Burgers-type equation with initial condition (see this previous blog post for a derivation). This equation can be solved explicitly using the*-transform*of , defined by solving the equation for sufficiently large , in which case one can show that (In the case of the semicircular measure , the -transform is simply the identity: .)

Nica and Speicher also gave a free probability interpretation of the fractional free convolution power: if is a noncommutative random variable in a noncommutative probability space with distribution , and is a real projection operator free of with trace , then the “minor” of (viewed as an element of a new noncommutative probability space whose elements are minors , with trace ) has the law of (we give a self-contained proof of this in an appendix to our paper). This suggests that the minor process (or fractional free convolution) can be studied within the framework of free probability theory.

One of the known facts about integer free convolution powers is monotonicity of the *free entropy*

*free Fisher information*which were introduced by Voiculescu as free probability analogues of the classical probability concepts of differential entropy and classical Fisher information. (Here we correct a small typo in the normalization constant of Fisher entropy as presented in Voiculescu’s paper.) Namely, it was shown by Shylakhtenko that the quantity is monotone non-decreasing for integer , and the Fisher information is monotone non-increasing for integer . This is the free probability analogue of the corresponding monotonicities for differential entropy and classical Fisher information that was established by Artstein, Ball, Barthe, and Naor, answering a question of Shannon.

Our first main result is to extend the monotonicity results of Shylakhtenko to fractional . We give two proofs of this fact, one using free probability machinery, and a more self contained (but less motivated) proof using integration by parts and contour integration. The free probability proof relies on the concept of the *free score* of a noncommutative random variable, which is the analogue of the classical score. The free score, also introduced by Voiculescu, can be defined by duality as measuring the perturbation with respect to semicircular noise, or more precisely

The free score interacts very well with the free minor process , in particular by standard calculations one can establish the identity

whenever is a noncommutative random variable, is an algebra of noncommutative random variables, and is a real projection of trace that is free of both and . The monotonicity of free Fisher information then follows from an application of Pythagoras’s theorem (which implies in particular that conditional expectation operators are contractions on ). The monotonicity of free entropy then follows from an integral representation of free entropy as an integral of free Fisher information along the free Ornstein-Uhlenbeck process (or equivalently, free Fisher information is essentially the rate of change of free entropy with respect to perturbation by semicircular noise). The argument also shows when equality holds in the monotonicity inequalities; this occurs precisely when is a semicircular measure up to affine rescaling.After an extensive amount of calculation of all the quantities that were implicit in the above free probability argument (in particular computing the various terms involved in the application of Pythagoras’ theorem), we were able to extract a self-contained proof of monotonicity that relied on differentiating the quantities in and using the differential equation (1). It turns out that if for sufficiently regular , then there is an identity

where is the kernel and . It is not difficult to show that is a positive semi-definite kernel, which gives the required monotonicity. It would be interesting to obtain some more insightful interpretation of the kernel and the identity (2).These monotonicity properties hint at the minor process being associated to some sort of “gradient flow” in the parameter. We were not able to formalize this intuition; indeed, it is not clear what a gradient flow on a varying noncommutative probability space even means. However, after substantial further calculation we were able to formally describe the minor process as the Euler-Lagrange equation for an intriguing Lagrangian functional that we conjecture to have a random matrix interpretation. We first work in “Lagrangian coordinates”, defining the quantity on the “Gelfand-Tsetlin pyramid”

by the formula which is well defined if the density of is sufficiently well behaved. The random matrix interpretation of is that it is the asymptotic location of the eigenvalue of the upper left minor of a random matrix with asymptotic empirical spectral distribution and with unitarily invariant distribution, thus is in some sense a continuum limit of Gelfand-Tsetlin patterns. Thus for instance the Cauchy interlacing laws in this asymptotic limit regime become After a lengthy calculation (involving extensive use of the chain rule and product rule), the equation (1) is equivalent to the Euler-Lagrange equation where is the Lagrangian density Thus the minor process is formally a critical point of the integral . The quantity measures the mean eigenvalue spacing at some location of the Gelfand-Tsetlin pyramid, and the ratio measures mean eigenvalue drift in the minor process. This suggests that this Lagrangian density is some sort of measure of entropy of the asymptotic microscale point process emerging from the minor process at this spacing and drift. There is work of Metcalfe demonstrating that this point process is given by the Boutillier bead model, so we conjecture that this Lagrangian density somehow measures the entropy density of this process.I have just uploaded to the arXiv my paper “Commutators close to the identity“, submitted to the Journal of Operator Theory. This paper resulted from some progress I made on the problem discussed in this previous post. Recall in that post the following result of Popa: if are bounded operators on a Hilbert space whose commutator is close to the identity in the sense that

for some , then one has the lower bound

In the other direction, for any , there are examples of operators obeying (1) such that

In this paper we improve the upper bound to come closer to the lower bound:

Theorem 1For any , and any infinite-dimensional , there exist operators obeying (1) such that

One can probably improve the exponent somewhat by a modification of the methods, though it does not seem likely that one can lower it all the way to without a substantially new idea. Nevertheless I believe it plausible that the lower bound (2) is close to optimal.

We now sketch the methods of proof. The construction giving (3) proceeded by first identifying with the algebra of matrices that have entries in . It is then possible to find two matrices whose commutator takes the form

for some bounded operator (for instance one can take to be an isometry). If one then conjugates by the diagonal operator , one can eusure that (1) and (3) both hold.

It is natural to adapt this strategy to matrices rather than matrices, where is a parameter at one’s disposal. If one can find matrices that are almost upper triangular (in that only the entries on or above the lower diagonal are non-zero), whose commutator only differs from the identity in the top right corner, thus

for some , then by conjugating by a diagonal matrix such as for some and optimising in , one can improve the bound in (3) to ; if the bounds in the implied constant in the are polynomial in , one can then optimise in to obtain a bound of the form (4) (perhaps with the exponent replaced by a different constant).

The task is then to find almost upper triangular matrices whose commutator takes the required form. The lower diagonals of must then commute; it took me a while to realise then that one could (usually) conjugate one of the matrices, say by a suitable diagonal matrix, so that the lower diagonal consisted entirely of the identity operator, which would make the other lower diagonal consist of a single operator, say . After a lot of further lengthy experimentation, I eventually realised that one could conjugate further by unipotent upper triangular matrices so that all remaining entries other than those on the far right column vanished. Thus, without too much loss of generality, one can assume that takes the normal form

for some , solving the system of equations

It turns out to be possible to solve this system of equations by a contraction mapping argument if one takes to be a “Hilbert’s hotel” pair of isometries as in the previous post, though the contraction is very slight, leading to polynomial losses in in the implied constant.

There is a further question raised in Popa’s paper which I was unable to resolve. As a special case of one of the main theorems (Theorem 2.1) of that paper, the following result was shown: if obeys the bounds

(where denotes the space of all operators of the form with and compact), then there exist operators with such that . (In fact, Popa’s result covers a more general situation in which one is working in a properly infinite algebra with non-trivial centre.) We sketch a proof of this result as follows. Suppose that and for some . A standard greedy algorithm argument (see this paper of Brown and Pearcy) allows one to find orthonormal vectors for such that for each , one has for some comparable to , and some orthogonal to all of the . After some conjugation (and a suitable identification of with , one can thus place in a normal form

where is a isometry with infinite deficiency, and have norm . Setting , it then suffices to solve the commutator equation

with ; note the similarity with (3).

By the usual Hilbert’s hotel construction, one can complement with another isometry obeying the “Hilbert’s hotel” identity

and also , . Proceeding as in the previous post, we can try the ansatz

for some operators , leading to the system of equations

Using the first equation to solve for , the second to then solve for , and the third to then solve for , one can obtain matrices with the required properties.

Thus far, my attempts to extend this construction to larger matrices with good bounds on have been unsuccessful. A model problem would be to express

as a commutator with significantly smaller than . The construction in my paper achieves something like this, but with replaced by a more complicated operator. One would also need variants of this result in which one is allowed to perturb the above operator by an arbitrary finite rank operator of bounded operator norm.

I am recording here some notes on a nice problem that Sorin Popa shared with me recently. To motivate the question, we begin with the basic observation that the differentiation operator and the position operator in one dimension formally obey the commutator equation

where is the identity operator and is the commutator. Among other things, this equation is fundamental in quantum mechanics, leading for instance to the Heisenberg uncertainty principle.

The operators are unbounded on spaces such as . One can ask whether the commutator equation (1) can be solved using bounded operators on a Hilbert space rather than unbounded ones. In the finite dimensional case when are just matrices for some , the answer is clearly negative, since the left-hand side of (1) has trace zero and the right-hand side does not. What about in infinite dimensions, when the trace is not available? As it turns out, the answer is still negative, as was first worked out by Wintner and Wielandt. A short proof can be given as follows. Suppose for contradiction that we can find bounded operators obeying (1). From (1) and an easy induction argument, we obtain the identity

for all natural numbers . From the triangle inequality, this implies that

Iterating this, we conclude that

for any . Bounding and then sending , we conclude that , which clearly contradicts (1). (Note the argument can be generalised without difficulty to the case when lie in a Banach algebra, rather than be bounded operators on a Hilbert space.)

It was observed by Popa that there is a quantitative version of this result:

Theorem 1Let such that

*Proof:* By multiplying by a suitable constant and dividing by the same constant, we may normalise . Write with . Then the same induction that established (2) now shows that

and hence by the triangle inequality

We divide by and sum to conclude that

giving the claim.

Again, the argument generalises easily to any Banach algebra. Popa then posed the question of whether the quantity can be replaced by any substantially larger function of , such as a polynomial in . As far as I know, the above simple bound has not been substantially improved.

In the opposite direction, one can ask for constructions of operators that are not too large in operator norm, such that is close to the identity. Again, one cannot do this in finite dimensions: has trace zero, so at least one of its eigenvalues must outside the disk , and therefore for any finite-dimensional matrices .

However, it was shown in 1965 by Brown and Pearcy that in infinite dimensions, one can construct operators with arbitrarily close to in operator norm (in fact one can prescribe any operator for as long as it is not equal to a non-zero multiple of the identity plus a compact operator). In the above paper of Popa, a quantitative version of the argument (based in part on some earlier work of Apostol and Zsido) was given as follows. The first step is to observe the following Hilbert space version of Hilbert’s hotel: in an infinite dimensional Hilbert space , one can locate isometries obeying the equation

where denotes the adjoint of . For instance, if has a countable orthonormal basis , one could set

and

where denotes the linear functional on . Observe that (4) is again impossible to satisfy in finite dimension , as the left-hand side must have trace while the right-hand side has trace .

Multiplying (4) on the left by and right by , or on the left by and right by , then gives

From (4), (5) we see in particular that, while we cannot express as a commutator of bounded operators, we can at least express it as the sum of two commutators:

We can rewrite this somewhat strangely as

and hence there exists a bounded operator such that

Moving now to the Banach algebra of matrices with entries in (which can be equivalently viewed as ), a short computation then gives the identity

for some bounded operator whose exact form will not be relevant for the argument. Now, by Neumann series (and the fact that have unit operator norm), we can find another bounded operator such that

and then another brief computation shows that

Thus we can express the operator as the commutator of two operators of norm . Conjugating by for any , we may then express as the commutator of two operators of norm . This shows that the right-hand side of (3) cannot be replaced with anything that blows up faster than as . Can one improve this bound further?

The wave equation is usually expressed in the form

where is a function of both time and space , with being the Laplacian operator. One can generalise this equation in a number of ways, for instance by replacing the spatial domain with some other manifold and replacing the Laplacian with the Laplace-Beltrami operator or adding lower order terms (such as a potential, or a coupling with a magnetic field). But for sake of discussion let us work with the classical wave equation on . We will work formally in this post, being unconcerned with issues of convergence, justifying interchange of integrals, derivatives, or limits, etc.. One then has a conserved energy

which we can rewrite using integration by parts and the inner product on as

A key feature of the wave equation is *finite speed of propagation*: if, at time (say), the initial position and initial velocity are both supported in a ball , then at any later time , the position and velocity are supported in the larger ball . This can be seen for instance (formally, at least) by inspecting the exterior energy

and observing (after some integration by parts and differentiation under the integral sign) that it is non-increasing in time, non-negative, and vanishing at time .

The wave equation is second order in time, but one can turn it into a first order system by working with the pair rather than just the single field , where is the velocity field. The system is then

and the conserved energy is now

Finite speed of propagation then tells us that if are both supported on , then are supported on for all . One also has time reversal symmetry: if is a solution, then is a solution also, thus for instance one can establish an analogue of finite speed of propagation for negative times using this symmetry.

If one has an eigenfunction

of the Laplacian, then we have the explicit solutions

of the wave equation, which formally can be used to construct all other solutions via the principle of superposition.

When one has vanishing initial velocity , the solution is given via functional calculus by

and the propagator can be expressed as the average of half-wave operators:

One can view as a minor of the full wave propagator

which is unitary with respect to the energy form (1), and is the fundamental solution to the wave equation in the sense that

Viewing the contraction as a minor of a unitary operator is an instance of the “dilation trick“.

It turns out (as I learned from Yuval Peres) that there is a useful discrete analogue of the wave equation (and of all of the above facts), in which the time variable now lives on the integers rather than on , and the spatial domain can be replaced by discrete domains also (such as graphs). Formally, the system is now of the form

where is now an integer, take values in some Hilbert space (e.g. functions on a graph ), and is some operator on that Hilbert space (which in applications will usually be a self-adjoint contraction). To connect this with the classical wave equation, let us first consider a rescaling of this system

where is a small parameter (representing the discretised time step), now takes values in the integer multiples of , and is the wave propagator operator or the heat propagator (the two operators are different, but agree to fourth order in ). One can then formally verify that the wave equation emerges from this rescaled system in the limit . (Thus, is not exactly the direct analogue of the Laplacian , but can be viewed as something like in the case of small , or if we are not rescaling to the small case. The operator is sometimes known as the *diffusion operator*)

Assuming is self-adjoint, solutions to the system (3) formally conserve the energy

This energy is positive semi-definite if is a contraction. We have the same time reversal symmetry as before: if solves the system (3), then so does . If one has an eigenfunction

to the operator , then one has an explicit solution

to (3), and (in principle at least) this generates all other solutions via the principle of superposition.

Finite speed of propagation is a lot easier in the discrete setting, though one has to offset the support of the “velocity” field by one unit. Suppose we know that has unit speed in the sense that whenever is supported in a ball , then is supported in the ball . Then an easy induction shows that if are supported in respectively, then are supported in .

The fundamental solution to the discretised wave equation (3), in the sense of (2), is given by the formula

where and are the Chebyshev polynomials of the first and second kind, thus

and

In particular, is now a minor of , and can also be viewed as an average of with its inverse :

As before, is unitary with respect to the energy form (4), so this is another instance of the dilation trick in action. The powers and are discrete analogues of the heat propagators and wave propagators respectively.

One nice application of all this formalism, which I learned from Yuval Peres, is the Varopoulos-Carne inequality:

Theorem 1 (Varopoulos-Carne inequality)Let be a (possibly infinite) regular graph, let , and let be vertices in . Then the probability that the simple random walk at lands at at time is at most , where is the graph distance.

This general inequality is quite sharp, as one can see using the standard Cayley graph on the integers . Very roughly speaking, it asserts that on a regular graph of reasonably controlled growth (e.g. polynomial growth), random walks of length concentrate on the ball of radius or so centred at the origin of the random walk.

*Proof:* Let be the graph Laplacian, thus

for any , where is the degree of the regular graph and sum is over the vertices that are adjacent to . This is a contraction of unit speed, and the probability that the random walk at lands at at time is

where are the Dirac deltas at . Using (5), we can rewrite this as

where we are now using the energy form (4). We can write

where is the simple random walk of length on the integers, that is to say where are independent uniform Bernoulli signs. Thus we wish to show that

By finite speed of propagation, the inner product here vanishes if . For we can use Cauchy-Schwarz and the unitary nature of to bound the inner product by . Thus the left-hand side may be upper bounded by

and the claim now follows from the Chernoff inequality.

This inequality has many applications, particularly with regards to relating the entropy, mixing time, and concentration of random walks with volume growth of balls; see this text of Lyons and Peres for some examples.

For sake of comparison, here is a continuous counterpart to the Varopoulos-Carne inequality:

Theorem 2 (Continuous Varopoulos-Carne inequality)Let , and let be supported on compact sets respectively. Thenwhere is the Euclidean distance between and .

*Proof:* By Fourier inversion one has

for any real , and thus

By finite speed of propagation, the inner product vanishes when ; otherwise, we can use Cauchy-Schwarz and the contractive nature of to bound this inner product by . Thus

Bounding by , we obtain the claim.

Observe that the argument is quite general and can be applied for instance to other Riemannian manifolds than .

The prime number theorem can be expressed as the assertion

is the von Mangoldt function. It is a basic result in analytic number theory, but requires a bit of effort to prove. One “elementary” proof of this theorem proceeds through the Selberg symmetry formula

where the second von Mangoldt function is defined by the formula

(We are avoiding the use of the symbol here to denote Dirichlet convolution, as we will need this symbol to denote ordinary convolution shortly.) For the convenience of the reader, we give a proof of the Selberg symmetry formula below the fold. Actually, for the purposes of proving the prime number theorem, the weaker estimate

In this post I would like to record a somewhat “soft analysis” reformulation of the elementary proof of the prime number theorem in terms of Banach algebras, and specifically in Banach algebra structures on (completions of) the space of compactly supported continuous functions equipped with the convolution operation

This soft argument does not easily give any quantitative decay rate in the prime number theorem, but by the same token it avoids many of the quantitative calculations in the traditional proofs of this theorem. Ultimately, the key “soft analysis” fact used is the spectral radius formula

for any element of a unital commutative Banach algebra , where is the space of characters (i.e., continuous unital algebra homomorphisms from to ) of . This formula is due to Gelfand and may be found in any text on Banach algebras; for sake of completeness we prove it below the fold.

The connection between prime numbers and Banach algebras is given by the following consequence of the Selberg symmetry formula.

Theorem 1 (Construction of a Banach algebra norm)For any , let denote the quantityThen is a seminorm on with the bound

for all . Furthermore, we have the Banach algebra bound

We prove this theorem below the fold. The prime number theorem then follows from Theorem 1 and the following two assertions. The first is an application of the spectral radius formula (6) and some basic Fourier analysis (in particular, the observation that contains a plentiful supply of local units):

Theorem 2 (Non-trivial Banach algebras with many local units have non-trivial spectrum)Let be a seminorm on obeying (7), (8). Suppose that is not identically zero. Then there exists such thatfor all . In particular, by (7), one has

whenever is a non-negative function.

The second is a consequence of the Selberg symmetry formula and the fact that is real (as well as Mertens’ theorem, in the case), and is closely related to the non-vanishing of the Riemann zeta function on the line :

Theorem 3 (Breaking the parity barrier)Let . Then there exists such that is non-negative, and

Assuming Theorems 1, 2, 3, we may now quickly establish the prime number theorem as follows. Theorem 2 and Theorem 3 imply that the seminorm constructed in Theorem 1 is trivial, and thus

as for any Schwartz function (the decay rate in may depend on ). Specialising to functions of the form for some smooth compactly supported on , we conclude that

as ; by the smooth Urysohn lemma this implies that

as for any fixed , and the prime number theorem then follows by a telescoping series argument.

The same argument also yields the prime number theorem in arithmetic progressions, or equivalently that

for any fixed Dirichlet character ; the one difference is that the use of Mertens’ theorem is replaced by the basic fact that the quantity is non-vanishing.

One of the basic tools in modern combinatorics is the probabilistic method, introduced by Erdos, in which a deterministic solution to a given problem is shown to exist by constructing a *random* candidate for a solution, and showing that this candidate solves all the requirements of the problem with positive probability. When the problem requires a real-valued statistic to be suitably large or suitably small, the following trivial observation is often employed:

Proposition 1 (Comparison with mean)Let be a random real-valued variable, whose mean (orfirst moment) is finite. Thenwith positive probability, and

with positive probability.

This proposition is usually applied in conjunction with a computation of the first moment , in which case this version of the probabilistic method becomes an instance of the *first moment method*. (For comparison with other moment methods, such as the second moment method, exponential moment method, and zeroth moment method, see Chapter 1 of my book with Van Vu. For a general discussion of the probabilistic method, see the book by Alon and Spencer of the same name.)

As a typical example in random matrix theory, if one wanted to understand how small or how large the operator norm of a random matrix could be, one might first try to compute the expected operator norm and then apply Proposition 1; see this previous blog post for examples of this strategy (and related strategies, based on comparing with more tractable expressions such as the moments ). (In this blog post, all matrices are complex-valued.)

Recently, in their proof of the Kadison-Singer conjecture (and also in their earlier paper on Ramanujan graphs), Marcus, Spielman, and Srivastava introduced an striking new variant of the first moment method, suited in particular for controlling the operator norm of a Hermitian positive semi-definite matrix . Such matrices have non-negative real eigenvalues, and so in this case is just the largest eigenvalue of . Traditionally, one tries to control the eigenvalues through averaged statistics such as moments or Stieltjes transforms ; again, see this previous blog post. Here we use as short-hand for , where is the identity matrix. Marcus, Spielman, and Srivastava instead rely on the interpretation of the eigenvalues of as the roots of the characteristic polynomial of , thus

where is the largest real root of a non-zero polynomial . (In our applications, we will only ever apply to polynomials that have at least one real root, but for sake of completeness let us set if has no real roots.)

Prior to the work of Marcus, Spielman, and Srivastava, I think it is safe to say that the conventional wisdom in random matrix theory was that the representation (1) of the operator norm was not particularly useful, due to the highly non-linear nature of both the characteristic polynomial map and the maximum root map . (Although, as pointed out to me by Adam Marcus, some related ideas have occurred in graph theory rather than random matrix theory, for instance in the theory of the matching polynomial of a graph.) For instance, a fact as basic as the triangle inequality is extremely difficult to establish through (1). Nevertheless, it turns out that for certain special types of random matrices (particularly those in which a typical instance of this ensemble has a simple relationship to “adjacent” matrices in this ensemble), the polynomials enjoy an extremely rich structure (in particular, they lie in families of real stable polynomials, and hence enjoy good combinatorial interlacing properties) that can be surprisingly useful. In particular, Marcus, Spielman, and Srivastava established the following nonlinear variant of Proposition 1:

Proposition 2 (Comparison with mean)Let . Let be a random matrix, which is the sum of independent Hermitian rank one matrices , each taking a finite number of values. Thenwith positive probability, and

with positive probability.

We prove this proposition below the fold. The hypothesis that each only takes finitely many values is technical and can likely be relaxed substantially, but we will not need to do so here. Despite the superficial similarity with Proposition 1, the proof of Proposition 2 is quite nonlinear; in particular, one needs the interlacing properties of real stable polynomials to proceed. Another key ingredient in the proof is the observation that while the determinant of a matrix generally behaves in a nonlinear fashion on the underlying matrix , it becomes (affine-)linear when one considers rank one perturbations, and so depends in an affine-multilinear fashion on the . More precisely, we have the following deterministic formula, also proven below the fold:

Proposition 3 (Deterministic multilinearisation formula)Let be the sum of deterministic rank one matrices . Then we havefor all , where the

mixed characteristic polynomialof any matrices (not necessarily rank one) is given by the formula

Among other things, this formula gives a useful representation of the mean characteristic polynomial :

Corollary 4 (Random multilinearisation formula)Let be the sum of jointly independent rank one matrices . Then we have

*Proof:* For fixed , the expression is a polynomial combination of the , while the differential operator is a linear combination of differential operators for . As a consequence, we may expand (3) as a linear combination of terms, each of which is a multilinear combination of for some . Taking expectations of both sides of (2) and using the joint independence of the , we obtain the claim.

In view of Proposition 2, we can now hope to control the operator norm of certain special types of random matrices (and specifically, the sum of independent Hermitian positive semi-definite rank one matrices) by first controlling the mean of the random characteristic polynomial . Pursuing this philosophy, Marcus, Spielman, and Srivastava establish the following result, which they then use to prove the Kadison-Singer conjecture:

Theorem 5 (Marcus-Spielman-Srivastava theorem)Let . Let be jointly independent random vectors in , with each taking a finite number of values. Suppose that we have the normalisationwhere we are using the convention that is the identity matrix whenever necessary. Suppose also that we have the smallness condition

for some and all . Then one has

Note that the upper bound in (5) must be at least (by taking to be deterministic) and also must be at least (by taking the to always have magnitude at least ). Thus the bound in (5) is asymptotically tight both in the regime and in the regime ; the latter regime will be particularly useful for applications to Kadison-Singer. It should also be noted that if one uses more traditional random matrix theory methods (based on tools such as Proposition 1, as well as more sophisticated variants of these tools, such as the concentration of measure results of Rudelson and Ahlswede-Winter), one obtains a bound of with high probability, which is insufficient for the application to the Kadison-Singer problem; see this article of Tropp. Thus, Theorem 5 obtains a sharper bound, at the cost of trading in “high probability” for “positive probability”.

In the paper of Marcus, Spielman and Srivastava, Theorem 5 is used to deduce a conjecture of Weaver, which was already known to imply the Kadison-Singer conjecture; actually, a slight modification of their argument gives the paving conjecture of Kadison and Singer, from which the original Kadison-Singer conjecture may be readily deduced. We give these implications below the fold. (See also this survey article for some background on the Kadison-Singer problem.)

Let us now summarise how Theorem 5 is proven. In the spirit of semi-definite programming, we rephrase the above theorem in terms of the rank one Hermitian positive semi-definite matrices :

Theorem 6 (Marcus-Spielman-Srivastava theorem again)Let be jointly independent random rank one Hermitian positive semi-definite matrices such that the sum has meanand such that

for some and all . Then one has

with positive probability.

In view of (1) and Proposition 2, this theorem follows from the following control on the mean characteristic polynomial:

Theorem 7 (Control of mean characteristic polynomial)Let be jointly independent random rank one Hermitian positive semi-definite matrices such that the sum has meanand such that

for some and all . Then one has

This result is proven using the multilinearisation formula (Corollary 4) and some convexity properties of real stable polynomials; we give the proof below the fold.

Thanks to Adam Marcus, Assaf Naor and Sorin Popa for many useful explanations on various aspects of the Kadison-Singer problem.

A finite group is said to be a Frobenius group if there is a non-trivial subgroup of (known as the *Frobenius complement* of ) such that the conjugates of are “disjoint as possible” in the sense that whenever . This gives a decomposition

where the *Frobenius kernel* of is defined as the identity element together with all the non-identity elements that are not conjugate to any element of . Taking cardinalities, we conclude that

A remarkable theorem of Frobenius gives an unexpected amount of structure on and hence on :

Theorem 1 (Frobenius’ theorem)Let be a Frobenius group with Frobenius complement and Frobenius kernel . Then is a normal subgroup of , and hence (by (2) and the disjointness of and outside the identity) is the semidirect product of and .

I discussed Frobenius’ theorem and its proof in this recent blog post. This proof uses the theory of characters on a finite group , in particular relying on the fact that a character on a subgroup can induce a character on , which can then be decomposed into irreducible characters with *natural number* coefficients. Remarkably, even though a century has passed since Frobenius’ original argument, there is no proof known of this theorem which avoids character theory entirely; there are elementary proofs known when the complement has even order or when is solvable (we review both of these cases below the fold), which by the Feit-Thompson theorem does cover all the cases, but the proof of the Feit-Thompson theorem involves plenty of character theory (and also relies on Theorem 1). (The answers to this MathOverflow question give a good overview of the current state of affairs.)

I have been playing around recently with the problem of finding a character-free proof of Frobenius’ theorem. I didn’t succeed in obtaining a completely elementary proof, but I did find an argument which replaces character theory (which can be viewed as coming from the representation theory of the non-commutative group algebra ) with the Fourier analysis of class functions (i.e. the representation theory of the centre of the group algebra), thus replacing non-commutative representation theory by commutative representation theory. This is not a particularly radical depature from the existing proofs of Frobenius’ theorem, but it did seem to be a new proof which was technically “character-free” (even if it was not all that far from character-based in spirit), so I thought I would record it here.

The main ideas are as follows. The space of class functions can be viewed as a commutative algebra with respect to the convolution operation ; as the regular representation is unitary and faithful, this algebra contains no nilpotent elements. As such, (Gelfand-style) Fourier analysis suggests that one can analyse this algebra through the idempotents: class functions such that . In terms of characters, idempotents are nothing more than sums of the form for various collections of characters, but we can perform a fair amount of analysis on idempotents directly without recourse to characters. In particular, it turns out that idempotents enjoy some important integrality properties that can be established without invoking characters: for instance, by taking traces one can check that is a natural number, and more generally we will show that is a natural number whenever is a subgroup of (see Corollary 4 below). For instance, the quantity

is a natural number which we will call the *rank* of (as it is also the linear rank of the transformation on ).

In the case that is a Frobenius group with kernel , the above integrality properties can be used after some elementary manipulations to establish that for any idempotent , the quantity

is an integer. On the other hand, one can also show by elementary means that this quantity lies between and . These two facts are not strong enough on their own to impose much further structure on , unless one restricts attention to *minimal* idempotents . In this case spectral theory (or Gelfand theory, or the fundamental theorem of algebra) tells us that has rank one, and then the *integrality gap* comes into play and forces the quantity (3) to always be either zero or one. This can be used to imply that the convolution action of every minimal idempotent either preserves or annihilates it, which makes itself an idempotent, which makes normal.

## Recent Comments