You are currently browsing the tag archive for the ‘free convolution’ tag.

Dimitri Shlyakhtenko and I have uploaded to the arXiv our paper Fractional free convolution powers. For me, this project (which we started during the 2018 IPAM program on quantitative linear algebra) was motivated by a desire to understand the behavior of the minor process applied to a large random Hermitian {N \times N} matrix {A_N}, in which one takes the successive upper left {n \times n} minors {A_n} of {A_N} and computes their eigenvalues {\lambda_1(A_n) \leq \dots \leq \lambda_n(A_n)} in non-decreasing order. These eigenvalues are related to each other by the Cauchy interlacing inequalities

\displaystyle  \lambda_i(A_{n+1}) \leq \lambda_i(A_n) \leq \lambda_{i+1}(A_{n+1})

for {1 \leq i \leq n < N}, and are often arranged in a triangular array known as a Gelfand-Tsetlin pattern, as discussed in these previous blog posts.

When {N} is large and the matrix {A_N} is a random matrix with empirical spectral distribution converging to some compactly supported probability measure {\mu} on the real line, then under suitable hypotheses (e.g., unitary conjugation invariance of the random matrix ensemble {A_N}), a “concentration of measure” effect occurs, with the spectral distribution of the minors {A_n} for {n = \lfloor N/k\rfloor} for any fixed {k \geq 1} converging to a specific measure {k^{-1}_* \mu^{\boxplus k}} that depends only on {\mu} and {k}. The reason for this notation is that there is a surprising description of this measure {k^{-1}_* \mu^{\boxplus k}} when {k} is a natural number, namely it is the free convolution {\mu^{\boxplus k}} of {k} copies of {\mu}, pushed forward by the dilation map {x \mapsto k^{-1} x}. For instance, if {\mu} is the Wigner semicircular measure {d\mu_{sc} = \frac{1}{\pi} (4-x^2)^{1/2}_+\ dx}, then {k^{-1}_* \mu_{sc}^{\boxplus k} = k^{-1/2}_* \mu_{sc}}. At the random matrix level, this reflects the fact that the minor of a GUE matrix is again a GUE matrix (up to a renormalizing constant).

As first observed by Bercovici and Voiculescu and developed further by Nica and Speicher, among other authors, the notion of a free convolution power {\mu^{\boxplus k}} of {\mu} can be extended to non-integer {k \geq 1}, thus giving the notion of a “fractional free convolution power”. This notion can be defined in several different ways. One of them proceeds via the Cauchy transform

\displaystyle  G_\mu(z) := \int_{\bf R} \frac{d\mu(x)}{z-x}

of the measure {\mu}, and {\mu^{\boxplus k}} can be defined by solving the Burgers-type equation

\displaystyle  (k \partial_k + z \partial_z) G_{\mu^{\boxplus k}}(z) = \frac{\partial_z G_{\mu^{\boxplus k}}(z)}{G_{\mu^{\boxplus k}}(z)} \ \ \ \ \ (1)

with initial condition {G_{\mu^{\boxplus 1}} = G_\mu} (see this previous blog post for a derivation). This equation can be solved explicitly using the {R}-transform {R_\mu} of {\mu}, defined by solving the equation

\displaystyle  \frac{1}{G_\mu(z)} + R_\mu(G_\mu(z)) = z

for sufficiently large {z}, in which case one can show that

\displaystyle  R_{\mu^{\boxplus k}}(z) = k R_\mu(z).

(In the case of the semicircular measure {\mu_{sc}}, the {R}-transform is simply the identity: {R_{\mu_{sc}}(z)=z}.)

Nica and Speicher also gave a free probability interpretation of the fractional free convolution power: if {A} is a noncommutative random variable in a noncommutative probability space {({\mathcal A},\tau)} with distribution {\mu}, and {p} is a real projection operator free of {A} with trace {1/k}, then the “minor” {[pAp]} of {A} (viewed as an element of a new noncommutative probability space {({\mathcal A}_p, \tau_p)} whose elements are minors {[pXp]}, {X \in {\mathcal A}} with trace {\tau_p([pXp]) := k \tau(pXp)}) has the law of {k^{-1}_* \mu^{\boxplus k}} (we give a self-contained proof of this in an appendix to our paper). This suggests that the minor process (or fractional free convolution) can be studied within the framework of free probability theory.

One of the known facts about integer free convolution powers {\mu^{\boxplus k}} is monotonicity of the free entropy

\displaystyle  \chi(\mu) = \int_{\bf R} \int_{\bf R} \log|s-t|\ d\mu(s) d\mu(t) + \frac{3}{4} + \frac{1}{2} \log 2\pi

and free Fisher information

\displaystyle  \Phi(\mu) = \frac{2\pi^2}{3} \int_{\bf R} \left(\frac{d\mu}{dx}\right)^3\ dx

which were introduced by Voiculescu as free probability analogues of the classical probability concepts of differential entropy and classical Fisher information. (Here we correct a small typo in the normalization constant of Fisher entropy as presented in Voiculescu’s paper.) Namely, it was shown by Shylakhtenko that the quantity {\chi(k^{-1/2}_* \mu^{\boxplus k})} is monotone non-decreasing for integer {k}, and the Fisher information {\Phi(k^{-1/2}_* \mu^{\boxplus k})} is monotone non-increasing for integer {k}. This is the free probability analogue of the corresponding monotonicities for differential entropy and classical Fisher information that was established by Artstein, Ball, Barthe, and Naor, answering a question of Shannon.

Our first main result is to extend the monotonicity results of Shylakhtenko to fractional {k \geq 1}. We give two proofs of this fact, one using free probability machinery, and a more self contained (but less motivated) proof using integration by parts and contour integration. The free probability proof relies on the concept of the free score {J(X)} of a noncommutative random variable, which is the analogue of the classical score. The free score, also introduced by Voiculescu, can be defined by duality as measuring the perturbation with respect to semicircular noise, or more precisely

\displaystyle  \frac{d}{d\varepsilon} \tau( Z P( X + \varepsilon Z) )|_{\varepsilon=0} = \tau( J(X) P(X) )

whenever {P} is a polynomial and {Z} is a semicircular element free of {X}. If {X} has an absolutely continuous law {\mu = f\ dx} for a sufficiently regular {f}, one can calculate {J(X)} explicitly as {J(X) = 2\pi Hf(X)}, where {Hf} is the Hilbert transform of {f}, and the Fisher information is given by the formula

\displaystyle  \Phi(X) = \tau( J(X)^2 ).

One can also define a notion of relative free score {J(X:B)} relative to some subalgebra {B} of noncommutative random variables.

The free score interacts very well with the free minor process {X \mapsto [pXp]}, in particular by standard calculations one can establish the identity

\displaystyle  J( [pXp] : [pBp] ) = k {\bf E}( [p J(X:B) p] | [pXp], [pBp] )

whenever {X} is a noncommutative random variable, {B} is an algebra of noncommutative random variables, and {p} is a real projection of trace {1/k} that is free of both {X} and {B}. The monotonicity of free Fisher information then follows from an application of Pythagoras’s theorem (which implies in particular that conditional expectation operators are contractions on {L^2}). The monotonicity of free entropy then follows from an integral representation of free entropy as an integral of free Fisher information along the free Ornstein-Uhlenbeck process (or equivalently, free Fisher information is essentially the rate of change of free entropy with respect to perturbation by semicircular noise). The argument also shows when equality holds in the monotonicity inequalities; this occurs precisely when {\mu} is a semicircular measure up to affine rescaling.

After an extensive amount of calculation of all the quantities that were implicit in the above free probability argument (in particular computing the various terms involved in the application of Pythagoras’ theorem), we were able to extract a self-contained proof of monotonicity that relied on differentiating the quantities in {k} and using the differential equation (1). It turns out that if {d\mu = f\ dx} for sufficiently regular {f}, then there is an identity

\displaystyle  \partial_k \Phi( k^{-1/2}_* \mu^{\boxplus k} ) = -\frac{1}{2\pi^2} \lim_{\varepsilon \rightarrow 0} \sum_{\alpha,\beta = \pm} f(x) f(y) K(x+i\alpha \varepsilon, y+i\beta \varepsilon)\ dx dy \ \ \ \ \ (2)

where {K} is the kernel

\displaystyle  K(z,w) := \frac{1}{G(z) G(w)} (\frac{G(z)-G(w)}{z-w} + G(z) G(w))^2

and {G(z) := G_\mu(z)}. It is not difficult to show that {K(z,\overline{w})} is a positive semi-definite kernel, which gives the required monotonicity. It would be interesting to obtain some more insightful interpretation of the kernel {K} and the identity (2).

These monotonicity properties hint at the minor process {A \mapsto [pAp]} being associated to some sort of “gradient flow” in the {k} parameter. We were not able to formalize this intuition; indeed, it is not clear what a gradient flow on a varying noncommutative probability space {({\mathcal A}_p, \tau_p)} even means. However, after substantial further calculation we were able to formally describe the minor process as the Euler-Lagrange equation for an intriguing Lagrangian functional that we conjecture to have a random matrix interpretation. We first work in “Lagrangian coordinates”, defining the quantity {\lambda(s,y)} on the “Gelfand-Tsetlin pyramid”

\displaystyle  \Delta = \{ (s,y): 0 < s < 1; 0 < y < s \}

by the formula

\displaystyle  \mu^{\boxplus 1/s}((-\infty,\lambda(s,y)/s])=y/s,

which is well defined if the density of {\mu} is sufficiently well behaved. The random matrix interpretation of {\lambda(s,y)} is that it is the asymptotic location of the {\lfloor yN\rfloor^{th}} eigenvalue of the {\lfloor sN \rfloor \times \lfloor sN \rfloor} upper left minor of a random {N \times N} matrix {A_N} with asymptotic empirical spectral distribution {\mu} and with unitarily invariant distribution, thus {\lambda} is in some sense a continuum limit of Gelfand-Tsetlin patterns. Thus for instance the Cauchy interlacing laws in this asymptotic limit regime become

\displaystyle  0 \leq \partial_s \lambda \leq \partial_y \lambda.

After a lengthy calculation (involving extensive use of the chain rule and product rule), the equation (1) is equivalent to the Euler-Lagrange equation

\displaystyle  \partial_s L_{\lambda_s}(\partial_s \lambda, \partial_y \lambda) + \partial_y L_{\lambda_y}(\partial_s \lambda, \partial_y \lambda) = 0

where {L} is the Lagrangian density

\displaystyle  L(\lambda_s, \lambda_y) := \log \lambda_y + \log \sin( \pi \frac{\lambda_s}{\lambda_y} ).

Thus the minor process is formally a critical point of the integral {\int_\Delta L(\partial_s \lambda, \partial_y \lambda)\ ds dy}. The quantity {\partial_y \lambda} measures the mean eigenvalue spacing at some location of the Gelfand-Tsetlin pyramid, and the ratio {\frac{\partial_s \lambda}{\partial_y \lambda}} measures mean eigenvalue drift in the minor process. This suggests that this Lagrangian density is some sort of measure of entropy of the asymptotic microscale point process emerging from the minor process at this spacing and drift. There is work of Metcalfe demonstrating that this point process is given by the Boutillier bead model, so we conjecture that this Lagrangian density {L} somehow measures the entropy density of this process.

In the foundations of modern probability, as laid out by Kolmogorov, the basic objects of study are constructed in the following order:

  1. Firstly, one selects a sample space {\Omega}, whose elements {\omega} represent all the possible states that one’s stochastic system could be in.
  2. Then, one selects a {\sigma}-algebra {{\mathcal B}} of events {E} (modeled by subsets of {\Omega}), and assigns each of these events a probability {{\bf P}(E) \in [0,1]} in a countably additive manner, so that the entire sample space has probability {1}.
  3. Finally, one builds (commutative) algebras of random variables {X} (such as complex-valued random variables, modeled by measurable functions from {\Omega} to {{\bf C}}), and (assuming suitable integrability or moment conditions) one can assign expectations {\mathop{\bf E} X} to each such random variable.

In measure theory, the underlying measure space {\Omega} plays a prominent foundational role, with the measurable sets and measurable functions (the analogues of the events and the random variables) always being viewed as somehow being attached to that space. In probability theory, in contrast, it is the events and their probabilities that are viewed as being fundamental, with the sample space {\Omega} being abstracted away as much as possible, and with the random variables and expectations being viewed as derived concepts. See Notes 0 for further discussion of this philosophy.

However, it is possible to take the abstraction process one step further, and view the algebra of random variables and their expectations as being the foundational concept, and ignoring both the presence of the original sample space, the algebra of events, or the probability measure.

There are two reasons for wanting to shed (or abstract away) these previously foundational structures. Firstly, it allows one to more easily take certain types of limits, such as the large {n} limit {n \rightarrow \infty} when considering {n \times n} random matrices, because quantities built from the algebra of random variables and their expectations, such as the normalised moments of random matrices tend to be quite stable in the large {n} limit (as we have seen in previous notes), even as the sample space and event space varies with {n}. (This theme of using abstraction to facilitate the taking of the large {n} limit also shows up in the application of ergodic theory to combinatorics via the correspondence principle; see this previous blog post for further discussion.)

Secondly, this abstract formalism allows one to generalise the classical, commutative theory of probability to the more general theory of non-commutative probability theory, which does not have a classical underlying sample space or event space, but is instead built upon a (possibly) non-commutative algebra of random variables (or “observables”) and their expectations (or “traces”). This more general formalism not only encompasses classical probability, but also spectral theory (with matrices or operators taking the role of random variables, and the trace taking the role of expectation), random matrix theory (which can be viewed as a natural blend of classical probability and spectral theory), and quantum mechanics (with physical observables taking the role of random variables, and their expected value on a given quantum state being the expectation). It is also part of a more general “non-commutative way of thinking” (of which non-commutative geometry is the most prominent example), in which a space is understood primarily in terms of the ring or algebra of functions (or function-like objects, such as sections of bundles) placed on top of that space, and then the space itself is largely abstracted away in order to allow the algebraic structures to become less commutative. In short, the idea is to make algebra the foundation of the theory, as opposed to other possible choices of foundations such as sets, measures, categories, etc..

[Note that this foundational preference is to some extent a metamathematical one rather than a mathematical one; in many cases it is possible to rewrite the theory in a mathematically equivalent form so that some other mathematical structure becomes designated as the foundational one, much as probability theory can be equivalently formulated as the measure theory of probability measures. However, this does not negate the fact that a different choice of foundations can lead to a different way of thinking about the subject, and thus to ask a different set of questions and to discover a different set of proofs and solutions. Thus it is often of value to understand multiple foundational perspectives at once, to get a truly stereoscopic view of the subject.]

It turns out that non-commutative probability can be modeled using operator algebras such as {C^*}-algebras, von Neumann algebras, or algebras of bounded operators on a Hilbert space, with the latter being accomplished via the Gelfand-Naimark-Segal construction. We will discuss some of these models here, but just as probability theory seeks to abstract away its measure-theoretic models, the philosophy of non-commutative probability is also to downplay these operator algebraic models once some foundational issues are settled.

When one generalises the set of structures in one’s theory, for instance from the commutative setting to the non-commutative setting, the notion of what it means for a structure to be “universal”, “free”, or “independent” can change. The most familiar example of this comes from group theory. If one restricts attention to the category of abelian groups, then the “freest” object one can generate from two generators {e,f} is the free abelian group of commutative words {e^n f^m} with {n,m \in {\bf Z}}, which is isomorphic to the group {{\bf Z}^2}. If however one generalises to the non-commutative setting of arbitrary groups, then the “freest” object that can now be generated from two generators {e,f} is the free group {{\Bbb F}_2} of non-commutative words {e^{n_1} f^{m_1} \ldots e^{n_k} f^{m_k}} with {n_1,m_1,\ldots,n_k,m_k \in {\bf Z}}, which is a significantly larger extension of the free abelian group {{\bf Z}^2}.

Similarly, when generalising classical probability theory to non-commutative probability theory, the notion of what it means for two or more random variables to be independent changes. In the classical (commutative) setting, two (bounded, real-valued) random variables {X, Y} are independent if one has

\displaystyle \mathop{\bf E} f(X) g(Y) = 0

whenever {f, g: {\bf R} \rightarrow {\bf R}} are well-behaved functions (such as polynomials) such that {\mathop{\bf E} f(X)}, {\mathop{\bf E} g(Y)} both vanish. In the non-commutative setting, one can generalise the above definition to two commuting bounded self-adjoint variables; this concept is useful for instance in quantum probability, which is an abstraction of the theory of observables in quantum mechanics. But for two (bounded, self-adjoint) non-commutative random variables {X, Y}, the notion of classical independence no longer applies. As a substitute, one can instead consider the notion of being freely independent (or free for short), which means that

\displaystyle \mathop{\bf E} f_1(X) g_1(Y) \ldots f_k(X) g_k(Y) = 0

whenever {f_1,g_1,\ldots,f_k,g_k: {\bf R} \rightarrow {\bf R}} are well-behaved functions such that all of {\mathop{\bf E} f_1(X), \mathop{\bf E} g_1(Y), \ldots, \mathop{\bf E} f_k(X), \mathop{\bf E} g_k(Y)} vanish.

The concept of free independence was introduced by Voiculescu, and its study is now known as the subject of free probability. We will not attempt a systematic survey of this subject here; for this, we refer the reader to the surveys of Speicher and of Biane. Instead, we shall just discuss a small number of topics in this area to give the flavour of the subject only.

The significance of free probability to random matrix theory lies in the fundamental observation that random matrices which are independent in the classical sense, also tend to be independent in the free probability sense, in the large {n} limit {n \rightarrow \infty}. (This is only possible because of the highly non-commutative nature of these matrices; as we shall see, it is not possible for non-trivial commuting independent random variables to be freely independent.) Because of this, many tedious computations in random matrix theory, particularly those of an algebraic or enumerative combinatorial nature, can be done more quickly and systematically by using the framework of free probability, which by design is optimised for algebraic tasks rather than analytical ones.

Much as free groups are in some sense “maximally non-commutative”, freely independent random variables are about as far from being commuting as possible. For instance, if {X, Y} are freely independent and of expectation zero, then {\mathop{\bf E} XYXY} vanishes, but {\mathop{\bf E} XXYY} instead factors as {(\mathop{\bf E} X^2) (\mathop{\bf E} Y^2)}. As a consequence, the behaviour of freely independent random variables can be quite different from the behaviour of their classically independent commuting counterparts. Nevertheless there is a remarkably strong analogy between the two types of independence, in that results which are true in the classically independent case often have an interesting analogue in the freely independent setting. For instance, the central limit theorem (Notes 2) for averages of classically independent random variables, which roughly speaking asserts that such averages become gaussian in the large {n} limit, has an analogue for averages of freely independent variables, the free central limit theorem, which roughly speaking asserts that such averages become semicircular in the large {n} limit. One can then use this theorem to provide yet another proof of Wigner’s semicircle law (Notes 4).

Another important (and closely related) analogy is that while the distribution of sums of independent commutative random variables can be quickly computed via the characteristic function (i.e. the Fourier transform of the distribution), the distribution of sums of freely independent non-commutative random variables can be quickly computed using the Stieltjes transform instead (or with closely related objects, such as the {R}-transform of Voiculescu). This is strongly reminiscent of the appearance of the Stieltjes transform in random matrix theory, and indeed we will see many parallels between the use of the Stieltjes transform here and in Notes 4.

As mentioned earlier, free probability is an excellent tool for computing various expressions of interest in random matrix theory, such as asymptotic values of normalised moments in the large {n} limit {n \rightarrow \infty}. Nevertheless, as it only covers the asymptotic regime in which {n} is sent to infinity while holding all other parameters fixed, there are some aspects of random matrix theory to which the tools of free probability are not sufficient by themselves to resolve (although it can be possible to combine free probability theory with other tools to then answer these questions). For instance, questions regarding the rate of convergence of normalised moments as {n \rightarrow \infty} are not directly answered by free probability, though if free probability is combined with tools such as concentration of measure (Notes 1) then such rate information can often be recovered. For similar reasons, free probability lets one understand the behaviour of {k^{th}} moments as {n \rightarrow \infty} for fixed {k}, but has more difficulty dealing with the situation in which {k} is allowed to grow slowly in {n} (e.g. {k = O(\log n)}). Because of this, free probability methods are effective at controlling the bulk of the spectrum of a random matrix, but have more difficulty with the edges of that spectrum (as well as with related concepts such as the operator norm, Notes 3) as well as with fine-scale structure of the spectrum. Finally, free probability methods are most effective when dealing with matrices that are Hermitian with bounded operator norm, largely because the spectral theory of bounded self-adjoint operators in the infinite-dimensional setting of the large {n} limit is non-pathological. (This is ultimately due to the stable nature of eigenvalues in the self-adjoint setting; see this previous blog post for discussion.) For non-self-adjoint operators, free probability needs to be augmented with additional tools, most notably by bounds on least singular values, in order to recover the required stability for the various spectral data of random matrices to behave continuously with respect to the large {n} limit. We will discuss this latter point in a later set of notes.

Read the rest of this entry »