Given a set {S}, a (simple) point process is a random subset {A} of {S}. (A non-simple point process would allow multiplicity; more formally, {A} is no longer a subset of {S}, but is a Radon measure on {S}, where we give {S} the structure of a locally compact Polish space, but I do not wish to dwell on these sorts of technical issues here.) Typically, {A} will be finite or countable, even when {S} is uncountable. Basic examples of point processes include

  • (Bernoulli point process) {S} is an at most countable set, {0 \leq p \leq 1} is a parameter, and {A} a random set such that the events {x \in A} for each {x \in S} are jointly independent and occur with a probability of {p} each. This process is automatically simple.
  • (Discrete Poisson point process) {S} is an at most countable space, {\lambda} is a measure on {S} (i.e. an assignment of a non-negative number {\lambda(\{x\})} to each {x \in S}), and {A} is a multiset where the multiplicity of {x} in {A} is a Poisson random variable with intensity {\lambda(\{x\})}, and the multiplicities of {x \in A} as {x} varies in {S} are jointly independent. This process is usually not simple.
  • (Continuous Poisson point process) {S} is a locally compact Polish space with a Radon measure {\lambda}, and for each {\Omega \subset S} of finite measure, the number of points {|A \cap \Omega|} that {A} contains inside {\Omega} is a Poisson random variable with intensity {\lambda(\Omega)}. Furthermore, if {\Omega_1,\ldots,\Omega_n} are disjoint sets, then the random variables {|A \cap \Omega_1|, \ldots, |A \cap \Omega_n|} are jointly independent. (The fact that Poisson processes exist at all requires a non-trivial amount of measure theory, and will not be discussed here.) This process is almost surely simple iff all points in {S} have measure zero.
  • (Spectral point processes) The spectrum of a random matrix is a point process in {{\mathbb C}} (or in {{\mathbb R}}, if the random matrix is Hermitian). If the spectrum is almost surely simple, then the point process is almost surely simple. In a similar spirit, the zeroes of a random polynomial are also a point process.

A remarkable fact is that many natural (simple) point processes are determinantal processes. Very roughly speaking, this means that there exists a positive semi-definite kernel {K: S \times S \rightarrow {\mathbb R}} such that, for any {x_1,\ldots,x_n \in S}, the probability that {x_1,\ldots,x_n} all lie in the random set {A} is proportional to the determinant {\det( (K(x_i,x_j))_{1 \leq i,j \leq n} )}. Examples of processes known to be determinantal include non-intersecting random walks, spectra of random matrix ensembles such as GUE, and zeroes of polynomials with gaussian coefficients.

I would be interested in finding a good explanation (even at the heuristic level) as to why determinantal processes are so prevalent in practice. I do have a very weak explanation, namely that determinantal processes obey a large number of rather pretty algebraic identities, and so it is plausible that any other process which has a very algebraic structure (in particular, any process involving gaussians, characteristic polynomials, etc.) would be connected in some way with determinantal processes. I’m not particularly satisfied with this explanation, but I thought I would at least describe some of these identities below to support this case. (This is partly for my own benefit, as I am trying to learn about these processes, particularly in connection with the spectral distribution of random matrices.) The material here is partly based on this survey of Hough, Krishnapur, Peres, and Virág.

— 1. Discrete determinantal processes —

In order to ignore all measure-theoretic distractions and focus on the algebraic structure of determinantal processes, we will first consider the discrete case when the space {S} is just a finite set {S = \{1,\ldots,N\}} of cardinality {|S|=N}. We say that a process {A \subset S} is a determinantal process with kernel {K}, where {K} is an {N \times N} symmetric real matrix, if one has

\displaystyle  {\Bbb P}(\{i_1,\ldots,i_k\} \subset A) = \det( K( i_a, i_b ) )_{1 \leq a,b \leq k} \ \ \ \ \ (1)

for all distinct {i_1,\ldots,i_k \in S}.

To build determinantal processes, let us first consider point processes of a fixed cardinality {n}, thus {0 \leq n \leq N} and {A} is a random subset of {S} of size {n}, or in other words a random variable taking values in the set {\binom{S}{n} := \{ B \subset S: |B| = n \}}.

In this simple model, an {n}-element point process is basically just a collection of {\binom{N}{n}} probabilities {p_B = {\Bbb P}( A = B )}, one for each {B \in \binom{S}{n}}, which are non-negative numbers that add up to {1}. For instance, in the uniform point process where {A} is drawn uniformly at random from {\binom{S}{n}}, each of these probabilities {p_B} would equal {1/\binom{N}{n}}. How would one generate other interesting examples of {n}-element point processes?

For this, we can borrow the idea from quantum mechanics that probabilities can arise as the square of coefficients of unit vectors, though unlike quantum mechanics it will be slightly more convenient here to work with real vectors rather than complex ones. To formalise this, we work with the {n^{th}} exterior power {\bigwedge^n {\mathbb R}^N} of the Euclidean space {{\mathbb R}^N}; this space is sort of a “quantisation” of {\binom{S}{n}}, and is analogous to the space of quantum states of {n} identical fermions, if each fermion can exist classically in one of {N} states (or “spins”). (The requirement that the process be simple is then analogous to the Pauli exclusion principle.)

This space of {n}-vectors in {{\mathbb R}^N} is spanned by the wedge products {e_{i_1} \wedge \ldots \wedge e_{i_n}} with {1 \leq i_1 < \ldots < i_n \leq N}, where {e_1,\ldots,e_N} is the standard basis of {{\mathbb R}^N}. There is a natural inner product to place on {\bigwedge^n {\mathbb R}^N} by declaring all the {e_{i_1} \wedge \ldots \wedge e_{i_n}} to be orthonormal.

Lemma 1 If {f_1,\ldots,f_N} is any orthonormal basis of {{\mathbb R}^N}, then the {f_{i_1} \wedge \ldots \wedge f_{i_n}} for {1 \leq i_1 < \ldots < i_n \leq N} are an orthonormal basis for {\bigwedge^n {\mathbb R}^N}.

Proof: By definition, this is true when {(f_1,\ldots,f_N)=(e_1,\ldots,e_N)}. If the claim is true for some orthonormal basis {f_1,\ldots,f_N}, it is not hard to see that the claim also holds if one rotates {f_i} and {f_j} in the plane that they span by some angle {\theta}, where {1 \leq i < j \leq N} are arbitrary. But any orthonormal basis can be rotated into any other by a sequence of such rotations (e.g. by using Euler angles), and the claim follows. \Box

Corollary 2 If {v_1,\ldots,v_n} are vectors in {{\mathbb R}^N}, then the magnitude of {v_1 \wedge \ldots \wedge v_n} is equal to the {n}-dimensional volume of the parallelopiped spanned by {v_1,\ldots,v_n}.

Proof: Observe that applying row operations to {v_i} (i.e. modifying one {v_i} by a scalar multiple of another {v_j}) does not affect either the wedge product or the volume of the parallelopiped. Thus by using the Gram-Schmidt process, we may assume that the {v_i} are orthogonal; by normalising we may assume they are orthonormal. The claim now follows from the preceding lemma. \Box

From this and the ordinary Pythagorean theorem in the inner product space {\bigwedge^n {\mathbb R}^N}, we conclude the multidimensional Pythagorean theorem: the square of the {n}-dimensional volume of a parallelopiped in {{\mathbb R}^N} is the sum of squares of the {n}-dimensional volumes of the projection of that parallelopiped to each of the {\binom{N}{n}} coordinate subspaces {\hbox{span}(e_{i_1},\ldots,e_{i_n})}. (I believe this theorem was first observed in this generality by Donchian and Coxeter.) We also note another related fact:

Lemma 3 (Gram identity) If {v_1,\ldots,v_n} are vectors in {{\mathbb R}^N}, then the square of the magnitude of {v_1 \wedge \ldots \wedge v_n} is equal to the determinant of the Gram matrix {( v_i \cdot v_j )_{1 \leq i,j \leq n}}.

Proof: Again, the statement is invariant under row operations, and one can reduce as before to the case of an orthonormal set, in which case the claim is clear. (Alternatively, one can proceed via the Cauchy-Binet formula.) \Box
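
Both of these facts are easy to test numerically. The following sketch (in numpy; the dimensions and variable names are my own choices) checks that the squared volume computed from the Gram identity agrees with the sum of squared volumes of the coordinate projections, which is the Cauchy-Binet formula in disguise:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N, n = 6, 3
V = rng.standard_normal((n, N))          # rows are v_1, ..., v_n in R^N

# Gram identity: squared n-dimensional volume = det of the Gram matrix (v_i . v_j).
vol_sq_gram = np.linalg.det(V @ V.T)

# Multidimensional Pythagorean theorem: the same squared volume is the sum,
# over all n-element sets of coordinates, of the squared volumes of the
# projections, i.e. the squared determinants of the n x n coordinate minors.
vol_sq_pythagoras = sum(
    np.linalg.det(V[:, list(cols)]) ** 2
    for cols in itertools.combinations(range(N), n)
)

print(vol_sq_gram, vol_sq_pythagoras)    # the two agree up to rounding error
```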

If we define {e_{\{i_1,\ldots,i_n\}} := e_{i_1} \wedge \ldots \wedge e_{i_n}}, then we have identified the standard basis of {\bigwedge^n {\mathbb R}^N} with {\binom{S}{n}} by identifying {e_B} with {B}. As a consequence of this and the multidimensional Pythagorean theorem, every unit {n}-vector {\omega} in {\bigwedge^n {\mathbb R}^N} determines an {n}-element point process {A} on {S}, by declaring the probability {p_B} of {A} taking the value {B} to equal {|\omega \cdot e_B|^2} for each {B \in \binom{S}{n}}. Note that multiple {n}-vectors can generate the same point process, because only the magnitude of the coefficients {\omega \cdot e_B} are of interest; in particular, {\omega} and {-\omega} generate the same point process. (This is analogous to how multiplying the wave function in quantum mechanics by a complex phase has no effect on any physical observable.)

Now we can introduce determinantal processes. If {V} is an {n}-dimensional subspace of {{\mathbb R}^N}, we can define the (projection) determinantal process {A = A_V} associated to {V} to be the point process associated to the volume form of {V}, i.e. to the wedge product of an orthonormal basis of {V}. (This volume form is only determined up to sign, because the orientation of {V} has not been fixed, but as observed previously, the sign of the form has no impact on the resulting point process.)

By construction, the probability that the point process {A} is equal to a set {\{i_1,\ldots,i_n\}} is equal to the square of the determinant of the {n \times n} matrix consisting of the {i_1,\ldots,i_n} coordinates of an arbitrary orthonormal basis of {V}. By extending such an orthonormal basis to the rest of {{\mathbb R}^N}, and representing {e_{i_1},\ldots,e_{i_n}} in this basis, it is not hard to see that {{\Bbb P}( A = \{i_1,\ldots,i_n\})} can be interpreted geometrically as the square of the volume of the parallelopiped generated by {Pe_{i_1},\ldots,Pe_{i_n}}, where {P} is the orthogonal projection onto {V}.

In fact we have the more general fact:

Lemma 4 If {k \geq 1} and {i_1,\ldots,i_k} are distinct elements of {S}, then {{\Bbb P}( \{i_1,\ldots,i_k\} \subset A )} is equal to the square of the {k}-dimensional volume of the parallelopiped generated by the orthogonal projections {Pe_{i_1},\ldots,Pe_{i_k}} of {e_{i_1},\ldots,e_{i_k}} onto {V}.

Proof: We can assume that {k \leq n}, since both expressions in the lemma vanish otherwise.

By (anti-)symmetry we may assume that {\{i_1,\ldots,i_k\}=\{1,\ldots,k\}}. By the Gram-Schmidt process we can find an orthonormal basis {v_1,\ldots,v_n} of {V} such that each {v_i} is orthogonal to {e_1,\ldots,e_{i-1}}.

Now consider the {n \times N} matrix {M} with rows {v_1,\ldots,v_n}, thus {M} vanishes below the diagonal. The probability {{\Bbb P}( \{1,\ldots,k\} \subset A )} is equal to the sum of squares of the determinants of all the {n \times n} minors of {M} that contain the first {k} columns. As {M} vanishes below the diagonal, we see from cofactor expansion that this is equal to the product of the squares of the first {k} diagonal entries, times the sum of squares of the determinants of all the {(n-k) \times (n-k)} minors of the bottom {n-k} rows. But by the generalised Pythagorean theorem, this latter factor is the square of the volume of the parallelopiped generated by {v_{k+1},\ldots,v_n}, which is {1}. Meanwhile, by the base times height formula, we see that the product of the first {k} diagonal entries of {M} is equal in magnitude to the {k}-dimensional volume of the parallelopiped generated by the orthogonal projections of {e_1,\ldots,e_k} to {V}. The claim follows. \Box

In particular, we have {{\Bbb P}(i \in A) = \| P e_i \|^2} for any {i}. Thus, if {e_i} lies in {V}, then {i} almost surely lies in {A}, and when {e_i} is orthogonal to {V}, {i} almost surely does not lie in {A}.

Let {K(i,j) = Pe_i \cdot e_j = Pe_i \cdot Pe_j} denote the matrix coefficients of the orthogonal projection {P}. From Lemma 4 and the Gram identity, we conclude that {A} is a determinantal process (see (1)) with kernel {K}. Also, by combining Lemma 4 with the generalised Pythagorean theorem, we conclude a monotonicity property:

Lemma 5 (Monotonicity property) If {V \subset W} are nested subspaces of {{\mathbb R}^N}, then {\mathop{\mathbb P}( B \subset A_V ) \leq \mathop{\mathbb P}( B \subset A_W)} for every {B \subset S}.

This seems to suggest that there is some way of representing {A_W} as the union of {A_V} with another process coupled with {A_V}, but I was not able to build a non-artificial example of such a representation. On the other hand, if {V \subset {\mathbb R}^{N}} and {V' \subset {\mathbb R}^{N'}}, then the process {A_{V \oplus V'}} associated with the direct sum {V \oplus V' \subset {\mathbb R}^{N+N'}} has the same distribution as the disjoint union of {A_V} with an independent copy of {A_{V'}}.

The determinantal process interacts nicely with complements:

Lemma 6 (Hodge duality) Let {V} be an {n}-dimensional subspace of {{\mathbb R}^N}. The {N-n}-element determinantal process {A_{V^\perp}} associated to the orthogonal complement {V^\perp} of {V} has the same distribution as the complement {S \backslash A_V} of the {n}-element determinantal process {A_V} associated to {V}.

Proof: We need to show that {\mathop{\mathbb P}( A_V = B ) = \mathop{\mathbb P}( A_{V^\perp} = S \backslash B )} for all {B \in \binom{S}{n}}. By symmetry we can take {B = \{1,\ldots,n\}}. Let {v_1,\ldots,v_n} and {v_{n+1},\ldots,v_N} be an orthonormal basis for {V} and {V^\perp} respectively, and let {M} be the resulting {N \times N} orthogonal matrix; then the task is to show that the top {n \times n} minor {X} of {M} has the same determinant squared as the bottom {(N-n) \times (N-n)} minor {Y}. But if one splits {M = \begin{pmatrix} X & Z \\ W & Y\end{pmatrix}}, we see from the orthogonality property that {XX^* = I_n - ZZ^*} and {Y^* Y = I_{N-n} - Z^* Z}, where {I_n} is the {n \times n} identity matrix. But from the singular value decomposition we see that {I_n-ZZ^*} and {I_{N-n}-Z^*Z} have the same determinant, and the claim follows. (One can also establish this lemma using the Hodge star operation.) \Box

From this lemma we see that {S \backslash A} is a determinantal process with kernel {I_N - K}. In particular, we have

\displaystyle  {\Bbb P}( \{i_1,\ldots,i_k\} \cap A = \emptyset ) = \det( I_k - ( K(i_a,i_b) )_{1 \leq a,b \leq k} ). \ \ \ \ \ (2)
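
For a small example, both (1) and (2) can be checked directly against the definition of the projection process by brute force. Here is a sketch in numpy (the dimensions and variable names are my own; the kernel is built exactly as above, as the matrix of the orthogonal projection onto {V}):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
N, n = 5, 2

# An n-dimensional subspace V of R^N, described by an orthonormal basis (the rows of Q).
Q, _ = np.linalg.qr(rng.standard_normal((N, n)))
Q = Q.T

# P(A_V = B) is the squared determinant of the B-columns of Q.
p = {B: np.linalg.det(Q[:, list(B)]) ** 2
     for B in itertools.combinations(range(N), n)}
assert abs(sum(p.values()) - 1) < 1e-10          # multidimensional Pythagoras again

# The kernel K(i, j) = P e_i . e_j of the orthogonal projection P onto V.
K = Q.T @ Q

for k in range(1, N + 1):
    for I in map(list, itertools.combinations(range(N), k)):
        # identity (1): inclusion probabilities are determinants of minors of K
        incl = sum(pB for B, pB in p.items() if set(I) <= set(B))
        assert abs(incl - np.linalg.det(K[np.ix_(I, I)])) < 1e-10
        # identity (2): avoidance probabilities are determinants of minors of I - K
        avoid = sum(pB for B, pB in p.items() if not set(I) & set(B))
        assert abs(avoid - np.linalg.det(np.eye(k) - K[np.ix_(I, I)])) < 1e-10
```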

The construction of the determinantal process given above is somewhat indirect. A more direct way to build the process exploits the following lemma:

Lemma 7 Let {V} be an {n}-dimensional subspace of {{\mathbb R}^N}, let {A_V} be the corresponding {n}-element determinantal process, and let {1 \leq i_1 < \ldots < i_k \leq N} for some {1 \leq k \leq n}. Then if one conditions on the event that {\{i_1,\ldots,i_k\} \subset A_V} (assuming this event has non-zero probability), the resulting {n-k}-element process {A_V \backslash \{i_1,\ldots,i_k\}} has the same distribution as the {n-k}-element determinantal process {A_W} associated to the {n-k}-dimensional subspace {W} of {V} that is orthogonal to {e_{i_1},\ldots,e_{i_k}}.

Proof: By symmetry it suffices to consider the case {\{i_1,\ldots,i_k\} = \{1,\ldots,k\}}. By a further application of symmetry it suffices to show that

\displaystyle  {\Bbb P}( A_V = \{1,\ldots,n\} ) = {\Bbb P}( \{1,\ldots,k\} \subset A_V ) {\Bbb P}( A_W = \{k+1,\ldots,n\} ).

By the Gram-Schmidt process, we can find an orthonormal basis {v_1,\ldots,v_n} of {V} whose {n \times N} matrix of coefficients vanishes below the diagonal. One then easily verifies (using Lemma 4) that {{\Bbb P}( A_V = \{1,\ldots,n\} )} is the product of the squares of the {n} diagonal entries, {{\Bbb P}( \{1,\ldots,k\} \subset A_V )} is the product of the squares of the first {k}, and {{\Bbb P}( A_W = \{k+1,\ldots,n\} )} is the product of the squares of the last {n-k}, and the claim follows. \Box

From this lemma, it is not difficult to see that one can build {A_V} recursively as {A_V = \{a\} \cup A_{V_a}}, where {a} is a random element of {S} drawn with probability {{\Bbb P}(a=i)=\|P e_i\|^2 / \hbox{dim}(V)} for each {i}, and {V_a} is the subspace of {V} orthogonal to {e_a}. Another consequence of this lemma and the monotonicity property is the negative dependence inequality

\displaystyle  \mathop{\mathbb P}( B_1 \cup B_2 \subset A ) \leq \mathop{\mathbb P}( B_1 \subset A) \mathop{\mathbb P}( B_2 \subset A )

for any disjoint {B_1, B_2 \subset S}; thus the presence of {A} on one set {B_1} reduces the chance of {A} being present on a disjoint set {B_2} (not surprising, since {A} has fixed size).
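
In code, the recursive construction of {A_V} described above reads as follows (a numpy sketch; sample_projection_dpp and the re-orthonormalisation step via an SVD are my own choices, not taken from the survey):

```python
import numpy as np

def sample_projection_dpp(Q, rng):
    """Sample A_V, where the rows of the n x N matrix Q form an
    orthonormal basis of the n-dimensional subspace V of R^N."""
    A = []
    while Q.shape[0] > 0:
        # ||P e_i||^2 is the squared norm of the i-th column of Q, and the next
        # point a is drawn with probability ||P e_a||^2 / dim(V)  (Lemma 7).
        col_sq = np.sum(Q ** 2, axis=0)
        a = rng.choice(Q.shape[1], p=col_sq / col_sq.sum())
        A.append(a)
        # Pass to the subspace V_a of V orthogonal to e_a: remove the component
        # of each basis vector along P e_a, then re-orthonormalise via an SVD,
        # dropping the direction that has collapsed to zero.
        u = Q[:, a] / np.sqrt(col_sq[a])      # coordinates of P e_a / ||P e_a|| in the basis
        Q = Q - np.outer(u, u) @ Q
        _, s, Vt = np.linalg.svd(Q, full_matrices=False)
        Q = Vt[s > 1e-10]
    return sorted(A)

# Example: one draw from the 3-element process attached to a random
# 3-dimensional subspace of R^6.
rng = np.random.default_rng(2)
Q0, _ = np.linalg.qr(rng.standard_normal((6, 3)))
print(sample_projection_dpp(Q0.T, rng))
```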

Thus far, we have only considered point processes with a fixed number {n} of points. As a consequence, the determinantal kernel {K} involved here is of a special form, namely the coefficients of an orthogonal projection matrix to an {n}-dimensional space (or equivalently, a symmetric matrix whose eigenvalues consist of {n} ones and {N-n} zeroes). But one can create more general point processes by taking a mixture of the fixed-number processes, e.g. first picking a projection kernel {K} (or a subspace {V}) by some random process, and then sampling {A} from the point process associated to that kernel or subspace.

For instance, let {\phi_1,\ldots,\phi_N} be an orthonormal basis of {{\mathbb R}^N}, and let {0 \leq \lambda_1,\ldots,\lambda_N \leq 1} be weights. Then we can create a random subspace {V} of {{\mathbb R}^N} by setting {V} equal to the span {V_J} of some random subset {\{ \phi_j: j \in J \}} of the basis {\phi_1,\ldots,\phi_N}, where each {j} lies in {J} with an independent probability of {\lambda_j}, and then sampling {A} from {A_V}. Then {A} will be a point process whose cardinality can range from {0} to {N}. Given any set {\{i_1,\ldots,i_k\} \subset S}, we can then compute the probability {\mathop{\mathbb P}( \{i_1,\ldots,i_k\} \subset A )} as

\displaystyle  {\Bbb P}( \{i_1,\ldots,i_k\} \subset A ) = {\Bbb E}_J {\Bbb P}( \{i_1,\ldots,i_k\} \subset A_{V_J} )

where {J} is selected as above. Using (1), we have

\displaystyle  {\Bbb P}( \{i_1,\ldots,i_k\} \subset A_{V_J} ) = \det( K_{V_J}( i_a, i_b ) )_{1 \leq a,b \leq k}.

But {K_{V_J}(i_a,i_b) = \sum_{j \in J} \phi_j(i_a) \phi_j(i_b)}, where {\phi_j(i)} is the {i^{th}} coordinate of {\phi_j}. Thus we can write

\displaystyle  (K_{V_J}(i_a,i_b))_{1 \leq a,b \leq k} = \sum_{j =1}^N {\Bbb I}( j \in J ) R_j

where {{\Bbb I}( j \in J )} is the indicator of the event {j \in J}, and {R_j} is the rank one matrix {( \phi_j(i_a) \phi_j(i_b) )_{1 \leq a,b \leq k}}. Using multilinearity of the determinant, and the fact that any determinant involving two or more rows of the same rank one matrix automatically vanishes, we see that we can express

\displaystyle  \det( (K_{V_J}(i_a,i_b))_{1 \leq a,b \leq k} ) = \sum_{1 \leq j_1,\ldots,j_k \leq N, \hbox{ distinct }} {\Bbb I}( j_1,\ldots,j_k \in J ) \det( R_{j_1,\ldots,j_k} )

where {R_{j_1,\ldots,j_k}} is the matrix whose first row is the same as that of {R_{j_1}}, the second row is the same as that of {R_{j_2}}, and so forth. Taking expectations in {J}, the quantity {{\Bbb I}( j_1,\ldots,j_k \in J )} becomes {\lambda_{j_1} \ldots \lambda_{j_k}}. Undoing the multilinearity step, we conclude that

\displaystyle  {\Bbb E}_J \det( K_{V_J}( i_a, i_b ) )_{1 \leq a,b \leq k} = \det( \sum_{j=1}^N \lambda_j R_j )

and thus {A} is a determinantal process with kernel

\displaystyle  K( x, y ) := \sum_{j =1}^N \lambda_j \phi_j(x) \phi_j(y).

To summarise, we have created a determinantal process {A} whose kernel {K} is now an arbitrary symmetric matrix with eigenvalues {\lambda_1,\ldots,\lambda_N \in [0,1]}, and it is a mixture of constant-size processes {A_{V_J}}. In particular, the cardinality {|A|} of this process has the same distribution as the cardinality {|J|} of the random subset {J} of {\{1,\ldots,N\}}, or in other words {|A| \equiv I_{\lambda_1} + \ldots + I_{\lambda_N}}, where {I_{\lambda_1},\ldots,I_{\lambda_N}} are independent Bernoulli variables with expectation {\lambda_1,\ldots,\lambda_N} respectively.
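
In code, the mixture construction looks like the following sketch (numpy; sample_projection_dpp is the same recursive sampler as before, and all helper names are mine): diagonalise {K}, keep each eigenvector {\phi_j} independently with probability {\lambda_j}, and sample the projection process on the span of the kept eigenvectors, so that the size of the output is automatically a sum of independent Bernoulli variables as described above.

```python
import numpy as np

def sample_projection_dpp(Q, rng):
    """Sample A_V for the subspace V spanned by the orthonormal rows of Q."""
    A = []
    while Q.shape[0] > 0:
        col_sq = np.sum(Q ** 2, axis=0)
        a = rng.choice(Q.shape[1], p=col_sq / col_sq.sum())
        A.append(a)
        u = Q[:, a] / np.sqrt(col_sq[a])
        Q = Q - np.outer(u, u) @ Q
        _, s, Vt = np.linalg.svd(Q, full_matrices=False)
        Q = Vt[s > 1e-10]
    return sorted(A)

def sample_dpp(K, rng):
    """Sample a determinantal process whose symmetric kernel K has all
    eigenvalues in [0, 1], as a mixture of projection processes."""
    lam, phi = np.linalg.eigh(K)                  # columns of phi are eigenvectors
    J = rng.random(len(lam)) < lam                 # keep eigenvector j with probability lam_j
    return sample_projection_dpp(phi[:, J].T, rng)

# Example: a random kernel with eigenvalues in [0, 1].
rng = np.random.default_rng(3)
N = 6
phi, _ = np.linalg.qr(rng.standard_normal((N, N)))
lam = rng.uniform(0, 1, N)
K = phi @ np.diag(lam) @ phi.T
print(sample_dpp(K, rng), "   E|A| =", lam.sum())
```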

Observe that if one takes a determinantal process {A \subset S} with kernel {K}, and restricts it to a subset {S'} of {S}, then the resulting process {A \cap S' \subset S'} is a determinantal process whose kernel {K'} is simply the restriction of {K} to the {S' \times S'} block of {S \times S}. Applying the previous observation, we conclude that the random variable {|A \cap S'|} has the same distribution as the sum of {|S'|} independent Bernoulli variables, whose expectations are the eigenvalues of the restriction of {K} to {S'}. (Compare this to the Poisson point process {A} with some intensity measure {\lambda}, where the distribution of {|A \cap \Omega|} is a Poisson random variable with intensity {\lambda(\Omega)}.) Note that most point processes do not obey this property (e.g. the uniform distribution on {\binom{S}{n}} does not unless {n=0,1} or {n=N,N-1}), and so most point processes are not determinantal.
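
Here is a small exact check of this counting statistic (a numpy sketch with my own choices of kernel and of {S'}): the distribution of {|A \cap S'|} is computed from the inclusion probabilities (1) by inclusion-exclusion, and compared with the convolution of the Bernoulli laws attached to the eigenvalues of the restricted kernel.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
N = 6
phi, _ = np.linalg.qr(rng.standard_normal((N, N)))
lam = rng.uniform(0, 1, N)
K = phi @ np.diag(lam) @ phi.T                   # a kernel with eigenvalues in [0, 1]

Sp = [0, 2, 3]                                    # the subset S' of S = {0, ..., N-1}
Kp = K[np.ix_(Sp, Sp)]                            # the restricted kernel K'
m = len(Sp)

# Distribution of |A n S'| from (1) via inclusion-exclusion:
# P(A n S' = T) = sum over T subset U subset S' of (-1)^{|U \ T|} det(K'_U).
subsets = lambda xs: itertools.chain.from_iterable(
    itertools.combinations(xs, r) for r in range(len(xs) + 1))
dist = np.zeros(m + 1)
for T in subsets(range(m)):
    rest = [i for i in range(m) if i not in T]
    pT = 0.0
    for W in subsets(rest):
        U = list(T) + list(W)
        pT += (-1) ** len(W) * np.linalg.det(Kp[np.ix_(U, U)])
    dist[len(T)] += pT

# Distribution of a sum of independent Bernoullis whose means are the eigenvalues
# of K': the generating polynomial is the product of (1 - mu_j + mu_j z).
poly = np.array([1.0])
for mu in np.linalg.eigvalsh(Kp):
    poly = np.convolve(poly, [1 - mu, mu])

print(np.allclose(dist, poly))                    # True
```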

It is known that increasing a positive semi-definite matrix by another positive semi-definite matrix does not decrease the determinant (indeed, it does not decrease any eigenvalue, by the minimax characterisation of those eigenvalues). As a consequence, if the kernel {K'} of a determinantal process {A'} is larger than the kernel {K} of another determinantal process {A} in the sense that {K'-K} is positive semi-definite, then {A'} is “larger” than {A} in the sense that {{\Bbb P}( B \subset A') \geq {\Bbb P}( B \subset A)} for all {B \subset S}. A particularly nice special case is when {K = c K'} for some {0 \leq c \leq 1}; then {{\Bbb P}( B \subset A) = c^{|B|} {\Bbb P}( B \subset A')} for all {B}, and one can interpret {A} as the process obtained from {A'} by deleting each element of {A'} independently at random with probability {1-c} (i.e. keeping that element independently at random with probability {c}).

As a consequence of this, one can obtain a converse to our previous construction of determinantal processes, and conclude that a determinantal process can be associated to a symmetric kernel {K} only if the eigenvalues of {K} lie between zero and one. The fact that {K} is positive semi-definite follows from the fact that all principal minors of {K} have non-negative determinant (thanks to (1)). Now suppose for contradiction that {K} has an eigenvalue larger than {1}; then one can find {0 \leq c < 1} such that the largest eigenvalue of {cK} is exactly {1}. By our previous discussion, the process {A_{cK}} associated to {cK} is then formed from the process {A_K} by deleting each element of {A_K} independently with some non-zero probability; in particular, {A_{cK}} is empty with non-zero probability. On the other hand, we know that {|A_{cK}|} has the distribution of the sum of independent Bernoulli variables, at least one of which is {1} with probability one, a contradiction. (This proof is due to Hough et al., though the result is originally due to Soshnikov. An alternate proof is to extend the identity (2) to all determinantal processes and conclude that {I-K} is necessarily positive semi-definite.)

— 2. Continuous determinantal processes —

One can extend the theory of discrete determinantal processes to the continuous setting. For simplicity we restrict attention to (simple) point processes {A \subset {\mathbb R}} on the real line. A process {A} is said to have correlation functions {\rho_k: {\mathbb R}^k \rightarrow {\mathbb R}} for {k \geq 1} if the {\rho_k} are symmetric, non-negative, and locally integrable, and one has the formula

\displaystyle  {\Bbb E} \sum_{x_1,\ldots,x_k \in A, \hbox{ distinct}} f( x_1,\ldots,x_k ) = \int_{{\mathbb R}^k} f(x_1,\ldots,x_k) \rho_k(x_1,\ldots,x_k)\ dx_1 \ldots dx_k

for any bounded measurable symmetric {f} with compact support, where the left-hand side is summed over all {k}-tuples of distinct points in {A} (this sum is of course empty if {|A| < k}). Intuitively, the probability that {A} contains an element in the infinitesimal interval {[x_i,x_i+dx_i]} for all {1 \leq i \leq k} and distinct {x_1,\ldots,x_k} is equal to {\rho_k(x_1,\ldots,x_k) dx_1 \ldots dx_k}. The {\rho_k} are not quite probability distributions; instead, the integral {\int_{{\mathbb R}^k} \rho_k} is equal to {k! {\Bbb E} \binom{|A|}{k}}. Thus, for instance, if {A} is a constant-size process of cardinality {n}, then {\rho_k} has integral {\frac{n!}{(n-k)!}} on {{\mathbb R}^k} for {1 \leq k \leq n} and vanishes for {k>n}.

If the correlation functions exist, it is easy to see that they are unique (up to almost everywhere equivalence), and can be used to compute various statistics of the process. For instance, an application of the inclusion-exclusion principle shows that for any bounded measurable set {\Omega}, the probability that {A \cap \Omega = \emptyset} is (formally) equal to

\displaystyle  \sum_{k=0}^\infty \frac{(-1)^k}{k!} \int_{\Omega^k} \rho_k(x_1,\ldots,x_k)\ dx_1 \ldots dx_k.

A process is determinantal with some symmetric measurable kernel {K: {\mathbb R} \times {\mathbb R} \rightarrow {\mathbb R}} if it has correlation functions {\rho_k} given by the formula

\displaystyle  \rho_k(x_1,\ldots,x_k) = \det (K(x_i,x_j))_{1 \leq i,j \leq k}. \ \ \ \ \ (3)

Informally, the probability that {A} intersects the infinitesimal intervals {[x_i,x_i+dx_i]} for distinct {x_1,\ldots,x_k} is {\det (K(x_i,x_j) dx_i^{1/2} dx_j^{1/2})_{1 \leq i,j \leq k}}. (Thus, {K} is most naturally interpreted as a half-density, or as an integral operator from {L^2({\mathbb R})} to {L^2({\mathbb R})}.)

There are analogues of the discrete theory in this continuous setting. For instance, one can show that a symmetric measurable kernel {K} generates a determinantal process if and only if the spectrum of the associated integral operator {{\mathcal K}} lies in the interval {[0,1]}. The analogue of (2) is the formula

\displaystyle  {\Bbb P}( A \cap \Omega = \emptyset ) = \det( I - {\mathcal K}|_\Omega );

more generally, the distribution of {|A \cap \Omega|} is the sum of independent Bernoulli variables, whose expectations are the eigenvalues of {{\mathcal K}|_\Omega}. Finally, if {{\mathcal K}} is an orthogonal projection onto an {n}-dimensional space, then the process has a constant size of {n}. Conversely, if {A} is a process of constant size {n}, whose {n^{th}} correlation function {\rho_n(x_1,\ldots,x_n)} is given by (3), where {{\mathcal K}} is an orthogonal projection onto an {n}-dimensional space, then (3) holds for all other values of {k} as well, and so {A} is a determinantal process with kernel {K}. (This is roughly the analogue of Lemma 4.)

These facts can be established either by approximating a continuous process as the limit of discrete ones, or by obtaining alternate proofs of several of the facts in the previous section which do not rely as heavily on the discrete hypotheses. See Hough et al. for details.
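
To illustrate the hole probability formula numerically, here is a sketch (numpy; the kernel, the window, and the quadrature are my own choices) for the simplest possible continuous example: the rank-one projection kernel {K(x,y) = \phi(x)\phi(y)} with {\phi(x) = \pi^{-1/4} e^{-x^2/2}} a unit vector in {L^2({\mathbb R})}. The associated process consists of a single point with density {\phi(x)^2}, so {{\Bbb P}(A \cap \Omega = \emptyset) = 1 - \int_\Omega \phi^2}, which gives a closed form to compare against a quadrature discretisation of {\det(I - {\mathcal K}|_\Omega)}.

```python
import numpy as np

# Rank-one projection kernel K(x, y) = phi(x) phi(y); the associated process
# has exactly one point, distributed with density phi(x)^2.
phi = lambda x: np.pi ** (-0.25) * np.exp(-x ** 2 / 2)

a, b, m = -1.0, 1.0, 500                          # the window Omega = [a, b]
w = (b - a) / m                                    # midpoint-rule quadrature weight
x = a + (np.arange(m) + 0.5) * w                   # quadrature nodes

# Discretise the integral operator K|_Omega: the m x m matrix w * K(x_i, x_j)
# has approximately the same non-zero spectrum as K restricted to Omega.
Kmat = w * np.outer(phi(x), phi(x))

hole_fredholm = np.linalg.det(np.eye(m) - Kmat)    # approximates det(I - K|_Omega)
hole_exact = 1.0 - np.sum(phi(x) ** 2) * w         # 1 - int_Omega phi^2 (same quadrature)
print(hole_fredholm, hole_exact)                   # both close to 1 - erf(1) ~ 0.1573
```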

A Poisson process can be viewed as the limiting case of a determinantal process in which {{\mathcal K}} degenerates to (a normalisation of) a multiplication operator {f \mapsto \lambda f}, where {\lambda} is the intensity function.

— 3. The spectrum of GUE —

Now we turn to a specific example of a continuous point process, namely the spectrum {A = \{\lambda_1,\ldots,\lambda_n\} \subset {\mathbb R}} of the Gaussian unitary ensemble {M_n = (\zeta_{ij})_{1 \leq i,j \leq n}}, where the {\zeta_{ij}} are independent for {1 \leq i \leq j \leq n} with mean zero and variance {1}, with {\zeta_{ij}} being the standard complex gaussian for {i<j} and the standard real gaussian {N(0,1)} for {i=j}, and with {\zeta_{ji} := \overline{\zeta_{ij}}} for {i<j} so that {M_n} is Hermitian. The probability distribution of {M_n} can be expressed as

\displaystyle  c_n \exp( - \frac{1}{2} \hbox{trace}( M_n^2 ) )\ dM_n

where {dM_n} is Lebesgue measure on the space of Hermitian {n \times n} matrices, and {c_n > 0} is some explicit normalising constant.
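
For reference, here is how one can draw a sample of {M_n} with this normalisation (a numpy sketch; the function name gue is mine):

```python
import numpy as np

def gue(n, rng):
    """Draw a GUE matrix: standard complex gaussian entries (variance 1) above
    the diagonal, standard real gaussians on the diagonal, Hermitian overall."""
    Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    M = np.triu(Z, 1)                       # strictly upper-triangular part
    M = M + M.conj().T                      # fill in the lower triangle by Hermitian symmetry
    return M + np.diag(rng.standard_normal(n))

rng = np.random.default_rng(5)
M = gue(4, rng)
print(np.linalg.eigvalsh(M))                # the point process A = spectrum of M_4
```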

The {n}-point correlation function of {A} can be computed explicitly:

Lemma 8 (Ginibre formula) The {n}-point correlation function {\rho_n(x_1,\ldots,x_n)} of the GUE spectrum {A} is given by

\displaystyle  \rho_n(x_1,\ldots,x_n) = c'_n (\prod_{1 \leq i <j \leq n} |x_i-x_j|^2) \exp( - \sum_{i=1}^n x_i^2 / 2 ) \ \ \ \ \ (4)

where the normalising constant {c'_n} is chosen so that {\rho_n} has total integral {n!} (as must be the case for the {n}-point correlation function of an {n}-element process, by the discussion in the previous section).

The constant {c'_n > 0} is essentially the reciprocal of the partition function for this ensemble, and can be computed explicitly, but we will not do so here.

Proof: Let {D} be a diagonal random matrix {D = \hbox{diag}(x_1,\ldots,x_n)} whose entries are drawn using the probability density proportional to the function {\rho_n(x_1,\ldots,x_n)} defined by (4), and let {U \in U(n)} be a unitary matrix drawn uniformly at random (with respect to Haar measure on {U(n)}) and independently of {D}. It will suffice to show that the GUE {M_n} has the same probability distribution as {U^* DU}. Since probability distributions have total mass one, it suffices to show that their densities agree up to multiplicative constants.

The distributions of {M_n} and {U^* DU} are easily seen to be continuous and invariant under unitary rotations. Thus, it will suffice to show that their probability densities at a given diagonal matrix {D_0 = \hbox{diag}(x^0_1, \ldots, x^0_n)} agree up to multiplicative constants. We may assume that the {x^0_i} are distinct, since this occurs for almost every choice of {D_0}.

On the one hand, the probability density of {M_n} at {D_0} is proportional to {\exp( - \sum_{i=1}^n (x_i^0)^2 / 2 )}. On the other hand, a short computation shows that if {U^* D U} is within a distance {O(\epsilon)} of {D_0} for some infinitesimal {\epsilon>0}, then (up to permutations) {D} must be a distance {O(\epsilon)} from {D_0}, and the {ij} entry of {U} must be a complex number of size {O( \epsilon / |x_i^0 - x_j^0| )} for {1 \leq i < j \leq n}, while the diagonal entries of {U} can be arbitrary phases. Pursuing this computation more rigorously (e.g. using the Harish-Chandra formula) and sending {\epsilon \rightarrow 0}, one can show that the probability density of {U^* D U} at {D_0} is a constant multiple of

\displaystyle  \rho_n(x^0_1,\ldots,x^0_n) \prod_{1 \leq i < j \leq n} \frac{1}{|x_i^0 - x_j^0|^2}

(the square here arising because of the complex nature of the {ij} coefficient of {U}) and the claim follows. \Box

One can also represent the {k}-point correlation functions as a determinant:

Lemma 9 (Gaudin-Mehta formula) The {k}-point correlation function {\rho_k(x_1,\ldots,x_k)} of the GUE spectrum {A} is given by

\displaystyle  \rho_k(x_1,\ldots,x_k) = \det( K_n( x_i, x_j ) )_{1 \leq i,j \leq k} \ \ \ \ \ (5)

where {K_n(x,y)} is the kernel of the orthogonal projection {{\mathcal K}} in {L^2({\mathbb R})} onto the space spanned by the functions {x^i e^{-x^2/4}} for {i=0,\ldots,n-1}. In other words, {A} is the {n}-point determinantal process with kernel {K_n}.

Proof: By the material in the preceding section, it suffices to establish this for {k=n}. As {K_n} is the kernel of an orthogonal projection to an {n}-dimensional space, it generates an {n}-point determinantal process, and so {\det( K_n( x_i, x_j ) )_{1 \leq i,j \leq n}} has integral {n! \binom{n}{n}=n!}, the same total integral as {\rho_n}. Thus it will suffice to show that {\rho_n} and {\det( K_n( x_i, x_j ) )_{1 \leq i,j \leq n}} agree up to multiplicative constants.

By Gram-Schmidt, one can find an orthonormal basis {\phi_i(x) e^{-x^2/4}}, {i=0,\ldots,n-1} for the range of {{\mathcal K}}, with each {\phi_i} a polynomial of degree {i} (these are essentially the Hermite polynomials). Then we can write

\displaystyle  K_n(x_i,x_j) = \sum_{k=0}^{n-1} \phi_k(x_i) \phi_k(x_j) e^{-(x_i^2+x_j^2)/4}.

Cofactor expansion then shows that {\det( K_n( x_i, x_j ) )_{1 \leq i,j \leq n}} is equal to {\exp( - \sum_{i=1}^n x_i^2 / 2 )} times a polynomial {P(x_1,\ldots,x_n)} in {x_1,\ldots,x_n} of degree at most {2\sum_{k=0}^{n-1} k = n(n-1)}. On the other hand, this determinant is always non-negative, and vanishes whenever {x_i=x_j} for any {1 \leq i < j \leq n}, and so must contain {(x_i-x_j)^2} as a factor for all {1 \leq i < j \leq n}. As the total degree of all these (relatively prime) factors is {n(n-1)}, the claim follows. \Box

This formula can be used to obtain asymptotics for the (renormalised) GUE eigenvalue spacings in the limit {n \rightarrow \infty}, by using asymptotics for (renormalised) Hermite polynomials; this was first established by Dyson.
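
As a closing illustration, the {k=1} case of the Gaudin-Mehta formula says that the mean density of GUE eigenvalues is {\rho_1(x) = K_n(x,x)}, which can be computed from the orthonormalised Hermite functions and compared against an empirical eigenvalue histogram. Here is a sketch (numpy; the recurrence for the Hermite functions and all helper names are my own, and the constants follow the normalisation {x^i e^{-x^2/4}} used above):

```python
import numpy as np

def rho1(x, n):
    """rho_1(x) = K_n(x, x), with K_n the projection kernel onto the span of
    x^i e^{-x^2/4}, i = 0, ..., n-1; uses the stable three-term recurrence for
    the orthonormal Hermite functions psi_k, evaluated at u = x / sqrt(2)."""
    u = np.asarray(x, dtype=float) / np.sqrt(2)
    psi_prev = np.zeros_like(u)
    psi = np.pi ** (-0.25) * np.exp(-u ** 2 / 2)          # psi_0
    total = psi ** 2
    for k in range(n - 1):
        psi_prev, psi = psi, np.sqrt(2.0 / (k + 1)) * u * psi - np.sqrt(k / (k + 1.0)) * psi_prev
        total += psi ** 2
    return total / np.sqrt(2)        # accounts for the change of variables x -> x / sqrt(2)

def gue(n, rng):
    Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    M = np.triu(Z, 1)
    return M + M.conj().T + np.diag(rng.standard_normal(n))

# Compare rho_1 / n with the empirical density of eigenvalues over many draws.
n, trials = 10, 2000
rng = np.random.default_rng(6)
eigs = np.concatenate([np.linalg.eigvalsh(gue(n, rng)) for _ in range(trials)])
hist, edges = np.histogram(eigs, bins=40, density=True)
centres = (edges[:-1] + edges[1:]) / 2
print(np.max(np.abs(hist - rho1(centres, n) / n)))        # should be small
```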