This week at UCLA, Pierre-Louis Lions gave one of this year’s Distinguished Lecture Series, on the topic of mean field games. These are a relatively novel class of systems of partial differential equations, that are used to understand the behaviour of multiple agents each individually trying to optimise their position in space and time, but with their preferences being partly determined by the choices of all the other agents, in the asymptotic limit when the number of agents goes to infinity. A good example here is that of traffic congestion: as a first approximation, each agent wishes to get from A to B in the shortest path possible, but the speed at which one can travel depends on the density of other agents in the area. A more light-hearted example is that of a Mexican wave (or audience wave), which can be modeled by a system of this type, in which each agent chooses to stand, sit, or be in an intermediate position based on his or her comfort level, and also on the position of nearby agents.

Under some assumptions, mean field games can be expressed as a coupled system of two equations, a Fokker-Planck type equation evolving forward in time that governs the evolution of the density function ${m}$ of the agents, and a Hamilton-Jacobi (or Hamilton-Jacobi-Bellman) type equation evolving backward in time that governs the computation of the optimal path for each agent. The combination of both forward propagation and backward propagation in time creates some unusual “elliptic” phenomena in the time variable that is not seen in more conventional evolution equations. For instance, for Mexican waves, this model predicts that such waves only form for stadiums exceeding a certain minimum size (and this phenomenon has apparently been confirmed experimentally!).

Due to lack of time and preparation, I was not able to transcribe Lions’ lectures in full detail; but I thought I would describe here a heuristic derivation of the mean field game equations, and mention some of the results that Lions and his co-authors have been working on. (Video of a related series of lectures (in French) by Lions on this topic at the Collége de France is available here.)

To avoid (rather important) technical issues, I will work at a heuristic level only, ignoring issues of smoothness, convergence, existence and uniqueness, etc.

— 1. Hamilton-Jacobi-Bellman equations —

Before considering mean field games, let us consider a more classical problem in calculus of variations, namely that of a single agent trying to optimise his or her path in spacetime with respect to a fixed cost function to minimise against. (One could also reverse the sign here, and maximise a utility function rather than minimise a cost function; mathematically, there is no distinction between the two. (A half-empty glass is mathematically equivalent to a half-full one.))

Specifically, suppose that an agent is at some location ${x(0)}$ at time ${t=0}$ in some ambient domain (which, for simplicity, we shall take to be a Euclidean space ${{\mathbb R}^d}$), and would like to end up at some better location ${x(T)}$ at a later time ${t=T>0}$. To model this, we imagine that each location ${x}$ in the domain ${{\mathbb R}^d}$ has some cost ${u(T,x) \in {\mathbb R}}$ at this final time ${T}$, which is small when ${x}$ is a desirable location and large otherwise, so that the agent would like to minimise ${u(T,x(T))}$. (In the traffic problem, one may wish to be at a given location ${B}$ by time ${T}$, or failing that, on some location close to ${B}$, or perhaps at some secondary backup location in case ${B}$ is too inaccessible due to traffic; so the cost function may have a global minimum at ${B}$ but perhaps also have local minima elsewhere.)

If transportation was not a problem, this is an easy problem to solve: one simply finds the value of ${x(T)}$ that minimises ${u(x(T))}$, and the agent takes an arbitrary path (e.g. a constant velocity straight line path) from ${x(0)}$ at time ${t=0}$ to ${x(T)}$ at time ${t=T}$.

But now suppose that there is a transportation cost in addition to the cost of the final location – for instance, moving at too fast a velocity may incur an energy cost. To model this, we introduce a velocity cost function ${C: {\mathbb R}^d \rightarrow {\mathbb R}}$, where ${C(v) dt}$ measures the marginal cost of moving at a given velocity ${v}$ for time ${dt}$, and then define the total cost of a trajectory ${x: [0,T] \rightarrow {\mathbb R}^d}$ by the formula

$\displaystyle u(T, x(T)) + \int_0^T C(x'(t))\ dt. \ \ \ \ \ (1)$

The goal now is to select a trajectory that minimises this total cost. This is a simplified model, in which cost depends only on velocity and not on position and time; one could certainly consider more complicated cost functions ${C(v,x,t)}$ which depend on these parameters, in which case the ${C(x'(t))}$ term above would have to be replaced with ${C(x'(t),x(t),t)}$, but let us work with the above simplified model for the current discussion.

A model example of a cost function ${C}$ is a quadratic cost function ${C(v) = \frac{1}{2} |v|^2}$ (other normalising factors than ${\frac{1}{2}}$ can of course be used here). Thus one is penalised for moving too fast, or for “wasting” velocity by zig-zagging back and forth. In such a situation, the agent should move in a straight line at constant velocity to a location ${x(T)}$ with relatively low final cost ${u(x(T))}$ but which is also reasonably close to the original position ${x(0)}$; the global minimum for the final cost ${u}$ may no longer be the best place to shoot for, as it may be so far away that the transportation cost exceeds whatever cost savings one gets from the final cost.

More generally, it is natural to choose ${C}$ to be convex – thus, for instance, given two velocities ${v}$ and ${w}$, ${C(\frac{v+w}{2})}$ should be less than or equal to the average of ${C(v)}$ and ${C(w)}$. The reason for this is that one can effectively “simulate” a velocity of ${\frac{v+w}{2}}$ by zig-zagging between velocities ${v}$ and velocities ${w}$, and so (assuming infinite maneuverability) one can always travel at an effective velocity of ${\frac{v+w}{2}}$ at a mean cost of ${\frac{C(v)+C(w)}{2}}$.

To avoid some technical issues, we will assume ${C}$ to be strictly convex, and to make the formulae slightly nicer we will also assume that ${C}$ is even, thus ${C(-v)=C(v)}$.

One way to compute the optimal trajectory here is to solve the Euler-Lagrange equation associated to (1), with the boundary condition that the initial position ${x(0)}$ is fixed. This is an ODE for ${x(t)}$ which can be solved by a variety of methods. But there is another approach, based on solving a PDE rather than an ODE, and which conveys more information about the solution (such as the dependence on the initial position). The idea is to generalise the initial time from ${0}$ to any other time ${t_0}$ between ${0}$ and ${T}$. More precisely, given any ${0 \leq t_0 \leq T}$ and ${x_0 \in {\mathbb R}^d}$, define the optimal cost ${u(t_0,x_0)}$ at the point ${(t_0,x_0)}$ in spacetime to be the infimum of the cost

$\displaystyle u(T, x(T)) + \int_{t_0}^T C(x'(t))\ dt$

over all (smooth) paths ${x: [t_0,T] \rightarrow {\mathbb R}^d}$ starting at ${x(t_0) = x_0}$ and with an arbitrary endpoint ${x(T)}$. Informally, this is the cost the agent would place on being at position ${x_0}$ at time ${t_0}$.

By definition, when ${t_0 = T}$, the optimal cost at ${x_0}$ coincides with the final cost at ${x_0}$, which justifies using the same symbol ${u}$ to denote both. The final cost ${u(T,\cdot)}$ can thus be viewed as a boundary condition for the optimal cost.

But what happens for times ${t_0}$ less than ${T}$? It turns out that (under some regularity hypotheses, which I will not discuss here) the optimal cost function ${u}$ obeys a partial differential equation, known as the Hamilton-Jacobi-Bellman equation, which we shall heuristically derive (using infinitesimals as an informal tool) as follows.

Imagine that the agent finds herself or himself at position ${x_0}$ at some time ${t_0 < T}$, and is deciding where to go next. Presumably there is some optimal velocity ${v}$ in which the agent should move in (a priori, this velocity need not be unique). So, if ${dt}$ is an infinitesimal time, the agent should move at this velocity for time ${dt}$, ending up at a new position ${x_0+vdt}$ at time ${t_0+dt}$, and incurring a travel cost of ${C(v) dt}$. At this point, the optimal cost for the remainder of the agent’s journey is given by ${u(t_0+dt,x_0+vdt)}$, by definition of ${u}$. This leads to the heuristic formula

$\displaystyle u(t_0,x_0) = u(t_0+dt,x_0+vdt) + C(v) dt \ \ \ \ \ (2)$

which on Taylor expansion (and omitting higher order terms) gives

$\displaystyle u(t_0,x_0) = u(t_0,x_0) + dt [ \partial_t u(t_0,x_0) + v \cdot \nabla_x u(t_0,x_0) + C(v) ].$

On the other hand, ${v}$ is being chosen to minimise the final cost. Thus we see that ${v}$ should be chosen to minimise the expression

$\displaystyle v \cdot \nabla_x u(t_0,x_0) + C(v).$

Note from the strict convexity that this minimum ${v}$ will be unique, and will be some function of ${\nabla_x u(t_0,x_0)}$. If we introduce the Legendre transform ${H: {\mathbb R}^d \rightarrow {\mathbb R}}$ of ${C: {\mathbb R}^d \rightarrow {\mathbb R}}$ by the formula

$\displaystyle H(p) := \sup_{v \in {\mathbb R}^d} v \cdot p - C(v)$

then the minimal value of ${v \cdot \nabla_x u(t_0,x_0) + C(v)}$ is just ${-H(\nabla_x u(t_0,x_0))}$ (here we use the hypothesis that ${C}$ is even). We conclude that

$\displaystyle u(t_0,x_0) = u(t_0,x_0) + dt [ \partial_t u(t_0,x_0) - H(\nabla_x u(t_0,x_0)]$

$\displaystyle -\partial_t u + H(\nabla_x u) = 0.$

Note that this equation is being solved backwards in time, as the optimal cost is prescribed at the final time ${t=T}$, but we are interested in its value at earlier times, and in particular when ${t=0}$.

Once one solves this equation, one can work out the optimal velocity ${v = v(x_0,t_0)}$ to travel at each location ${(x_0,t_0)}$. Indeed, from the above discussion we know that ${v}$ is to minimise the expression

$\displaystyle v \cdot \nabla_x u(t_0,x_0) + C(v),$

and thus ${\tilde v := -v}$ maximises the expression

$\displaystyle \tilde v \cdot p - C(\tilde v)$

where ${p := \nabla_x u(t_0,x_0)}$. By definition, the value of this expression is equal to ${H(p)}$:

$\displaystyle H(p) = \tilde v \cdot p - C(\tilde v).$

We can view ${\tilde v}$ as a function of ${p}$. As ${\tilde v}$ maximises this expression for fixed ${p}$, we see that

$\displaystyle \frac{\partial}{\partial \tilde v} ( \tilde v \cdot p - C(\tilde v) ) = 0.$

Applying the chain rule, we conclude that

$\displaystyle H'(p) = \tilde v$

and so the correct velocity to move in is given by

$\displaystyle v = - H'( \nabla_x u ). \ \ \ \ \ (3)$

Now let us make things a little more interesting by adding some random noise. Suppose that the agent’s steering mechanism is subject to a little bit of fluctuation, so that if the agent wishes to get from ${x_0}$ to ${x_0+vdt}$ in time ${dt}$, the agent instead ends up at ${x_0 + vdt + \sigma dB_t}$, where ${dB_t}$ is the infinitesimal of a standard Brownian motion in ${{\mathbb R}^d}$, and ${\sigma > 0}$ is a parameter measuring the noise level. With this stochastic model, the total cost is now a stochastic quantity rather than a deterministic one, and so the rational thing for the agent to do now is to minimise the expectation of the cost. Here we begin to see an advantage of the Hamilton-Jacobi approach over the Euler-Lagrange approach; while the latter approach becomes quite complicated technically in the presence of stochasticity, the former approach carries over without much difficulty. Indeed, we can define the optimal (expected) cost function ${u(t_0,x_0)}$ much as before, as the minimal expected cost over all strategies of the agent; this is a deterministic function due to the taking of expectations. The equation (2) is now modified to

$\displaystyle u(t_0,x_0) = {\bf E} u(t_0+dt,x_0+vdt + \sigma d B_t) + C(v) dt.$

We Taylor expand this, using the Ito’s formula heuristic ${dB_t = O(dt^{1/2})}$ to obtain

$\displaystyle u(t_0,x_0) = {\bf E} u(t_0,x_0) + \partial_t u_0(t_0,x_0) dt + v \cdot \nabla_x u_0(t_0,x_0) dt$

$\displaystyle + dB_t \cdot \nabla_x u_0(t_0,x_0) + \frac{\sigma^2}{2} \nabla^2 u(t_0,x_0)(d B_t,d B_t) + C(v) dt.$

Now, Brownian motion ${dB_t}$ over time ${dt}$ has zero expectation, and each of the ${d}$ components has a variance of ${dt^2}$ (and the covariances are zero). As such, we can compute the expectation here and obtain

$\displaystyle u(t_0,x_0) = u(t_0,x_0) + \partial_t u_0(t_0,x_0) dt + v \cdot \nabla_x u_0(t_0,x_0) dt$

$\displaystyle + \frac{\sigma^2}{2} \Delta u(t_0,x_0) dt + C(v) dt.$

So the only effect of the noise is to add an additional term ${\frac{\sigma^2}{2} \Delta u(t_0,x_0) dt}$ to the right-hand side. This term does not depend on ${v}$ and so does not affect the remainder of the analysis; at the end of the day, one obtains the viscous Hamilton-Jacobi-Bellman equation

$\displaystyle -\partial_t u - \nu \Delta u + H(\nabla_x u) = 0$

for the optimal expected cost, where ${\nu := \frac{1}{2} \sigma^2}$ is the viscosity, and the optimal velocity is still given by the formula (3). This can be viewed as a nonlinear backwards heat equation, which makes sense since we are solving for this cost backwards in time. The diffusive effect of the heat equation then reflects the uncertainty of future cost caused by the random noise. (A similar diffusive effect appears in the Black-Scholes equation for pricing options, for much the same reason.)

— 2. Fokker-Planck equations —

Now suppose that instead of having just one agent, one has a huge number ${N}$ of agents distributed throughout space. To simplify things, let us assume that the agents all have identical motivations (in particular, they are all trying to minimise the same cost function), which implies in particular that all the agents at a given point ${(t_0,x_0)}$ in spacetime will all move in a given velocity ${v(t_0,x_0)}$ (let us ignore the random noise for this initial discussion).

Rather than deal with each of the agents separately, we will pass to a continuum limit ${N \rightarrow \infty}$ and consider just the (normalised) density function ${m(t,x)}$ of the agents, which is a non-negative function with total mass ${\int_{{\mathbb R}^d} m(t,x)\ dx = 1}$ for each time ${t}$. Informally, for an infinitesimal box in space ${[x,x+dx]}$, the number of agents in that box should be approximately ${N m(t,x) |dx|}$.

We now suppose that the velocity field ${v}$ is given to us, as well as the initial distribution ${m(0,x)}$ of the agents, and ask how the distribution will evolve as time goes forward. There are several ways to find the answer, but we will take a distributional viewpoint and test the density ${m(t,x)}$ against various test functions ${F(x)}$ – smooth, compactly supported functions of space (independent of time). The integral ${\int_{{\mathbb R}^d} m(t,x) F(x) \ dx}$ can be viewed as the continuum limit of the sum ${\frac{1}{N} \sum_{i=1}^N F(x_i(t))}$, where ${x_i(t)}$ is the location of the ${i^{th}}$ agent at time ${t}$.

Let us see how this integral should vary in time. At time ${t}$, the ${i^{th}}$ agent should move at velocity ${v(t,x_i(t))}$. Differentiating both sides of

$\displaystyle \int_{{\mathbb R}^d} m(t,x) F(x)\ dx \approx \frac{1}{N} \sum_{i=1}^N F(x_i(t))$

using the chain rule, we thus arrive at the heuristic formula

$\displaystyle \int_{{\mathbb R}^d} \partial_t m(t,x) F(x)\ dx \approx \frac{1}{N} \sum_{i=1}^N v(t,x_i(t)) \cdot \nabla F(x_i(t)).$

The right-hand side, in the continuum limit ${N \rightarrow \infty}$, should become

$\displaystyle \int_{{\mathbb R}^d} m(t,x) (v(t,x) \cdot \nabla F(x))\ dx$

which, after an integration by parts, becomes

$\displaystyle - \int_{{\mathbb R}^d} \nabla\cdot (mv)(t,x) F(x)\ dx.$

To summarise, for every test function ${F}$ we have

$\displaystyle \int_{{\mathbb R}^d} (\partial_t m(t,x) + \nabla\cdot (mv)(t,x)) F(x)\ dx = 0$

$\displaystyle \partial_t m(t,x) + \nabla\cdot (mv)(t,x) = 0.$

Now let us reintroduce the same random noise model that we considered in the previous section. Thus, of all the agents that are infinitesimally close to ${x}$ at time ${t}$, they will all try to move to ${x+vdt}$ at time ${t+dt}$, but instead each one ends up at a slightly different location ${x+vdt+\sigma dB_t}$, where the Brownian path ${dB_t}$ is different for each agent. As such, we are led to the heuristic equation

$\displaystyle \int_{{\mathbb R}^d} m(t+dt,x) F(x)\ dx \approx \frac{1}{N} \sum_{i=1}^N F(x_i(t) + v(t,x_i(t)) dt + \sigma dB_t).$

Taylor expanding the right-hand side as before, and then passing to the continuum limit, we eventually see that the right-hand side takes the form

$\displaystyle \int_{{\mathbb R}^d} m(t,x) (F(x) + v dt \cdot \nabla F(x) + \frac{\sigma^2}{2} \Delta F(x))\ dx$

and then repeating the above computations leads us to the Fokker-Planck equation

$\displaystyle \partial_t m(t,x) - \nu \Delta m(t,x) + \nabla \cdot (mv)(t,x) = 0. \ \ \ \ \ (4)$

— 3. Mean field games —

In the derivation of the Hamilton-Jacobi-Bellman equations above, each agent had a fixed cost function to minimise, that did not depend on the location of the other agents. A mean field game generalises this model, by allowing the cost function of each agent to also depend on the density function ${m}$ of all the other agents. For instance, when modeling traffic congestion, the cost function ${C(v)}$ may depend not only on the velocity ${v(t,x)}$ that one currently wishes to move at, but also on the density ${m(t,x)}$ of traffic at that point in space and time.

There are many ways to achieve such a generalisation. The simplest would be an additive model, in which the cost function (1) is replaced with

$\displaystyle u(T, x(T)) + \int_0^T C(x'(t))\ dt + \int_0^T F(m(t,x(t)))\ dt$

where ${F: {\mathbb R}^+ \rightarrow {\mathbb R}}$ represents the marginal cost to an agent of having a given density at the current location. If ${F}$ is increasing, this intuitively means that the agent prefers to be away from the other agents (a reasonable hypothesis for traffic congestion), which should lead to a repulsive effect; conversely, a decreasing ${F}$ should lead to an attractive effect.

The Hamilton-Jacobi-Bellman equations can be heuristically derived for this cost functional by a similar analysis to before, leading (in the presence of viscosity) to the equation

$\displaystyle -\partial_t u - \nu \Delta u + H(\nabla_x u) = F(m). \ \ \ \ \ (5)$

Meanwhile, the velocity field ${v}$ is still given by the formula (3), so the Fokker-Planck equation (4) becomes (after reversing sign)

$\displaystyle - \partial_t m + \nu \Delta m + \nabla_x \cdot(m H'(\nabla_x u))) = 0. \ \ \ \ \ (6)$

The system of (5), (6), with prescribed final data ${u(T,x)}$ for ${u}$ at time ${T}$, and initial data ${m(0,x)}$ for ${m}$ at time ${0}$, is then an example of a mean field game. The backward evolution equation (5) represents the agents’ decisions based on where they want to be in the future; and the forward evolution equation (6) represents where they actually end up, based on their initial distribution.

Solving this coupled system of equations, one evolving backwards in time, and one evolving forwards in time, is highly non-trivial, and in some cases existence or uniqueness or both break down (which suggests that the mean field approximation is not valid in this setting). However, there are various situations in which things behave well: having a small ${T}$ (so that the agents only plan ahead for a short period of time), having positive viscosity, and having an increasing ${F}$ (i.e. an aversion to crowds) all help significantly, and one typically has a good existence and uniqueness theory in such cases, based to a large extent on energy methods.

One interesting feature is that when ${T}$ becomes large enough (and in the attractive case when ${F}$ is decreasing), uniqueness can start to break down, due to the existence of standing waves that are sustained solely by nonlinear interactions between the forward evolution equation and the backward one.

The Mexican wave phenomenon can be modeled by a more complicated version of a mean field game. Here, the agents are the audience members of a stadium, and their position is modeled both by a horizontal position ${x}$ (which would be in a bounded two-dimensional domain), as well as an altitude ${z}$ (they could be sitting, standing, or be in an intermediate position). The agents cannot actually move in the ${x}$ direction, but only in the ${z}$ direction. The cost function consists of three terms: the familiar term ${C(v)}$ that penalises excessive movement, a “comfort” function ${u(z)}$ that penalises intermediate positions of ${z}$ between the sitting and standing positions (since it is not comfortable to stay in a half-standing position for any length of time), and the mean field component which penalises those agents at a position ${x_0}$ whose deviation height ${z_0}$ is too far from the height of nearby agents, in the sense that an integral such as ${\int_{|x-x_0| < \epsilon} |z-z_0| m(t,x,z)\ dx dz}$, is large. (An audience member who is sitting when everyone is standing, or vice versa, would presumably feel uncomfortable with this status; this is analogous to the attractive case of the potential ${F}$ in the previous example.) It turns out that under reasonable simplifying assumptions, one can create non-trivial travelling wave solutions to this mean field game – but only if the stadium is large enough. The causal mechanism for such waves is somewhat strange, though, due to the presence of the backward propagating equation – in some sense, the wave continues to propagate because the audience members expect it to continue to propagate, and act accordingly. (One wonders if these sorts of equations could provide a model for things like asset price bubbles, which seem to be governed by a similar mechanism.)