This week at UCLA, Pierre-Louis Lions gave one of this year’s Distinguished Lecture Series, on the topic of mean field games. These are a relatively novel class of systems of partial differential equations, that are used to understand the behaviour of multiple agents each individually trying to optimise their position in space and time, but with their preferences being partly determined by the choices of all the other agents, in the asymptotic limit when the number of agents goes to infinity. A good example here is that of traffic congestion: as a first approximation, each agent wishes to get from A to B in the shortest path possible, but the speed at which one can travel depends on the density of other agents in the area. A more light-hearted example is that of a Mexican wave (or audience wave), which can be modeled by a system of this type, in which each agent chooses to stand, sit, or be in an intermediate position based on his or her comfort level, and also on the position of nearby agents.
Under some assumptions, mean field games can be expressed as a coupled system of two equations, a Fokker-Planck type equation evolving forward in time that governs the evolution of the density function of the agents, and a Hamilton-Jacobi (or Hamilton-Jacobi-Bellman) type equation evolving backward in time that governs the computation of the optimal path for each agent. The combination of both forward propagation and backward propagation in time creates some unusual “elliptic” phenomena in the time variable that is not seen in more conventional evolution equations. For instance, for Mexican waves, this model predicts that such waves only form for stadiums exceeding a certain minimum size (and this phenomenon has apparently been confirmed experimentally!).
Due to lack of time and preparation, I was not able to transcribe Lions’ lectures in full detail; but I thought I would describe here a heuristic derivation of the mean field game equations, and mention some of the results that Lions and his co-authors have been working on. (Video of a related series of lectures (in French) by Lions on this topic at the Collége de France is available here.)
To avoid (rather important) technical issues, I will work at a heuristic level only, ignoring issues of smoothness, convergence, existence and uniqueness, etc.
— 1. Hamilton-Jacobi-Bellman equations —
Before considering mean field games, let us consider a more classical problem in calculus of variations, namely that of a single agent trying to optimise his or her path in spacetime with respect to a fixed cost function to minimise against. (One could also reverse the sign here, and maximise a utility function rather than minimise a cost function; mathematically, there is no distinction between the two. (A half-empty glass is mathematically equivalent to a half-full one.))
Specifically, suppose that an agent is at some location at time in some ambient domain (which, for simplicity, we shall take to be a Euclidean space ), and would like to end up at some better location at a later time . To model this, we imagine that each location in the domain has some cost at this final time , which is small when is a desirable location and large otherwise, so that the agent would like to minimise . (In the traffic problem, one may wish to be at a given location by time , or failing that, on some location close to , or perhaps at some secondary backup location in case is too inaccessible due to traffic; so the cost function may have a global minimum at but perhaps also have local minima elsewhere.)
If transportation was not a problem, this is an easy problem to solve: one simply finds the value of that minimises , and the agent takes an arbitrary path (e.g. a constant velocity straight line path) from at time to at time .
But now suppose that there is a transportation cost in addition to the cost of the final location – for instance, moving at too fast a velocity may incur an energy cost. To model this, we introduce a velocity cost function , where measures the marginal cost of moving at a given velocity for time , and then define the total cost of a trajectory by the formula
The goal now is to select a trajectory that minimises this total cost. This is a simplified model, in which cost depends only on velocity and not on position and time; one could certainly consider more complicated cost functions which depend on these parameters, in which case the term above would have to be replaced with , but let us work with the above simplified model for the current discussion.
A model example of a cost function is a quadratic cost function (other normalising factors than can of course be used here). Thus one is penalised for moving too fast, or for “wasting” velocity by zig-zagging back and forth. In such a situation, the agent should move in a straight line at constant velocity to a location with relatively low final cost but which is also reasonably close to the original position ; the global minimum for the final cost may no longer be the best place to shoot for, as it may be so far away that the transportation cost exceeds whatever cost savings one gets from the final cost.
More generally, it is natural to choose to be convex – thus, for instance, given two velocities and , should be less than or equal to the average of and . The reason for this is that one can effectively “simulate” a velocity of by zig-zagging between velocities and velocities , and so (assuming infinite maneuverability) one can always travel at an effective velocity of at a mean cost of .
To avoid some technical issues, we will assume to be strictly convex, and to make the formulae slightly nicer we will also assume that is even, thus .
One way to compute the optimal trajectory here is to solve the Euler-Lagrange equation associated to (1), with the boundary condition that the initial position is fixed. This is an ODE for which can be solved by a variety of methods. But there is another approach, based on solving a PDE rather than an ODE, and which conveys more information about the solution (such as the dependence on the initial position). The idea is to generalise the initial time from to any other time between and . More precisely, given any and , define the optimal cost at the point in spacetime to be the infimum of the cost
over all (smooth) paths starting at and with an arbitrary endpoint . Informally, this is the cost the agent would place on being at position at time .
By definition, when , the optimal cost at coincides with the final cost at , which justifies using the same symbol to denote both. The final cost can thus be viewed as a boundary condition for the optimal cost.
But what happens for times less than ? It turns out that (under some regularity hypotheses, which I will not discuss here) the optimal cost function obeys a partial differential equation, known as the Hamilton-Jacobi-Bellman equation, which we shall heuristically derive (using infinitesimals as an informal tool) as follows.
Imagine that the agent finds herself or himself at position at some time , and is deciding where to go next. Presumably there is some optimal velocity in which the agent should move in (a priori, this velocity need not be unique). So, if is an infinitesimal time, the agent should move at this velocity for time , ending up at a new position at time , and incurring a travel cost of . At this point, the optimal cost for the remainder of the agent’s journey is given by , by definition of . This leads to the heuristic formula
which on Taylor expansion (and omitting higher order terms) gives
On the other hand, is being chosen to minimise the final cost. Thus we see that should be chosen to minimise the expression
Note from the strict convexity that this minimum will be unique, and will be some function of . If we introduce the Legendre transform of by the formula
then the minimal value of is just (here we use the hypothesis that is even). We conclude that
leading to the Hamilton-Jacobi-Bellman equation
Note that this equation is being solved backwards in time, as the optimal cost is prescribed at the final time , but we are interested in its value at earlier times, and in particular when .
Once one solves this equation, one can work out the optimal velocity to travel at each location . Indeed, from the above discussion we know that is to minimise the expression
and thus maximises the expression
where . By definition, the value of this expression is equal to :
We can view as a function of . As maximises this expression for fixed , we see that
Applying the chain rule, we conclude that
Now let us make things a little more interesting by adding some random noise. Suppose that the agent’s steering mechanism is subject to a little bit of fluctuation, so that if the agent wishes to get from to in time , the agent instead ends up at , where is the infinitesimal of a standard Brownian motion in , and is a parameter measuring the noise level. With this stochastic model, the total cost is now a stochastic quantity rather than a deterministic one, and so the rational thing for the agent to do now is to minimise the expectation of the cost. Here we begin to see an advantage of the Hamilton-Jacobi approach over the Euler-Lagrange approach; while the latter approach becomes quite complicated technically in the presence of stochasticity, the former approach carries over without much difficulty. Indeed, we can define the optimal (expected) cost function much as before, as the minimal expected cost over all strategies of the agent; this is a deterministic function due to the taking of expectations. The equation (2) is now modified to
We Taylor expand this, using the Ito’s formula heuristic to obtain
Now, Brownian motion over time has zero expectation, and each of the components has a variance of (and the covariances are zero). As such, we can compute the expectation here and obtain
So the only effect of the noise is to add an additional term to the right-hand side. This term does not depend on and so does not affect the remainder of the analysis; at the end of the day, one obtains the viscous Hamilton-Jacobi-Bellman equation
for the optimal expected cost, where is the viscosity, and the optimal velocity is still given by the formula (3). This can be viewed as a nonlinear backwards heat equation, which makes sense since we are solving for this cost backwards in time. The diffusive effect of the heat equation then reflects the uncertainty of future cost caused by the random noise. (A similar diffusive effect appears in the Black-Scholes equation for pricing options, for much the same reason.)
— 2. Fokker-Planck equations —
Now suppose that instead of having just one agent, one has a huge number of agents distributed throughout space. To simplify things, let us assume that the agents all have identical motivations (in particular, they are all trying to minimise the same cost function), which implies in particular that all the agents at a given point in spacetime will all move in a given velocity (let us ignore the random noise for this initial discussion).
Rather than deal with each of the agents separately, we will pass to a continuum limit and consider just the (normalised) density function of the agents, which is a non-negative function with total mass for each time . Informally, for an infinitesimal box in space , the number of agents in that box should be approximately .
We now suppose that the velocity field is given to us, as well as the initial distribution of the agents, and ask how the distribution will evolve as time goes forward. There are several ways to find the answer, but we will take a distributional viewpoint and test the density against various test functions – smooth, compactly supported functions of space (independent of time). The integral can be viewed as the continuum limit of the sum , where is the location of the agent at time .
Let us see how this integral should vary in time. At time , the agent should move at velocity . Differentiating both sides of
using the chain rule, we thus arrive at the heuristic formula
The right-hand side, in the continuum limit , should become
which, after an integration by parts, becomes
To summarise, for every test function we have
which leads to the advection equation
Now let us reintroduce the same random noise model that we considered in the previous section. Thus, of all the agents that are infinitesimally close to at time , they will all try to move to at time , but instead each one ends up at a slightly different location , where the Brownian path is different for each agent. As such, we are led to the heuristic equation
Taylor expanding the right-hand side as before, and then passing to the continuum limit, we eventually see that the right-hand side takes the form
and then repeating the above computations leads us to the Fokker-Planck equation
— 3. Mean field games —
In the derivation of the Hamilton-Jacobi-Bellman equations above, each agent had a fixed cost function to minimise, that did not depend on the location of the other agents. A mean field game generalises this model, by allowing the cost function of each agent to also depend on the density function of all the other agents. For instance, when modeling traffic congestion, the cost function may depend not only on the velocity that one currently wishes to move at, but also on the density of traffic at that point in space and time.
There are many ways to achieve such a generalisation. The simplest would be an additive model, in which the cost function (1) is replaced with
where represents the marginal cost to an agent of having a given density at the current location. If is increasing, this intuitively means that the agent prefers to be away from the other agents (a reasonable hypothesis for traffic congestion), which should lead to a repulsive effect; conversely, a decreasing should lead to an attractive effect.
The system of (5), (6), with prescribed final data for at time , and initial data for at time , is then an example of a mean field game. The backward evolution equation (5) represents the agents’ decisions based on where they want to be in the future; and the forward evolution equation (6) represents where they actually end up, based on their initial distribution.
Solving this coupled system of equations, one evolving backwards in time, and one evolving forwards in time, is highly non-trivial, and in some cases existence or uniqueness or both break down (which suggests that the mean field approximation is not valid in this setting). However, there are various situations in which things behave well: having a small (so that the agents only plan ahead for a short period of time), having positive viscosity, and having an increasing (i.e. an aversion to crowds) all help significantly, and one typically has a good existence and uniqueness theory in such cases, based to a large extent on energy methods.
One interesting feature is that when becomes large enough (and in the attractive case when is decreasing), uniqueness can start to break down, due to the existence of standing waves that are sustained solely by nonlinear interactions between the forward evolution equation and the backward one.
The Mexican wave phenomenon can be modeled by a more complicated version of a mean field game. Here, the agents are the audience members of a stadium, and their position is modeled both by a horizontal position (which would be in a bounded two-dimensional domain), as well as an altitude (they could be sitting, standing, or be in an intermediate position). The agents cannot actually move in the direction, but only in the direction. The cost function consists of three terms: the familiar term that penalises excessive movement, a “comfort” function that penalises intermediate positions of between the sitting and standing positions (since it is not comfortable to stay in a half-standing position for any length of time), and the mean field component which penalises those agents at a position whose deviation height is too far from the height of nearby agents, in the sense that an integral such as , is large. (An audience member who is sitting when everyone is standing, or vice versa, would presumably feel uncomfortable with this status; this is analogous to the attractive case of the potential in the previous example.) It turns out that under reasonable simplifying assumptions, one can create non-trivial travelling wave solutions to this mean field game – but only if the stadium is large enough. The causal mechanism for such waves is somewhat strange, though, due to the presence of the backward propagating equation – in some sense, the wave continues to propagate because the audience members expect it to continue to propagate, and act accordingly. (One wonders if these sorts of equations could provide a model for things like asset price bubbles, which seem to be governed by a similar mechanism.)