When solving the initial value problem for an ordinary differential equation, such as

\displaystyle  \partial_t u = F(u); \quad u(0) = u_0, \ \ \ \ \ (1)

where {u: {\bf R} \rightarrow V} is the unknown solution (taking values in some finite-dimensional vector space {V}), {u_0 \in V} is the initial datum, and {F: V \rightarrow V} is some nonlinear function (which we will take to be smooth for the sake of argument), one can construct a solution locally in time via the Picard iteration method. There are two basic ideas. The first is to use the fundamental theorem of calculus to rewrite the initial value problem (1) as the problem of solving an integral equation,

\displaystyle  u(t) = u_0 + \int_0^t F(u(s))\ ds. \ \ \ \ \ (2)

The second idea is to solve this integral equation by the contraction mapping theorem, showing that the integral operator {{\mathcal N}} defined by

\displaystyle  {\mathcal N}(u) (t) := u_0 + \int_0^t F(u(s))\ ds

is a contraction on a suitable complete metric space (e.g. a closed ball in the function space {C^0([0,T]; V)}), and thus has a unique fixed point in this space. This method works as long as one only seeks to construct local solutions (for time {t} in {[0,T]} for sufficiently small {T>0}), but the solutions constructed have a number of very good properties, including

  • Existence: A solution {u} exists in the space {C^0([0,T];V)} (and even in {C^\infty([0,T];V)}) for {T} sufficiently small.
  • Uniqueness: There is at most one solution {u} to the initial value problem in the space {C^0([0,T];V)} (or in smoother spaces, such as {C^\infty([0,T];V)}). (For solutions in the weaker space {C^0([0,T];V)} we use the integral formulation (2) to define the solution concept.)
  • Lipschitz continuous dependence on the data: If {u_0^{(n)}} is a sequence of initial data converging to {u_0}, then the associated solutions {u^{(n)}} converge uniformly to {u} on {[0,T]} (possibly after shrinking {T} slightly). In fact we have the Lipschitz bound {\| u^{(n)}(t) - u(t) \|_V \leq C \| u^{(n)}_0 - u_0 \|_V} for {n} large enough and {t \in [0,T]}, where {C} is an absolute constant.

This package of properties is referred to as (Lipschitz) well-posedness.
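As a rough numerical illustration of the Picard iteration described above, the following sketch iterates a discretised version of the operator {{\mathcal N}} for the scalar ODE {\partial_t u = u^2}, {u(0)=1}, whose exact solution is {1/(1-t)}. The choice of {F}, the time horizon {T = 0.25}, the grid size, and the function name `picard_iterates` are all illustrative assumptions rather than anything taken from the discussion above, and the integral is only approximated by the trapezoid rule.

```python
import numpy as np

def picard_iterates(F, u0, T, n_grid=2001, n_iter=30):
    """Iterate u -> u0 + int_0^t F(u(s)) ds on a uniform grid over [0, T].

    Returns the grid, the final iterate, and the sup-norm gaps between
    consecutive iterates (which should shrink if the map is a contraction).
    """
    t = np.linspace(0.0, T, n_grid)
    dt = t[1] - t[0]
    u = np.full_like(t, u0)  # zeroth iterate: the constant function u0
    gaps = []
    for _ in range(n_iter):
        f = F(u)
        # Cumulative trapezoid rule approximating int_0^t F(u(s)) ds.
        integral = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * dt)))
        u_next = u0 + integral
        gaps.append(float(np.max(np.abs(u_next - u))))
        u = u_next
    return t, u, gaps

# Hypothetical test problem: u' = u^2, u(0) = 1, exact solution 1/(1 - t).
t, u, gaps = picard_iterates(lambda u: u**2, u0=1.0, T=0.25)
exact = 1.0 / (1.0 - t)
print("gaps between the first iterates:", gaps[:6])
print("sup-norm error of the final iterate:", float(np.max(np.abs(u - exact))))
```

The gaps between consecutive iterates should decay rapidly here because {T} is small compared with the blowup time {t=1}; taking {T} close to {1} is a quick way to watch the local nature of the method.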

This method extends to certain partial differential equations, particularly those of a semilinear nature (linear except for lower order nonlinear terms). For instance, if trying to solve an initial value problem of the form

\displaystyle  \partial_t u + Lu = F(u); \quad u(0,x) = u_0(x),

where now {u: {\bf R} \rightarrow V} takes values in a function space {V} (e.g. a Sobolev space {H^k({\bf R}^d)}), {u_0 \in V} is an initial datum, {L} is some (differential) operator (independent of {u}) that is (densely) defined on {V}, and {F} is a nonlinearity which is also (densely) defined on {V}, then (formally, at least) one can solve this problem by using Duhamel’s formula to convert the problem to that of solving an integral equation

\displaystyle  u(t) = e^{-tL} u_0 + \int_0^t e^{-(t-s)L} F(u(s))\ ds

and one can then hope to show that the associated nonlinear integral operator

\displaystyle  u \mapsto e^{-tL} u_0 + \int_0^t e^{-(t-s)L} F(u(s))\ ds

is a contraction in a subset of a suitably chosen function space.
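To see (formally, and assuming that the propagators {e^{-tL}} for {t \geq 0} make sense and obey the usual semigroup laws) where the Duhamel integral equation comes from, one can differentiate {e^{-(t-s)L} u(s)} in the {s} variable and use the equation {\partial_s u + Lu = F(u)}:

\displaystyle  \partial_s ( e^{-(t-s)L} u(s) ) = e^{-(t-s)L} ( \partial_s u(s) + L u(s) ) = e^{-(t-s)L} F(u(s)).

Integrating this identity from {s=0} to {s=t} gives

\displaystyle  u(t) - e^{-tL} u_0 = \int_0^t e^{-(t-s)L} F(u(s))\ ds,

which is the integral equation above. When {L=0} this collapses to the fundamental theorem of calculus step (2) used in the ODE case.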

This method turns out to work surprisingly well for many semilinear partial differential equations, and in particular for semilinear parabolic, semilinear dispersive, and semilinear wave equations. As in the ODE case, when the method works, it usually gives the entire package of Lipschitz well-posedness: existence, uniqueness, and Lipschitz continuous dependence on the initial data, for short times at least.

However, when one moves from semilinear initial value problems to quasilinear initial value problems such as

\displaystyle  \partial_t u + L_u u = F(u); \quad u(0,x) = u_0(x)

in which the top order operator {L_u} now depends on the solution {u} itself, then the nature of well-posedness changes; one can still hope to obtain (local) existence and uniqueness, and even continuous dependence on the data, but one usually is forced to give up Lipschitz continuous dependence at the highest available regularity (though one can often recover it at lower regularities). As a consequence, the Picard iteration method is not directly suitable for constructing solutions to such equations.

One can already see this phenomenon with a very simple equation, namely the one-dimensional constant-velocity transport equation

\displaystyle  \partial_t u + c \partial_x u = 0; \quad u(0,x) = u_0(x) \ \ \ \ \ (3)

where we consider {c = c_0} as part of the initial data. (If one wishes, one could view this equation as a rather trivial example of a system

\displaystyle  \partial_t u + c \partial_x u = 0; \quad \partial_t c = 0

\displaystyle  u(0,x) = u_0(x); \quad c(0) = c_0,

to emphasise this viewpoint, but this would be somewhat idiosyncratic.) One can solve this equation explicitly of course to get the solution

\displaystyle  u(t,x) = u_0(x-ct).

In particular, if we look at the solution just at time {t=1} for simplicity, we have

\displaystyle  u(1,x) = u_0(x-c).

Now let us see how this solution {u(1,x)} depends on the parameter {c}. One can ask whether this dependence is Lipschitz in {c}, in some function space {V}:

\displaystyle  \| u_0(\cdot - c) - u_0(\cdot - c') \|_V \leq A |c-c'|

for some finite {A}. But using the Newton approximation

\displaystyle  u_0(\cdot - c) - u_0(\cdot - c') \approx -(c-c') \partial_x u_0(\cdot - c)

we see that we should only expect such a bound when {\partial_x u_0} (and its translates) lie in {V}. Thus, we see a loss of derivatives phenomenon with regard to Lipschitz well-posedness; if the initial data {u_0} is in some regularity space, say {C^3}, then one only obtains Lipschitz dependence on {c} in a lower regularity space such as {C^2}.

We have just seen that if all one knows about the initial data {u_0} is that it is bounded in a function space {V}, then one usually cannot hope to make the dependence of {u} on the velocity parameter {c} Lipschitz continuous. Indeed, one cannot even make it continuous uniformly in {V}. Given two values of {c} that are close together, e.g. {c = 0} and {c=\epsilon}, and a reasonable function space {V} (e.g. a Sobolev space {H^k}, or a classical regularity space {C^k}) one can easily cook up a function {u_0} that is bounded in {V} but whose two solutions {u_0(\cdot)} and {u_0(\cdot-\epsilon)} separate in the {V} norm at time {1}, simply by choosing {u_0} to be supported on an interval of width {\epsilon}.
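The following sketch illustrates both halves of this discussion numerically on the unit circle, in the {C^0} norm. It is only a toy experiment under stated assumptions: the Gaussian-type profile `bump(width)` (a stand-in for a function concentrated on an interval of the given width), the grid size, and the helper names are all hypothetical choices, and the sup norm is approximated by a maximum over grid points.

```python
import numpy as np

def bump(width):
    """Profile of height 1 concentrated at scale `width` around x = 1/2 on R/Z."""
    def u0(x):
        y = (np.mod(x, 1.0) - 0.5) / width
        return np.exp(-y**2)
    return u0

def sup_shift_distance(u0, c, n_grid=2**14):
    """Grid approximation to || u0(. - c) - u0(.) ||_{C^0} on the unit circle."""
    x = np.arange(n_grid) / n_grid
    return float(np.max(np.abs(u0(x - c) - u0(x))))

# (a) Fixed smooth profile: the difference quotient in c stays bounded,
#     by roughly the sup norm of the derivative of the profile.
fixed = bump(0.1)
fixed_ratios = [sup_shift_distance(fixed, c) / c for c in (1e-2, 1e-3, 1e-4)]
print("difference quotients for a fixed profile:", fixed_ratios)

# (b) Profile concentrated at the same scale eps as the shift: the profiles
#     are bounded in C^0, yet the two translates stay far apart as eps -> 0.
adversarial = [sup_shift_distance(bump(eps), eps) for eps in (1e-1, 1e-2, 1e-3)]
print("separation when width equals shift:", adversarial)
```

In experiment (a) the quotient is controlled by the derivative of the fixed profile, which is the loss of derivatives seen above; in experiment (b) no such control is available uniformly over profiles that are merely bounded in {C^0}.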

(Part of the problem here is that using a subtractive method {\|u-v\|_V} to determine the distance between two solutions {u, v} is not a physically natural operation when transport mechanisms are present that could cause the key features of {u, v} (such as singularities) to be situated in slightly different locations. In such cases, the correct notion of distance may need to take transport into account, e.g. by using metrics of Wasserstein type.)

On the other hand, one still has non-uniform continuous dependence on the initial parameters: if {u_0} lies in some reasonable function space {V}, then the map {c \mapsto u_0(\cdot-c)} is continuous in the {V} topology, even if it is not uniformly continuous with respect to {u_0}. (More succinctly: translation is a continuous but not uniformly continuous operation in most function spaces.) The reason for this is that we already have established this continuity in the case when {u_0} is so smooth that an additional derivative of {u_0} lies in {V}; and such smooth functions tend to be dense in the original space {V}, so the general case can then be established by a limiting argument, approximating a general function in {V} by a smoother function. We then see that the non-uniformity ultimately comes from the fact that a given function in {V} may be arbitrarily rough (or concentrated at an arbitrarily fine scale), and so the ability to approximate such a function by a smooth one can be arbitrarily poor.

In many quasilinear PDE, one often encounters qualitatively similar phenomena. Namely, one often has local well-posedness in sufficiently smooth function spaces {V} (so that if the initial data lies in {V}, then for short times one has existence, uniqueness, and continuous dependence on the data in the {V} topology), but Lipschitz or uniform continuity in the {V} topology is usually false. However, if the data (and solution) is known to be in a high-regularity function space {V}, one can often recover Lipschitz or uniform continuity in a lower-regularity topology.

Because the continuous dependence on the data in quasilinear equations is necessarily non-uniform, the arguments needed to establish this dependence can be remarkably delicate. As with the simple example of the transport equation, the key is to approximate a rough solution by a smooth solution first, by smoothing out the data (this is the non-uniform step, as it depends on the physical scale (or wavelength) at which the data features are located). But for quasilinear equations, keeping the rough and smooth solution together can require a little juggling of function space norms, in particular playing the low-frequency nature of the smooth solution against the high-frequency nature of the residual between the rough and smooth solutions.

Below the fold I will illustrate this phenomenon with one of the simplest quasilinear equations, namely the initial value problem for the inviscid Burgers’ equation

\displaystyle  \partial_t u + u u_x = 0; \quad u(0,x) = u_0(x) \ \ \ \ \ (4)

which is a modification of the transport equation (3) in which the velocity {c} is no longer a parameter, but now depends on (and is, in this case, actually equal to) the solution. To avoid technicalities we will work only with the classical function spaces {C^k} of {k} times continuously differentiable functions, though one can certainly work with other spaces (such as Sobolev spaces) by exploiting the Sobolev embedding theorem. To avoid having to distinguish continuity from uniform continuity, we shall work in a compact domain by assuming periodicity in space, thus for instance restricting {x} to the unit circle {{\bf R}/{\bf Z}}.

This discussion is inspired by this survey article of Nikolay Tzvetkov, which further explores the distinction between well-posedness and ill-posedness in both semilinear and quasilinear settings.

— 1. A priori estimates —

To avoid technicalities let us make the a priori assumption that all solutions of interest are smooth.

The Burgers equation is a pure transport equation: it moves the solution {u} around, but does not increase or decrease its values. As a consequence we obtain an a priori estimate for the {C^0} norm:

\displaystyle  \| u(t) \|_{C^0} \leq \|u(0)\|_{C^0}.

To deal with the {C^1} norm, we perform the standard trick of differentiating the equation, obtaining

\displaystyle  \partial_t u_x + u u_{xx} + u_x^2 = 0

which we rewrite as a forced transport equation

\displaystyle  (\partial_t + u \partial_x) u_x = - u_x^2.

Inspecting what this equation does at local maxima in space, one is led (formally, at least) to the differential inequality

\displaystyle  \partial_t \| u_x \|_{C^0} \leq \|u_x\|_{C^0}^2

which leads to an a priori estimate of the form

\displaystyle  \|u(t)\|_{C^1} \leq C \|u(0)\|_{C^1} \ \ \ \ \ (5)

for some absolute constant {C}, if {t} is sufficiently small depending on {\|u(0)\|_{C^1}}. More generally, the same arguments give

\displaystyle  \|u(t)\|_{C^k} \leq C_k \|u(0)\|_{C^k}

for {k=1,2,3,\ldots}, where {C_k} depends only on {k}, and {t} is sufficiently small depending on {\|u(0)\|_{C^k}}. (Actually, if one works a little more carefully, one only needs {t} sufficiently small depending on {\|u(0)\|_{C^1}}.)
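For the reader who wants the step from the differential inequality to (5) spelled out, here is a sketch. Writing {y(t) := \|u_x(t)\|_{C^0}}, the inequality {\partial_t y \leq y^2} can be rewritten (formally) as {\partial_t (-1/y) \leq 1}, which integrates to

\displaystyle  \|u_x(t)\|_{C^0} \leq \frac{\|u_x(0)\|_{C^0}}{1 - t \|u_x(0)\|_{C^0}} \leq 2 \|u_x(0)\|_{C^0}

whenever {0 \leq t \leq 1/(2\|u_x(0)\|_{C^0})}. Combining this with the {C^0} bound gives (5) with, for instance, {C=2}. The specific constant and time threshold here are illustrative choices; any fixed fraction of the Riccati blowup time {1/\|u_x(0)\|_{C^0}} would serve equally well.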

The a priori estimates are not quite enough by themselves to establish local existence of solutions in the indicated function spaces, but in practice, once one has a priori estimates, one can usually work a little bit harder to then establish existence, for instance by using a compactness, viscosity, or penalty method. We will not discuss this topic here.

— 2. Lipschitz continuity at low regularity —

Now let us consider two solutions {u, v} to Burgers’ equation from two different initial data, thus

\displaystyle  \partial_t u + u u_x = 0; \quad u(0) = u_0 \ \ \ \ \ (6)


\displaystyle  \partial_t v + v v_x = 0; \quad v(0) = v_0. \ \ \ \ \ (7)

We want to say that if {u_0} and {v_0} are close in some sense, then {u} and {v} will stay close at later times. For this, the standard trick is to look at the difference {w := v-u} of the two solutions. Subtracting (6) from (7) we obtain the difference equation for {w}:

\displaystyle  \partial_t w + v w_x + w u_x = 0; \quad w(0) = w_0 := v_0 - u_0. \ \ \ \ \ (8)

We can view the evolution equation in (8) as a forced transport equation:

\displaystyle  (\partial_t + v \partial_x) w = - w u_x.

This leads to a bound for how the {C^0} norm of {w} grows:

\displaystyle  \partial_t \|w\|_{C^0} \leq \| w u_x \|_{C^0} \leq \|w\|_{C^0} \|u\|_{C^1}.

Applying Gronwall’s inequality, one obtains the a priori inequality

\displaystyle  \|w(t)\|_{C^0} \leq \|w_0\|_{C^0} \exp( \int_0^t \|u(s)\|_{C^1}\ ds )

and hence by (5) we have

\displaystyle  \| u(t) - v(t) \|_{C^0} \leq \|u_0 - v_0 \|_{C^0} \exp( C \|u_0\|_{C^1} ) \ \ \ \ \ (9)

if {t} is sufficiently small (depending on the {C^1} norm of {u_0}). Thus we see that we have Lipschitz dependence in the {C^0} topology… but only if at least one of the two solutions {u, v} already had one higher derivative of regularity, so that it was in {C^1} instead.
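One can test this {C^0} Lipschitz bound numerically. The sketch below does not use the energy-type argument above; instead it relies on the standard method of characteristics, under which a smooth solution obeys {u(t, x + t u_0(x)) = u_0(x)} and {u_x = u_0'/(1 + t u_0')} along characteristics, so that the Gronwall factor {\exp(\int_0^t \|u_x(s)\|_{C^0}\ ds)} is at most {1/(1 - t \sup |u_0'|)}. The data {u_0 = \sin(2\pi x)}, the perturbation size `delta`, the time {t = 0.05}, and the function name are all assumptions made for illustration.

```python
import numpy as np

def burgers_via_characteristics(u0, t, X, n_iter=200):
    """Evaluate the smooth Burgers solution u(t, X) for periodic data u0.

    Solves x + t*u0(x) = X by fixed-point iteration and returns u0(x).
    Assumes t * sup|u0'| < 1, so the iteration map is a contraction
    and no shock has formed yet.
    """
    x = np.array(X, dtype=float)
    for _ in range(n_iter):
        x = X - t * u0(x)
    return u0(x)

two_pi = 2.0 * np.pi
delta = 1e-3                                   # size of the perturbation in C^0
u0 = lambda x: np.sin(two_pi * x)
v0 = lambda x: np.sin(two_pi * x) + delta * np.cos(two_pi * x)

t = 0.05
X = np.arange(4096) / 4096
u = burgers_via_characteristics(u0, t, X)
v = burgers_via_characteristics(v0, t, X)

sup_w0 = delta                                 # sup |v0 - u0|, attained at x = 0
sup_wt = float(np.max(np.abs(v - u)))          # grid approximation to sup |v(t) - u(t)|
gronwall_factor = 1.0 / (1.0 - t * two_pi)     # uses sup |u0'| = 2*pi

print("sup |w(t)| / sup |w(0)| =", sup_wt / sup_w0)
print("Gronwall factor         =", gronwall_factor)
```

The observed amplification of the initial difference should stay below the Gronwall factor, and that factor involves the {C^1} size of {u_0}, not just its {C^0} size, which is the loss of derivatives in action.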

More generally, by using the trick of differentiating the equation, one can obtain an a priori inequality of the form

\displaystyle  \| u(t) - v(t) \|_{C^k} \leq \|u_0 - v_0 \|_{C^k} \exp( C_k \|u_0\|_{C^{k+1}} )

for some {C_k} depending only on {k}, for {t} sufficiently small depending on {\|u_0\|_{C^{k+1}}}. Once again, to get Lipschitz continuity at some regularity {C^k}, one must first assume one higher degree {C^{k+1}} of regularity on one of the solutions.

This loss of derivatives is unfortunate, but this is at least good enough to recover uniqueness: setting {u_0=v_0} in, say, (9) we obtain uniqueness of {C^1} solutions (locally in time, at least), thanks to the trivial fact that two {C^1} functions that agree in {C^0} norm automatically agree in {C^1} norm also. (One can then boost local uniqueness to global uniqueness by a continuity argument.)

— 3. Non-uniform continuity at high regularity —

Let {u^{(n)}_0} be a sequence of {C^1} data converging in the {C^1} topology to a limit {u_0 \in C^1}. As {u_0} and {u^{(n)}_0} are then uniformly bounded in {C^1}, existence theory then gives us {C^1} solutions {u^{(n)}}, {u} to the associated initial value problems

\displaystyle  \partial_t u + u u_x = 0; \quad u(0) = u_0 \ \ \ \ \ (10)


\displaystyle  \partial_t u^{(n)} + u^{(n)} u^{(n)}_x = 0; \quad u^{(n)}(0) = u^{(n)}_0 \ \ \ \ \ (11)

for all {t} in some uniform time interval {[0,T]}.

From (5) we know that the {u^{(n)}} and {u} are uniformly bounded in {C^1} norm (for {T} small enough). From the Lipschitz continuity (9) we know that {u^{(n)}} converges to {u} in {C^0} norm. But does {u^{(n)}} converge to {u} in the {C^1} norm?

The answer is yes, but the proof is remarkably delicate. A direct attempt to control the difference between {u^{(n)}} and {u} in {C^1}, following the lines of the previous argument, requires something to be bounded in {C^2}. But we only have {u^{(n)}} and {u} bounded in {C^1}.

However, note that in the arguments of the previous section, we don’t need both solutions to be in {C^2}; it’s enough for just one solution to be in {C^2}. Now, while neither {u^{(n)}} nor {u} is bounded in {C^2} yet, what we can do is to introduce a third solution {v}, which is regularised to lie in {C^2} and not just in {C^1}, while still being initially close to {u_0} and hence to {u^{(n)}_0} in {C^1} norm. The hope is then to show that {u} and {u^{(n)}} are both close to {v} in {C^1}, which by the triangle inequality will make {u} and {u^{(n)}} close to each other.

Unfortunately, in order to get the regularised solution {v} close to {u_0} initially, the {C^2} norm of {v(0)} (and hence of {v}) may have to be quite large. But we can compensate for this by making the {C^0} distance between {v(0)} and {u_0} quite small. The two effects turn out to basically cancel each other and allow one to proceed.

Let’s see how this is done. (The argument here originates from this paper of Bona and Smith.) Consider a solution {v} which is initially close to {u_0} in {C^1} norm (and very close in {C^0} norm), and also has finite (but potentially large) {C^2} norm; we will quantify these statements more precisely later.

Once again, we set {w = v-u} and {w_0 = v_0-u_0}, giving a difference equation which we now write as

\displaystyle  \partial_t w + u w_x + w v_x = 0; \quad w(0) = w_0 \ \ \ \ \ (12)

in order to take advantage of the higher regularity of {v}. For the {C^0} norm, we have

\displaystyle  \|w(t)\|_{C^0} = O( \|w_0\|_{C^0} ) \ \ \ \ \ (13)

for {t} sufficiently small, thanks to (9) and the uniform {C^1} bounds. For the {C^1} norm, we first differentiate (12) to obtain

\displaystyle  (\partial_t + u \partial_x) w_x = - u_x w_x - w_x v_x - w v_{xx}

and thus

\displaystyle  \partial_t \| w_x \|_{C^0} \leq \| u_x w_x \|_{C^0} + \| w_x v_x\|_{C^0} + \| w v_{xx} \|_{C^0}.

The first two terms on the RHS are {O( \|w_x\|_{C^0} )} thanks to the uniform {C^1} bounds. The third term is {O( \|w_0\|_{C^0} \|v_0\|_{C^2} )} by (13) and a priori {C^2} estimates (here we use the fact that the time of existence for {C^2} bounds can be controlled by the {C^1} norm). Using Gronwall’s inequality, we conclude that

\displaystyle  \| w_x(t) \|_{C^0} \ll \|\partial_x w_0\|_{C^0} + \|w_0\|_{C^0} \|v_0 \|_{C^2}

and thus

\displaystyle  \|v(t)-u(t)\|_{C^1} \ll \| v_0 - u_0 \|_{C^1} + \| v_0-u_0\|_{C^0} \| v_0 \|_{C^2}.

Similarly one has

\displaystyle  \|v(t)-u^{(n)}(t)\|_{C^1} \ll \| v_0 - u^{(n)}_0 \|_{C^1} + \| v_0-u^{(n)}_0\|_{C^0} \| v_0 \|_{C^2},

and so by the triangle inequality we have

\displaystyle  \|u^{(n)}(t)-u(t)\|_{C^1} \ll \| v_0 - u_0 \|_{C^1} + \| v_0-u_0\|_{C^0} \| v_0 \|_{C^2} \ \ \ \ \ (14)

for {n} sufficiently large.

Note how the {C^2} norm in the second term is balanced by the {C^0} norm. We can exploit this balance as follows. Let {\epsilon > 0} be a small quantity, and let {v_0 := u_0 * P_\epsilon}, where {P_\epsilon = \frac{1}{\epsilon} P( \frac{x}{\epsilon})} is a suitable approximation to the identity. A little bit of integration by parts using the {C^1} bound on {u_0} then gives the bounds

\displaystyle  \|v_0-u_0 \|_{C^0} \ll \epsilon


\displaystyle  \|v_0-u_0 \|_{C^1} \ll 1


\displaystyle  \|v_0\|_{C^2} \ll \frac{1}{\epsilon}.

This is not quite enough to get anything useful out of (14). But to do better, we can use the fact that {\partial_x u_0}, being uniformly continuous, has some modulus of continuity, thus one has

\displaystyle  \| \partial_x u_0(\cdot+h) - \partial_x u_0(\cdot) \|_{C^0} = o(1)

as {h \rightarrow 0}. Using this, one can soon get the improved estimates

\displaystyle  \|v_0-u_0 \|_{C^0} = o(\epsilon)


\displaystyle  \|v_0-u_0 \|_{C^1} = o(1)

as {\epsilon \rightarrow 0}. Applying (14), we thus see that

\displaystyle  \|u^{(n)}(t)-u(t)\|_{C^1} \ll o(1)

for {n} sufficiently large (depending on {\epsilon}); sending {\epsilon \rightarrow 0}, the continuity claim follows.
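The balance between the small {C^0} distance and the large {C^2} norm can be observed numerically. The sketch below mollifies a function that is {C^1} but not {C^2} and tabulates the three quantities entering (14). Everything specific in it is an assumption made for illustration: the data {u_0 = |\sin(2\pi x)|^{3/2}}, the even kernel {P(y)} proportional to {(1-y^2)^2} on {[-1,1]} (evenness makes the first moment vanish, which is what lets the {C^0} error beat {\epsilon}), the grid size, and the use of centred finite differences in place of exact derivatives.

```python
import numpy as np

N = 8192
h = 1.0 / N
x = np.arange(N) / N
u0 = np.abs(np.sin(2.0 * np.pi * x)) ** 1.5    # C^1 but not C^2 on the circle

def mollify(f, eps):
    """Periodic convolution of f with the rescaled even kernel P_eps."""
    d = np.minimum(x, 1.0 - x)                  # distance to 0 on the circle
    kernel = np.clip(1.0 - (d / eps) ** 2, 0.0, None) ** 2
    kernel /= kernel.sum()                      # discrete normalisation to total mass 1
    return np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(kernel)))

def ddx(f):
    """Centred finite-difference approximation to the periodic derivative."""
    return (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * h)

def sup(f):
    return float(np.max(np.abs(f)))

rows = []
for eps in (0.08, 0.04, 0.02, 0.01):
    v0 = mollify(u0, eps)
    c0 = sup(v0 - u0)           # approximates ||v0 - u0||_{C^0}
    c1 = sup(ddx(v0 - u0))      # approximates the derivative part of ||v0 - u0||_{C^1}
    c2 = sup(ddx(ddx(v0)))      # approximates the top-order part of ||v0||_{C^2}
    rows.append((eps, c0, c1, c2))
    print(f"eps={eps:5.2f}  c0/eps={c0 / eps:8.4f}  c1={c1:8.4f}  c0*c2={c0 * c2:8.4f}")
```

If the heuristics above are right, all three printed columns should shrink as {\epsilon} decreases, even though the {C^2} quantity on its own grows.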