
When solving the initial value problem for an ordinary differential equation, such as

\displaystyle  \partial_t u = F(u); \quad u(0) = u_0, \ \ \ \ \ (1)

where {u: {\bf R} \rightarrow V} is the unknown solution (taking values in some finite-dimensional vector space {V}), {u_0 \in V} is the initial datum, and {F: V \rightarrow V} is some nonlinear function (which we will take to be smooth for the sake of argument), then one can construct a solution locally in time via the Picard iteration method. There are two basic ideas. The first is to use the fundamental theorem of calculus to rewrite the initial value problem (1) as the problem of solving an integral equation,

\displaystyle  u(t) = u_0 + \int_0^t F(u(s))\ ds. \ \ \ \ \ (2)

The second idea is to solve this integral equation by the contraction mapping theorem, showing that the integral operator {{\mathcal N}} defined by

\displaystyle  {\mathcal N}(u) (t) := u_0 + \int_0^t F(u(s))\ ds

is a contraction on a suitable complete metric space (e.g. a closed ball in the function space {C^0([0,T]; V)}), and thus has a unique fixed point in this space. This method works as long as one only seeks to construct local solutions (for time {t} in {[0,T]} for sufficiently small {T>0}), but the solutions constructed have a number of very good properties, including

  • Existence: A solution {u} exists in the space {C^0([0,T];V)} (and even in {C^\infty([0,T];V)}) for {T} sufficiently small.
  • Uniqueness: There is at most one solution {u} to the initial value problem in the space {C^0([0,T];V)} (or in smoother spaces, such as {C^\infty([0,T];V)}). (For solutions in the weaker space {C^0([0,T];V)} we use the integral formulation (2) to define the solution concept.)
  • Lipschitz continuous dependence on the data: If {u_0^{(n)}} is a sequence of initial data converging to {u_0}, then the associated solutions {u^{(n)}} converge uniformly to {u} on {[0,T]} (possibly after shrinking {T} slightly). In fact we have the Lipschitz bound {\| u^{(n)}(t) - u(t) \|_V \leq C \| u^{(n)}_0 - u_0 \|_V} for {n} large enough and {t \in [0,T]}, where {C} is a constant that does not depend on {n}.

This package of properties is referred to as (Lipschitz) well-posedness.
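To make the iteration concrete, here is a minimal numerical sketch of Picard iteration for a scalar ODE. The function name `picard_iterates`, the trapezoid quadrature, and the test case {F(u) = u^2}, {u_0 = 1} are illustrative choices that do not come from the discussion above; the point is only that the successive iterates {{\mathcal N}^k(u_0)} approach each other rapidly when {T} is small.

```python
import numpy as np

def picard_iterates(F, u0, T, num_iters=8, num_steps=200):
    """Return the time grid and a list of Picard iterates for u' = F(u), u(0) = u0.

    Each iterate is a scalar-valued function sampled on a uniform grid of [0, T];
    the integral in N(u)(t) = u0 + int_0^t F(u(s)) ds is approximated by the
    cumulative trapezoid rule.
    """
    t = np.linspace(0.0, T, num_steps + 1)
    dt = t[1] - t[0]
    u = np.full_like(t, u0)          # zeroth iterate: the constant function u0
    iterates = [u]
    for _ in range(num_iters):
        f = F(u)
        # cumulative trapezoid rule: integral of f from 0 to each grid time
        integral = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * dt)))
        u = u0 + integral
        iterates.append(u)
    return t, iterates

# Example: F(u) = u^2 with u0 = 1, whose exact solution is 1/(1 - t) for t < 1.
t, iterates = picard_iterates(lambda u: u**2, u0=1.0, T=0.25)
# Sup-norm distance between successive iterates, and error against the exact solution.
gaps = [np.max(np.abs(b - a)) for a, b in zip(iterates, iterates[1:])]
error = np.max(np.abs(iterates[-1] - 1.0 / (1.0 - t)))
```

The list `gaps` records the {C^0([0,T])} distance between consecutive iterates, which is the quantity the contraction mapping theorem controls; taking {T} larger (closer to the blowup time {1} in this example) slows or destroys the convergence.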

This method extends to certain partial differential equations, particularly those of a semilinear nature (linear except for lower order nonlinear terms). For instance, if trying to solve an initial value problem of the form

\displaystyle  \partial_t u + Lu = F(u); \quad u(0,x) = u_0(x),

where now {u: {\bf R} \rightarrow V} takes values in a function space {V} (e.g. a Sobolev space {H^k({\bf R}^d)}), {u_0 \in V} is an initial datum, {L} is some (differential) operator (independent of {u}) that is (densely) defined on {V}, and {F} is a nonlinearity which is also (densely) defined on {V}, then (formally, at least) one can solve this problem by using Duhamel’s formula to convert the problem to that of solving an integral equation

\displaystyle  u(t) = e^{-tL} u_0 + \int_0^t e^{-(t-s)L} F(u(s))\ ds

and one can then hope to show that the associated nonlinear integral operator

\displaystyle  u \mapsto e^{-tL} u_0 + \int_0^t e^{-(t-s)L} F(u(s))\ ds

is a contraction in a subset of a suitably chosen function space.
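As a rough illustration of this Duhamel iteration, the sketch below treats one specific semilinear parabolic example that is assumed here purely for concreteness: {L = -\partial_{xx}} on the circle {{\bf R}/{\bf Z}} and {F(u) = u^2}. The function name `duhamel_iterates`, the Fourier discretisation, and the trapezoid rule in time are all choices made for the sketch, not a prescribed implementation.

```python
import numpy as np

def duhamel_iterates(u0, F, T, num_iters=10, num_times=40):
    """Picard iteration for u_t + L u = F(u) on the circle R/Z with L = -d^2/dx^2.

    The propagator e^{-tL} acts on the k-th Fourier mode as exp(-(2 pi k)^2 t),
    and the time integral in Duhamel's formula uses the trapezoid rule.
    Returns the time grid, the final iterate (shape: times x space), and the
    sup-in-time L^2 distances between successive iterates.
    """
    n = u0.size
    k = np.fft.fftfreq(n, d=1.0 / n)              # integer frequencies
    symbol = (2.0 * np.pi * k) ** 2               # Fourier symbol of L
    t = np.linspace(0.0, T, num_times + 1)
    dt = t[1] - t[0]

    def propagate(f, tau):
        """Apply e^{-tau L} to a real function sampled on the grid."""
        return np.real(np.fft.ifft(np.exp(-symbol * tau) * np.fft.fft(f)))

    linear = np.array([propagate(u0, tm) for tm in t])   # e^{-tL} u0 at each grid time
    u = linear.copy()
    gaps = []
    for _ in range(num_iters):
        Fu = F(u)
        new_u = linear.copy()
        for m in range(1, t.size):
            # integrand s -> e^{-(t_m - s)L} F(u(s)) sampled at s = t_0, ..., t_m
            vals = np.array([propagate(Fu[j], t[m] - t[j]) for j in range(m + 1)])
            weights = np.full(m + 1, dt)
            weights[0] = weights[-1] = 0.5 * dt
            new_u[m] += weights @ vals
        gaps.append(np.max(np.sqrt(np.mean((new_u - u) ** 2, axis=1))))
        u = new_u
    return t, u, gaps

x = np.arange(64) / 64.0
u_init = 0.5 * np.cos(2 * np.pi * x)
t, u, gaps = duhamel_iterates(u_init, lambda v: v**2, T=0.1)
```

With these (assumed) parameters the entries of `gaps` shrink geometrically, which is the numerical shadow of the contraction property; setting {F = 0} recovers the linear evolution {e^{-tL} u_0} exactly.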

This method turns out to work surprisingly well for many semilinear partial differential equations, and in particular for semilinear parabolic, semilinear dispersive, and semilinear wave equations. As in the ODE case, when the method works, it usually gives the entire package of Lipschitz well-posedness: existence, uniqueness, and Lipschitz continuous dependence on the initial data, for short times at least.

However, when one moves from semilinear initial value problems to quasilinear initial value problems such as

\displaystyle  \partial_t u + L_u u = F(u); \quad u(0,x) = u_0(x)

in which the top order operator {L_u} now depends on the solution {u} itself, then the nature of well-posedness changes; one can still hope to obtain (local) existence and uniqueness, and even continuous dependence on the data, but one usually is forced to give up Lipschitz continuous dependence at the highest available regularity (though one can often recover it at lower regularities). As a consequence, the Picard iteration method is not directly suitable for constructing solutions to such equations.

One can already see this phenomenon with a very simple equation, namely the one-dimensional constant-velocity transport equation

\displaystyle  \partial_t u + c \partial_x u = 0; \quad u(0,x) = u_0(x) \ \ \ \ \ (3)

where we consider {c = c_0} as part of the initial data. (If one wishes, one could view this equation as a rather trivial example of a system

\displaystyle  \partial_t u + c \partial_x u = 0; \quad \partial_t c = 0

\displaystyle  u(0,x) = u_0(x); \quad c(0) = c_0,

to emphasise this viewpoint, but this would be somewhat idiosyncratic.) One can solve this equation explicitly of course to get the solution

\displaystyle  u(t,x) = u_0(x-ct).

In particular, if we look at the solution just at time {t=1} for simplicity, we have

\displaystyle  u(1,x) = u_0(x-c).

Now let us see how this solution {u(1,x)} depends on the parameter {c}. One can ask whether this dependence is Lipschitz in {c}, in some function space {V}:

\displaystyle  \| u_0(\cdot - c) - u_0(\cdot - c') \|_V \leq A |c-c'|

for some finite {A}. But using the Newton approximation

\displaystyle  u_0(\cdot - c) - u_0(\cdot - c') \approx -(c-c') \partial_x u_0(\cdot - c)

we see that we should only expect such a bound when {\partial_x u_0} (and its translates) lie in {V}. Thus, we see a loss of derivatives phenomenon with regard to Lipschitz well-posedness; if the initial data {u_0} is in some regularity space, say {C^3}, then one only obtains Lipschitz dependence on {c} in a lower regularity space such as {C^2}.
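One can see this loss of a derivative numerically. The sketch below uses the test family {u_0(x) = \sin(2\pi k x)} on the unit circle (a choice made here for illustration, as is the helper name `translation_lipschitz_ratio`): every member has {C^0} norm {1}, yet the best Lipschitz constant {A} in the sup norm tracks {\sup |\partial_x u_0| = 2\pi k}.

```python
import numpy as np

def translation_lipschitz_ratio(k, c, c_prime, num_points=4096):
    """Sup-norm of u0(. - c) - u0(. - c') divided by |c - c'|, for u0(x) = sin(2 pi k x)."""
    x = np.arange(num_points) / num_points        # grid on the unit circle R/Z
    u0 = lambda y: np.sin(2 * np.pi * k * y)
    return np.max(np.abs(u0(x - c) - u0(x - c_prime))) / abs(c - c_prime)

# The C^0 norm of u0 is 1 for every k, but the ratio tracks sup|u0'| = 2 pi k.
ratios = {k: translation_lipschitz_ratio(k, 0.0, 1e-4) for k in (1, 4, 16)}
```

So a bound on {u_0} in {C^0} alone gives no control on the ratio; one needs a bound on {u_0} in {C^1} to get Lipschitz dependence on {c} in {C^0}.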

We have just seen that if all one knows about the initial data {u_0} is that it is bounded in a function space {V}, then one usually cannot hope to make the dependence of {u} on the velocity parameter {c} Lipschitz continuous. Indeed, one cannot even make it continuous uniformly in {V}. Given two values of {c} that are close together, e.g. {c = 0} and {c=\epsilon}, and a reasonable function space {V} (e.g. a Sobolev space {H^k}, or a classical regularity space {C^k}) one can easily cook up a function {u_0} that is bounded in {V} but whose two solutions {u_0(\cdot)} and {u_0(\cdot-\epsilon)} separate in the {V} norm at time {1}, simply by choosing {u_0} to be supported on an interval of width {\epsilon}.

(Part of the problem here is that using a subtractive method {\|u-v\|_V} to determine the distance between two solutions {u, v} is not a physically natural operation when transport mechanisms are present that could cause the key features of {u, v} (such as singularities) to be situated in slightly different locations. In such cases, the correct notion of distance may need to take transport into account, e.g. by using metrics of Wasserstein type.)

On the other hand, one still has non-uniform continuous dependence on the initial parameters: if {u_0} lies in some reasonable function space {V}, then the map {c \mapsto u_0(\cdot-c)} is continuous in the {V} topology, even if it is not uniformly continuous with respect to {u_0}. (More succinctly: translation is a continuous but not uniformly continuous operation in most function spaces.) The reason for this is that we already have established this continuity in the case when {u_0} is so smooth that an additional derivative of {u_0} lies in {V}; and such smooth functions tend to be dense in the original space {V}, so the general case can then be established by a limiting argument, approximating a general function in {V} by a smoother function. We then see that the non-uniformity ultimately comes from the fact that a given function in {V} may be arbitrarily rough (or concentrated at an arbitrarily fine scale), and so the ability to approximate such a function by a smooth one can be arbitrarily poor.
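The contrast between continuity and uniform continuity can be checked on a toy example in {V = C^0}. In the sketch below, the tent function of width {w} centred at {x = 1/2} and the helper names `tent` and `shift_distance` are assumptions made for illustration; any bump supported on an interval of width {w} would behave similarly.

```python
import numpy as np

def tent(y, width):
    """Tent function of height 1 supported on an interval of the given width around 1/2."""
    return np.maximum(0.0, 1.0 - np.abs(2.0 * (y - 0.5) / width))

def shift_distance(width, eps, num_points=4096):
    """Sup-norm distance between the tent u0 and its translate u0(. - eps)."""
    x = np.arange(num_points) / num_points
    return np.max(np.abs(tent(x - eps, width) - tent(x, width)))

# Fixed u0 (width 0.1): the distance tends to zero with the shift.
fixed_profile = [shift_distance(0.1, eps) for eps in (1e-2, 1e-3, 1e-4)]

# Bounded family (sup norm 1 for every width): a shift equal to the width
# always separates the two profiles completely, however small the shift is.
bad_family = [shift_distance(eps, eps) for eps in (1e-1, 1e-2, 1e-3)]
```

For the fixed profile the distances decay roughly like {20 \epsilon} (the slope of the tent times the shift), whereas along the family with width {\epsilon} the distance stays equal to {1}, so no single modulus of continuity serves the whole unit ball of {C^0}.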

In many quasilinear PDE, one often encounters qualitatively similar phenomena. Namely, one often has local well-posedness in sufficiently smooth function spaces {V} (so that if the initial data lies in {V}, then for short times one has existence, uniqueness, and continuous dependence on the data in the {V} topology), but Lipschitz or uniform continuity in the {V} topology is usually false. However, if the data (and solution) is known to be in a high-regularity function space {V}, one can often recover Lipschitz or uniform continuity in a lower-regularity topology.

Because the continuous dependence on the data in quasilinear equations is necessarily non-uniform, the arguments needed to establish this dependence can be remarkably delicate. As with the simple example of the transport equation, the key is to approximate a rough solution by a smooth solution first, by smoothing out the data (this is the non-uniform step, as it depends on the physical scale (or wavelength) at which the data features are located). But for quasilinear equations, keeping the rough and smooth solution together can require a little juggling of function space norms, in particular playing the low-frequency nature of the smooth solution against the high-frequency nature of the residual between the rough and smooth solutions.

Below the fold I will illustrate this phenomenon with one of the simplest quasilinear equations, namely the initial value problem for the inviscid Burgers’ equation

\displaystyle  \partial_t u + u u_x = 0; \quad u(0,x) = u_0(x) \ \ \ \ \ (4)

which is a modification of the transport equation (3) in which the velocity {c} is no longer a parameter, but now depends on (and is, in this case, actually equal to) the solution. To avoid technicalities we will work only with the classical function spaces {C^k} of {k} times continuously differentiable functions, though one can certainly work with other spaces (such as Sobolev spaces) by exploiting the Sobolev embedding theorem. To avoid having to distinguish continuity from uniform continuity, we shall work in a compact domain by assuming periodicity in space, thus for instance restricting {x} to the unit circle {{\bf R}/{\bf Z}}.
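For readers who want to experiment with (4), here is a small sketch based on the classical implicit formula {u(t,x) = u_0(x - t u(t,x))}, which holds for smooth solutions before any shock forms. This formula is the standard method of characteristics rather than something taken from the discussion above, and the helper name `burgers_classical` and the choice of data are assumptions for the sketch. The implicit equation is solved pointwise by fixed-point iteration, which contracts when {t \sup |\partial_x u_0| < 1}.

```python
import numpy as np

def burgers_classical(u0, t, x, num_iters=200):
    """Solve u = u0(x - t*u) pointwise by fixed-point iteration.

    Valid only for smooth periodic u0 and times t with t * sup|u0'| < 1, so that the
    map v -> u0(x - t*v) is a contraction and no shock has formed yet.
    """
    u = u0(x)
    for _ in range(num_iters):
        u = u0(x - t * u)
    return u

amplitude = 0.1
u0 = lambda y: amplitude * np.sin(2 * np.pi * y)   # sup|u0'| = 2 pi * 0.1, about 0.63
x = np.arange(256) / 256.0
t = 1.0
u = burgers_classical(u0, t, x)
# How well the computed profile satisfies the implicit relation u = u0(x - t*u).
residual = np.max(np.abs(u - u0(x - t * u)))
```

Comparing `burgers_classical(u0, t, x)` for two nearby choices of `u0` gives a quick way to probe how the solution at time {t} depends on the data in various {C^k} norms.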

This discussion is inspired by this survey article of Nikolay Tzvetkov, which further explores the distinction between well-posedness and ill-posedness in both semilinear and quasilinear settings.


We can now turn attention to one of the centerpiece universality results in random matrix theory, namely the Wigner semi-circle law for Wigner matrices. Recall from previous notes that a Wigner Hermitian matrix ensemble is a random matrix ensemble {M_n = (\xi_{ij})_{1 \leq i,j \leq n}} of Hermitian matrices (thus {\xi_{ij} = \overline{\xi_{ji}}}; this includes real symmetric matrices as an important special case), in which the upper-triangular entries {\xi_{ij}}, {i<j} are iid complex random variables with mean zero and unit variance, and the diagonal entries {\xi_{ii}} are iid real variables, independent of the upper-triangular entries, with bounded mean and variance. Particular special cases of interest include the Gaussian Orthogonal Ensemble (GOE), the symmetric random sign matrices (aka symmetric Bernoulli ensemble), and the Gaussian Unitary Ensemble (GUE).

In previous notes we saw that the operator norm of {M_n} was typically of size {O(\sqrt{n})}, so it is natural to work with the normalised matrix {\frac{1}{\sqrt{n}} M_n}. Accordingly, given any {n \times n} Hermitian matrix {M_n}, we can form the (normalised) empirical spectral distribution (or ESD for short)

\displaystyle  \mu_{\frac{1}{\sqrt{n}} M_n} := \frac{1}{n} \sum_{j=1}^n \delta_{\lambda_j(M_n) / \sqrt{n}},

of {M_n}, where {\lambda_1(M_n) \leq \ldots \leq \lambda_n(M_n)} are the (necessarily real) eigenvalues of {M_n}, counting multiplicity. The ESD is a probability measure, which can be viewed as a distribution of the normalised eigenvalues of {M_n}.
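In computational terms, integrating a test function {\varphi} against the ESD is just averaging {\varphi} over the normalised eigenvalues. The following sketch makes this explicit; the helper names and the particular sample (a symmetric random sign matrix with a fixed seed) are assumptions chosen for illustration.

```python
import numpy as np

def normalised_eigenvalues(M):
    """Eigenvalues of the Hermitian matrix M, divided by sqrt(n), in increasing order."""
    n = M.shape[0]
    return np.linalg.eigvalsh(M) / np.sqrt(n)

def integrate_against_esd(phi, M):
    """Integral of phi against the ESD: the average of phi over normalised eigenvalues."""
    return np.mean(phi(normalised_eigenvalues(M)))

# Hypothetical sample: a symmetric random sign matrix.
rng = np.random.default_rng(0)
n = 200
signs = rng.choice([-1.0, 1.0], size=(n, n))
M = np.triu(signs) + np.triu(signs, k=1).T     # symmetrise using the upper triangle

# The ESD is a probability measure, so the constant function 1 integrates to 1.
total_mass = integrate_against_esd(lambda lam: np.ones_like(lam), M)
second_moment = integrate_against_esd(lambda lam: lam**2, M)
```

For a sign matrix the second moment equals {\frac{1}{n^2} \sum_{i,j} \xi_{ij}^2 = 1} exactly, which is a convenient sanity check on the normalisation by {\sqrt{n}}.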

When {M_n} is a random matrix ensemble, then the ESD {\mu_{\frac{1}{\sqrt{n}} M_n}} is now a random measure – i.e. a random variable taking values in the space {\hbox{Pr}({\mathbb R})} of probability measures on the real line. (Thus, the distribution of {\mu_{\frac{1}{\sqrt{n}} M_n}} is a probability measure on probability measures!)

Now we consider the behaviour of the ESD of a sequence of Hermitian matrix ensembles {M_n} as {n \rightarrow \infty}. Recall from Notes 0 that for any sequence of random variables in a {\sigma}-compact metrisable space, one can define notions of convergence in probability and convergence almost surely. Specialising these definitions to the case of random probability measures on {{\mathbb R}}, and to deterministic limits, we see that a sequence of random ESDs {\mu_{\frac{1}{\sqrt{n}} M_n}} converge in probability (resp. converge almost surely) to a deterministic limit {\mu \in \hbox{Pr}({\mathbb R})} (which, confusingly enough, is a deterministic probability measure!) if, for every test function {\varphi \in C_c({\mathbb R})}, the quantities {\int_{\mathbb R} \varphi\ d\mu_{\frac{1}{\sqrt{n}} M_n}} converge in probability (resp. converge almost surely) to {\int_{\mathbb R} \varphi\ d\mu}.

Remark 1 As usual, convergence almost surely implies convergence in probability, but not vice versa. In the special case of random probability measures, there is an even weaker notion of convergence, namely convergence in expectation, defined as follows. Given a random ESD {\mu_{\frac{1}{\sqrt{n}} M_n}}, one can form its expectation {{\bf E} \mu_{\frac{1}{\sqrt{n}} M_n} \in \hbox{Pr}({\mathbb R})}, defined via duality (the Riesz representation theorem) as

\displaystyle  \int_{\mathbb R} \varphi\ d{\bf E} \mu_{\frac{1}{\sqrt{n}} M_n} := {\bf E} \int_{\mathbb R} \varphi\ d\mu_{\frac{1}{\sqrt{n}} M_n};

this probability measure can be viewed as the law of a random eigenvalue {\frac{1}{\sqrt{n}}\lambda_i(M_n)} drawn from a random matrix {M_n} from the ensemble. We then say that the ESDs converge in expectation to a limit {\mu \in \hbox{Pr}({\mathbb R})} if {{\bf E} \mu_{\frac{1}{\sqrt{n}} M_n}} converges in the vague topology to {\mu}, thus

\displaystyle  {\bf E} \int_{\mathbb R} \varphi\ d\mu_{\frac{1}{\sqrt{n}} M_n} \rightarrow \int_{\mathbb R} \varphi\ d\mu

for all {\varphi \in C_c({\mathbb R})}.

In general, these notions of convergence are distinct from each other; but in practice, one often finds in random matrix theory that these notions are effectively equivalent to each other, thanks to the concentration of measure phenomenon.

Exercise 1 Let {M_n} be a sequence of {n \times n} Hermitian matrix ensembles, and let {\mu} be a continuous probability measure on {{\mathbb R}}.

  • Show that {\mu_{\frac{1}{\sqrt{n}} M_n}} converges almost surely to {\mu} if and only if {\mu_{\frac{1}{\sqrt{n}} M_n}(-\infty,\lambda)} converges almost surely to {\mu(-\infty,\lambda)} for all {\lambda \in {\mathbb R}}.
  • Show that {\mu_{\frac{1}{\sqrt{n}} M_n}} converges in probability to {\mu} if and only if {\mu_{\frac{1}{\sqrt{n}} M_n}(-\infty,\lambda)} converges in probability to {\mu(-\infty,\lambda)} for all {\lambda \in {\mathbb R}}.
  • Show that {\mu_{\frac{1}{\sqrt{n}} M_n}} converges in expectation to {\mu} if and only if {{\bf E} \mu_{\frac{1}{\sqrt{n}} M_n}(-\infty,\lambda)} converges to {\mu(-\infty,\lambda)} for all {\lambda \in {\mathbb R}}.

We can now state the Wigner semi-circular law.

Theorem 1 (Semicircular law) Let {M_n} be the top left {n \times n} minors of an infinite Wigner matrix {(\xi_{ij})_{i,j \geq 1}}. Then the ESDs {\mu_{\frac{1}{\sqrt{n}} M_n}} converge almost surely (and hence also in probability and in expectation) to the Wigner semi-circular distribution

\displaystyle  \mu_{sc} := \frac{1}{2\pi} (4-|x|^2)^{1/2}_+\ dx. \ \ \ \ \ (1)

A numerical example of this theorem in action can be seen at the MathWorld entry for this law.
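One can also run a quick experiment oneself. The sketch below compares the empirical distribution function {\lambda \mapsto \mu_{\frac{1}{\sqrt{n}} M_n}(-\infty,\lambda)} of a single sample with the distribution function of {\mu_{sc}}, using the closed form {\frac{1}{2} + \frac{x \sqrt{4-x^2}}{4\pi} + \frac{1}{\pi} \arcsin(x/2)} obtained by integrating the density in (1). The ensemble used (real symmetric with standard Gaussian entries), the seed, the size {n = 500}, and the helper names are all assumptions made for the sketch.

```python
import numpy as np

def semicircle_cdf(lam):
    """mu_sc((-inf, lam)) for the density (1/(2 pi)) sqrt(4 - x^2) on [-2, 2]."""
    x = np.clip(lam, -2.0, 2.0)
    return 0.5 + x * np.sqrt(4.0 - x**2) / (4.0 * np.pi) + np.arcsin(x / 2.0) / np.pi

def wigner_sample(n, rng):
    """Real symmetric matrix with iid N(0,1) entries on and above the diagonal (an assumed test ensemble)."""
    A = rng.standard_normal((n, n))
    return np.triu(A) + np.triu(A, k=1).T

rng = np.random.default_rng(1)
n = 500
eigs = np.linalg.eigvalsh(wigner_sample(n, rng)) / np.sqrt(n)   # sorted in increasing order
grid = np.linspace(-2.5, 2.5, 101)
empirical_cdf = np.searchsorted(eigs, grid) / n    # fraction of eigenvalues below each grid point
max_gap = np.max(np.abs(empirical_cdf - semicircle_cdf(grid)))
```

Already at this modest size the quantity `max_gap` is small, in line with the almost sure convergence asserted by Theorem 1; increasing `n` should shrink it further.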

The semi-circular law nicely complements the upper Bai-Yin theorem from Notes 3, which asserts that (in the case when the entries have finite fourth moment, at least), the matrices {\frac{1}{\sqrt{n}} M_n} almost surely have operator norm at most {2+o(1)}. Note that the operator norm is the same thing as the largest magnitude of the eigenvalues. Because the semi-circular distribution (1) is supported on the interval {[-2,2]} with positive density on the interior of this interval, Theorem 1 easily supplies the lower Bai-Yin theorem, that the operator norm of {\frac{1}{\sqrt{n}} M_n} is almost surely at least {2-o(1)}, and thus (in the finite fourth moment case) the norm is in fact equal to {2+o(1)}. Indeed, we have just shown that the semi-circular law provides an alternate proof of the lower Bai-Yin bound (Proposition 11 of Notes 3).

As will hopefully become clearer in the next set of notes, the semi-circular law is the noncommutative (or free probability) analogue of the central limit theorem, with the semi-circular distribution (1) taking on the role of the normal distribution. Of course, there is a striking difference between the two distributions, in that the former is compactly supported while the latter is merely subgaussian. One reason for this is that the concentration of measure phenomenon is more powerful in the case of ESDs of Wigner matrices than it is for averages of iid variables; compare the concentration of measure results in Notes 3 with those in Notes 1.

There are several ways to prove (or at least to heuristically justify) the semi-circular law. In this set of notes we shall focus on the two most popular methods, the moment method and the Stieltjes transform method, together with a third (heuristic) method based on Dyson Brownian motion (Notes 3b). In the next set of notes we shall also study the free probability method, and in the set of notes after that we use the determinantal processes method (although this method is initially only restricted to highly symmetric ensembles, such as GUE).
