The inverse function theorem for everywhere differentiable maps

12 September, 2011 in expository, math.CA, math.GN | Tags: inverse function theorem, Jean Saint Raymond, math overflow | by Terence Tao

The classical inverse function theorem reads as follows:

Theorem 1 ( ${C^1}$ inverse function theorem) Let ${\Omega \subset {\bf R}^n}$ be an open set, and let ${f: \Omega \rightarrow {\bf R}^n}$ be a continuously differentiable function, such that for every ${x_0 \in \Omega}$ , the derivative map ${Df(x_0): {\bf R}^n \rightarrow {\bf R}^n}$ is invertible. Then ${f}$ is a local homeomorphism; thus, for every ${x_0 \in \Omega}$ , there exists an open neighbourhood ${U}$ of ${x_0}$ and an open neighbourhood ${V}$ of ${f(x_0)}$ such that ${f}$ is a homeomorphism from ${U}$ to ${V}$ .

It is also not difficult to show by inverting the Taylor expansion

$\displaystyle f(x) = f(x_0) + Df(x_0)(x-x_0) + o(\|x-x_0\|)$

that at each ${x_0}$ , the local inverses ${f^{-1}: V \rightarrow U}$ are also differentiable at ${f(x_0)}$ with derivative

$\displaystyle Df^{-1}(f(x_0)) = Df(x_0)^{-1}. \ \ \ \ \ (1)$
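One way to carry out this inversion is sketched here, assuming the conclusion of Theorem 1, so that ${x := f^{-1}(y)}$ converges to ${x_0}$ as ${y \rightarrow f(x_0)}$ in ${V}$ ; the constant ${c}$ is notation introduced for this sketch. As ${Df(x_0)}$ is invertible, there is a ${c>0}$ with ${\|Df(x_0) v\| \geq c \|v\|}$ for all ${v}$ , so the Taylor expansion gives

$\displaystyle \|y - f(x_0)\| \geq (c - o(1)) \|x-x_0\|,$

and hence any error of size ${o(\|x-x_0\|)}$ is also ${o(\|y-f(x_0)\|)}$ . Applying ${Df(x_0)^{-1}}$ to the Taylor expansion then yields

$\displaystyle f^{-1}(y) = x_0 + Df(x_0)^{-1} (y - f(x_0)) + o(\|y-f(x_0)\|),$

which is the differentiability of ${f^{-1}}$ at ${f(x_0)}$ with the derivative stated below.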

The textbook proof of the inverse function theorem proceeds by an application of the contraction mapping theorem. Indeed, one may normalise ${x_0=f(x_0)=0}$ and ${Df(0)}$ to be the identity map; continuity of ${Df}$ then shows that ${Df(x)}$ is close to the identity for small ${x}$ , which may be used (in conjunction with the fundamental theorem of calculus) to make ${x \mapsto x-f(x)+y}$ a contraction on a small ball around the origin for small ${y}$ , at which point the contraction mapping theorem readily finishes off the problem.
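As a numerical illustration of this contraction argument, the following minimal Python sketch iterates ${x \mapsto x - f(x) + y}$ starting from the origin. The sample map `f`, the target `y` and the step count are hypothetical choices made for the illustration (any ${C^1}$ map with ${f(0)=0}$ and ${Df(0)}$ equal to the identity would do); they do not come from the argument above.

```python
import math

def f(x):
    # Hypothetical C^1 sample map with f(0) = 0 and Df(0) = identity.
    x1, x2 = x
    return (x1 + 0.25 * x2 * x2, x2 + 0.25 * math.sin(x1) * x2)

def solve(y, steps=60):
    """Iterate T(x) = x - f(x) + y starting from x = 0.

    A fixed point of T is exactly a solution of f(x) = y.
    """
    x = (0.0, 0.0)
    for _ in range(steps):
        fx = f(x)
        x = (x[0] - fx[0] + y[0], x[1] - fx[1] + y[1])
    return x

y = (0.05, -0.03)          # a small target value (illustrative)
x = solve(y)
fx = f(x)
residual = math.hypot(fx[0] - y[0], fx[1] - y[1])
print(x, residual)
```

For small ${y}$ the iterates stay near the origin, where ${x \mapsto x - f(x)}$ has small derivative, which is what makes the iteration contract in this example.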

I recently learned (after I asked this question on Math Overflow) that the hypothesis of continuous differentiability may be relaxed to just everywhere differentiability:

Theorem 2 (Everywhere differentiable inverse function theorem) Let ${\Omega \subset {\bf R}^n}$ be an open set, and let ${f: \Omega \rightarrow {\bf R}^n}$ be an everywhere differentiable function, such that for every ${x_0 \in \Omega}$ , the derivative map ${Df(x_0): {\bf R}^n \rightarrow {\bf R}^n}$ is invertible. Then ${f}$ is a local homeomorphism; thus, for every ${x_0 \in \Omega}$ , there exists an open neighbourhood ${U}$ of ${x_0}$ and an open neighbourhood ${V}$ of ${f(x_0)}$ such that ${f}$ is a homeomorphism from ${U}$ to ${V}$ .

As before, one can recover the differentiability of the local inverses, with the derivative of the inverse given by the usual formula (1).

This result implicitly follows from the more general results of Cernavskii about the structure of finite-to-one open and closed maps; however, the arguments there are somewhat complicated (and subsequent proofs of those results, such as the one by Vaisala, use some powerful tools from algebraic geometry, such as dimension theory). There is, however, a more elementary proof by Saint Raymond that was pointed out to me by Julien Melleray. It only uses basic point-set topology (for instance, the concept of a connected component) and the basic topological and geometric structure of Euclidean space (in particular relying primarily on local compactness, local connectedness, and local convexity). I decided to present (an arrangement of) Saint Raymond’s proof here.

To obtain a local homeomorphism near ${x_0}$ , there are basically two things to show: local surjectivity near ${x_0}$ (thus, for ${y}$ near ${f(x_0)}$ , one can solve ${f(x)=y}$ for some ${x}$ near ${x_0}$ ) and local injectivity near ${x_0}$ (thus, for distinct ${x_1, x_2}$ near ${x_0}$ , ${f(x_1)}$ is not equal to ${f(x_2)}$ ). Local surjectivity is relatively easy; basically, the standard proof of the inverse function theorem works here, after replacing the contraction mapping theorem (which is no longer available due to the possibly discontinuous nature of ${Df}$ ) with the Brouwer fixed point theorem instead (or one could also use degree theory, which is more or less an equivalent approach). The difficulty is local injectivity – one needs to preclude the existence of nearby points ${x_1, x_2}$ with ${f(x_1) = f(x_2) = y}$ ; note that in contrast to the contraction mapping theorem that provides both existence and uniqueness of fixed points, the Brouwer fixed point theorem only gives existence and not uniqueness.

In one dimension ${n=1}$ one can proceed by using Rolle’s theorem. Indeed, as one traverses the interval from ${x_1}$ to ${x_2}$ , one must encounter some intermediate point ${x_*}$ which maximises the quantity ${|f(x_*)-y|}$ , and which is thus instantaneously non-increasing both to the left and to the right of ${x_*}$ . But, by hypothesis, ${f'(x_*)}$ is non-zero, and this easily leads to a contradiction.
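To spell out the contradiction (a sketch; the auxiliary function ${h}$ is notation introduced here): set ${h(x) := (f(x)-y)^2}$ on the interval between ${x_1}$ and ${x_2}$ . Then ${h}$ vanishes at both endpoints, and is not identically zero because the zeroes of ${f-y}$ are isolated (as ${f'}$ is non-zero at each of them). Thus ${h}$ attains a positive maximum at some interior point ${x_*}$ , where

$\displaystyle 0 = h'(x_*) = 2 (f(x_*)-y) f'(x_*).$

Since ${f(x_*) \neq y}$ , this forces ${f'(x_*) = 0}$ , contradicting the hypothesis.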

Saint Raymond’s argument for the higher dimensional case proceeds in a broadly similar way. Starting with two nearby points ${x_1, x_2}$ with ${f(x_1)=f(x_2)=y}$ , one finds a point ${x_*}$ which “locally extremises” ${\|f(x_*)-y\|}$ in the following sense: ${\|f(x_*)-y\|}$ is equal to some ${r_*>0}$ , but ${x_*}$ is adherent to at least two distinct connected components ${U_1, U_2}$ of the set ${U = \{ x: \|f(x)-y\| < r_* \}}$ . (This is an oversimplification, as one has to restrict the available points ${x}$ in ${U}$ to a suitably small compact set, but let us ignore this technicality for now.) Note from the non-degenerate nature of ${Df(x_*)}$ that ${x_*}$ was already adherent to ${U}$ ; the point is that ${x_*}$ “disconnects” ${U}$ in some sense. Very roughly speaking, the way such a critical point ${x_*}$ is found is to look at the sets ${\{ x: \|f(x)-y\| \leq r \}}$ as ${r}$ shrinks from a large initial value down to zero, and one finds the first value of ${r_*}$ below which this set disconnects ${x_1}$ from ${x_2}$ . (Morally, one is performing some sort of Morse theory here on the function ${x \mapsto \|f(x)-y\|}$ , though this function does not have anywhere near enough regularity for classical Morse theory to apply.)

The point ${x_*}$ is mapped to a point ${f(x_*)}$ on the boundary ${\partial B(y,r_*)}$ of the ball ${B(y,r_*)}$ , while the components ${U_1, U_2}$ are mapped to the interior of this ball. By using a continuity argument, one can show (again very roughly speaking) that ${f(U_1)}$ must contain a “hemispherical” neighbourhood ${\{ z \in B(y,r_*): \|z-f(x_*)\| < \kappa \}}$ of ${f(x_*)}$ inside ${B(y,r_*)}$ , and similarly for ${f(U_2)}$ . But from the differentiability of ${f}$ at ${x_*}$ , one can then show that ${U_1}$ and ${U_2}$ overlap near ${x_*}$ , giving a contradiction.

The rigorous details of the proof are provided below the fold.

— 1. Proof —

Fix ${x_0 \in \Omega}$ . By a translation, we may assume ${x_0=f(x_0)=0}$ ; by a further linear change of variables, we may also assume ${Df(0)}$ (which by hypothesis is non-singular) to be the identity map. By differentiability, we have

$\displaystyle f(x) = x + o(\|x\|)$

as ${x \rightarrow 0}$ . In particular, there exists a ball ${B(0,r_0)}$ in ${\Omega}$ such that

$\displaystyle \| f(x)-x\| < \frac{1}{2} \|x\|$

for all ${x\in B(0,r_0)}$ ; by rescaling we may take ${r_0=1}$ , thus

$\displaystyle \| f(x)-x\| < \frac{1}{2} \|x\| \hbox{ whenever } \|x\| \leq 1. \ \ \ \ \ (2)$

Among other things, this gives a uniform lower bound

$\displaystyle \| f(x) \| > \frac{1}{2} \ \ \ \ \ (3)$

for all ${x \in \partial B(0, 1)}$ , and a uniform upper bound

$\displaystyle \| f(x) \| < \frac{1}{10} \ \ \ \ \ (4)$

for all ${x \in B(0, \frac{1}{20})}$ ; thus ${f}$ maps ${B(0,\frac{1}{20})}$ to ${B(0,\frac{1}{10} )}$ .

Proposition 3 (Local surjectivity) For any ${0 < r < 1}$ , ${f(B(0,r))}$ contains ${B(0,r/2)}$ .

Proof: Let ${y \in B(0,r/2)}$ . From (2), we see that the map ${f: \partial B(0,r) \rightarrow f(\partial B(0,r))}$ avoids ${y}$ , and has degree ${1}$ around ${y}$ ; contracting ${\partial B(0,r)}$ to a point, we conclude that ${f(x)=y}$ for some ${x \in B(0,r)}$ , yielding the claim.

Alternatively, one may proceed by invoking the Brouwer fixed point theorem, noting that the map ${x \mapsto x - f(x) + y}$ is continuous and maps the closed ball ${\overline{B(0,r)}}$ to the open ball ${B(0,r)}$ by (2), and has a fixed point precisely when ${f(x)=y}$ .

A third argument (avoiding the use of degree theory or the Brouwer fixed point theorem, but requiring one to replace ${B(0,r/2)}$ with the slightly smaller ball ${B(0,r/3)}$ ) is as follows: let ${x \in \overline{B(0,r)}}$ minimise ${\|f(x)-y\|}$ . From (2) and the hypothesis ${y \in B(0,r/3)}$ we see that ${x}$ lies in the interior ${B(0,r)}$ . If the minimum is zero, then we have found a solution to ${f(x)=y}$ as required; if not, then we have a stationary point of ${x \mapsto \|f(x)-y\|}$ , which implies that ${Df(x)}$ is degenerate, a contradiction. (One can recover the full ball ${B(0,r/2)}$ by tweaking the expression ${\|f(x)-y\|}$ to be minimised in a suitable fashion; we leave this as an exercise for the interested reader.) $\Box$
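The two claims used in the third argument can be checked as follows. This is a sketch, assuming the Euclidean norm; the comparison point ${x=y}$ is one convenient choice and not necessarily the one intended above. For ${\|x\| = r}$ , (2) and the triangle inequality give

$\displaystyle \|f(x)-y\| \geq \|x\| - \|f(x)-x\| - \|y\| > r - \frac{r}{2} - \frac{r}{3} = \frac{r}{6},$

while at the point ${x=y}$ (which lies in ${B(0,r)}$ ) one has

$\displaystyle \|f(y)-y\| \leq \frac{1}{2} \|y\| < \frac{r}{6}.$

Thus the minimum over ${\overline{B(0,r)}}$ is less than ${r/6}$ and cannot be attained on the boundary. At an interior minimiser ${x}$ with ${f(x) \neq y}$ , differentiating ${\|f(x)-y\|^2}$ gives

$\displaystyle Df(x)^T (f(x)-y) = 0,$

so that ${Df(x)^T}$ , and hence ${Df(x)}$ , is singular.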

Corollary 4 ${f}$ is an open map: the image of any open set is open.

Proof: It suffices to show that for every ${x \in \Omega}$ , the image of any open neighbourhood of ${x}$ is an open neighbourhood of ${f(x)}$ . Proposition 3 handles the case ${x=0}$ ; the general case follows by renormalising. $\Box$

Suppose we could show that ${f}$ is injective on ${B(0,\frac{1}{20})}$ . By Corollary 4, the inverse map ${f^{-1}: f(B(0,\frac{1}{20})) \rightarrow B(0,\frac{1}{20})}$ is also continuous. Thus ${f}$ is a homeomorphism from ${B(0,\frac{1}{20})}$ to ${f(B(0,\frac{1}{20}))}$ , which are both neighbourhoods of ${0}$ by Proposition 3; giving the claim.

It remains to establish injectivity. Suppose for the sake of contradiction that this was not the case. Then there exist distinct ${x_1, x_2 \in B(0,\frac{1}{20} )}$ and ${y \in B(0,\frac{1}{10} )}$ such that

$\displaystyle y = f(x_1) = f(x_2).$

For every radius ${r \geq 0}$ , the set

$\displaystyle K_r := \{ x \in \Omega: \| f(x)-y\| \leq r \}$

is closed and contains both ${x_1}$ and ${x_2}$ . Let ${K_r^1}$ denote the connected component of ${K_r}$ that contains ${x_1}$ . Since ${K_r}$ is non-decreasing in ${r}$ , ${K_r^1}$ is non-decreasing also.

Now let us study the behaviour of ${K_r^1}$ as ${r}$ ranges from ${0}$ to ${\frac{4}{10} }$ . The two extreme cases are easy to analyse:

Lemma 5 ${K_0^1 = \{x_1\}}$ .

Proof: Since ${Df(x_1)}$ is non-singular, we see from differentiability that ${f(x) \neq f(x_1)}$ for all ${x \neq x_1}$ sufficiently close to ${x_1}$ . Thus ${x_1}$ is an isolated point of ${K_0}$ , and the claim follows. $\Box$

Lemma 6 We have ${B(0,\frac{1}{20} ) \subset K_r^1 \subset B(0,1)}$ for all ${\frac{2}{10} \leq r \leq \frac{4}{10}}$ . In particular, ${K_r^1}$ is compact for all ${0 \leq r \leq \frac{4}{10} }$ , and contains ${x_2}$ for ${\frac{2}{10} \leq r \leq \frac{4}{10} }$ .

Proof: Since ${f(B(0,\frac{1}{20} )) \subset B(f(0),\frac{1}{10} ) \subset \overline{B(y,r)}}$ , we see that ${B(0,\frac{1}{20} ) \subset K_r}$ ; since ${B(0,\frac{1}{20} )}$ is connected and contains ${x_1}$ , we conclude that ${B(0,\frac{1}{20} ) \subset K^1_r}$ .

Next, if ${x \in \partial B(0,1)}$ , then by (3) we have ${f(x) \not \in B(0,\frac{1}{2})}$ , and hence ${f(x) \not \in \overline{B(y, r)}}$ . Thus ${K_r}$ is disjoint from the sphere ${\partial B(0,1)}$ . Since ${x_1}$ lies in the interior of this sphere we thus have ${K_r^1 \subset B(0,1)}$ as required. $\Box$
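For the record, the numerology in this proof can be written out as follows (a sketch using only (3), (4) and the bound ${\|y\| < \frac{1}{10}}$ ). For ${x \in B(0,\frac{1}{20})}$ and ${r \geq \frac{2}{10}}$ ,

$\displaystyle \|f(x)-y\| \leq \|f(x)\| + \|y\| < \frac{1}{10} + \frac{1}{10} \leq r,$

while for ${x \in \partial B(0,1)}$ and ${r \leq \frac{4}{10}}$ ,

$\displaystyle \|f(x)-y\| \geq \|f(x)\| - \|y\| > \frac{1}{2} - \frac{1}{10} = \frac{4}{10} \geq r.$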

Next, we show that the ${K_r^1}$ increase continuously in ${r}$ :

Lemma 7 If ${0 \leq r < \frac{4}{10}}$ and ${\epsilon > 0}$ , then for ${r < r' < \frac{4}{10}}$ sufficiently close to ${r}$ , ${K_{r'}^1}$ is contained in an ${\epsilon}$ -neighbourhood of ${K_r^1}$ .

Proof: By the finite intersection property, it suffices to show that ${\bigcap_{r'>r} K_{r'}^1 = K_r^1}$ . Suppose for contradiction that there is a point ${x}$ outside of ${K_r^1}$ that lies in ${K_{r'}^1}$ for all ${r'>r}$ . Then ${x}$ lies in ${K_{r'}}$ for all ${r'>r}$ , and hence lies in ${K_r \cap B(0,1)}$ . As ${x}$ and ${x_1}$ lie in different connected components of the compact set ${K_r \cap \overline{B(0,1)}}$ (recall that ${K_r}$ is disjoint from ${\partial B(0,1)}$ ), there must be a partition of ${K_r \cap \overline{B(0,1)}}$ into two disjoint closed sets ${F, G}$ that separate ${x}$ from ${x_1}$ (for otherwise every clopen set in ${K_r \cap \overline{B(0,1)}}$ that contains ${x_1}$ would also contain ${x}$ , and the intersection of these clopen sets would then be a connected subset of ${K_r \cap \overline{B(0,1)}}$ that contains both ${x_1}$ and ${x}$ , contradicting the fact that ${x}$ lies outside ${K_r^1}$ ). By normality, we may find open neighbourhoods ${U, V}$ of ${F, G}$ that are disjoint. One then has ${\| f(x')-y\| > r}$ for all ${x' \in \partial U}$ . As ${\partial U}$ is compact and ${f}$ is continuous, we thus have ${\| f(x')-y\| > r'}$ for all ${x' \in \partial U}$ if ${r'}$ is sufficiently close to ${r}$ . This makes ${U \cap K_{r'}}$ clopen in ${K_{r'}}$ , and so ${x}$ cannot lie in ${K_{r'}^1}$ , giving the desired contradiction. $\Box$

Observe that ${K_r^1}$ contains ${x_2}$ for ${r \geq \frac{2}{10} }$ , but does not contain ${x_2}$ for ${r=0}$ . By the monotonicity of the ${K_r^1}$ and least upper bound principle, there must therefore exist a critical ${0 \leq r_* \leq \frac{2}{10} }$ such that ${K_r^1}$ contains ${x_2}$ for all ${r > r_*}$ , but does not contain ${x_2}$ for ${r < r_*}$ . From Lemma 7 we see that ${K_{r_*}^1}$ must also contain ${x_2}$ . In particular, by Lemma 5, ${r_* > 0}$ .
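The definition of the critical radius ${r_*}$ can be explored numerically in one dimension. The sketch below is only an illustration under loud assumptions: it uses the map ${x \mapsto x^3-x}$ (which appears in the comments and does not satisfy the hypotheses of Theorem 2, since its derivative vanishes at ${\pm 1/\sqrt{3}}$ ), the value ${y=0}$ , the points ${x_1=-1}$ and ${x_2=0}$ , and a grid approximation of the connected component; the function names are hypothetical.

```python
import math

def f(x):
    # Sample map; NOT a map satisfying Theorem 2 (f' vanishes at +-1/sqrt(3)).
    return x**3 - x

def component_contains(r, x1, x2, y=0.0, lo=-1.5, hi=1.5, n=30001):
    """Grid approximation: is x2 in the connected component of
    K_r = {x : |f(x) - y| <= r} that contains x1?  In one dimension this
    holds exactly when every point between x1 and x2 lies in K_r."""
    a, b = min(x1, x2), max(x1, x2)
    step = (hi - lo) / (n - 1)
    i0 = round((a - lo) / step)
    i1 = round((b - lo) / step)
    return all(abs(f(lo + i * step) - y) <= r for i in range(i0, i1 + 1))

def critical_radius(x1, x2, r_max=1.0, tol=1e-6):
    """Bisection for the smallest r at which x2 joins the component of x1."""
    lo_r, hi_r = 0.0, r_max
    while hi_r - lo_r > tol:
        mid = 0.5 * (lo_r + hi_r)
        if component_contains(mid, x1, x2):
            hi_r = mid
        else:
            lo_r = mid
    return hi_r

r_star = critical_radius(-1.0, 0.0)
print(r_star, 2.0 / (3.0 * math.sqrt(3.0)))
```

In this example the two components merge at the point ${-1/\sqrt{3}}$ , where ${|f|}$ has a local maximum and the derivative vanishes; this is the kind of point that the non-degeneracy hypothesis rules out.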

We now analyse the critical set ${K_{r_*}^1}$ . By construction, this set is connected, compact, contains both ${x_1}$ and ${x_2}$ , is contained in ${B(0,1)}$ , and one has ${\|f(x)-y\| \leq r_*}$ for all ${x \in K_{r_*}^1}$ .

Lemma 8 The set ${U := \{ x \in K_{r_*}^1: \|f(x)-y\| < r_* \}}$ is open and disconnected.

Proof: The openness is clear from the continuity of ${f}$ (and the local connectedness of ${{\bf R}^n}$ ). Now we show disconnectedness. As ${U}$ is an open subset of ${{\bf R}^n}$ , its connectedness is equivalent to path connectedness, and ${x_1}$ and ${x_2}$ both lie in ${U}$ , so it suffices to show that ${x_1}$ and ${x_2}$ cannot be joined by a path ${\gamma}$ in ${U}$ . But if such a path ${\gamma}$ existed, then by compactness of ${\gamma}$ and continuity of ${f}$ , one would have ${\gamma \subset K_r}$ for some ${r < r_*}$ . This would imply that ${x_2 \in K_r^1}$ , contradicting the minimal nature of ${r_*}$ , and the claim follows. $\Box$

Lemma 9 ${U}$ has at most finitely many connected components.

Proof: Let ${U_1}$ be a connected component of ${U}$ ; then ${f(U_1)}$ is non-empty and contained in ${B(y,r_*)}$ . As ${U}$ is open, ${U_1}$ is also open, and thus by Corollary 4, ${f(U_1)}$ is open also.

We claim that ${f(U_1)}$ is in fact all of ${B(y,r_*)}$ . Suppose this were not the case. As ${B(y,r_*)}$ is connected, this would imply that ${f(U_1)}$ is not closed in ${B(y,r_*)}$ ; thus there is an element ${z}$ of ${B(y,r_*)}$ which is adherent to ${f(U_1)}$ , but does not lie in ${f(U_1)}$ . Thus one may find a sequence ${x_n}$ in ${U_1}$ with ${f(x_n)}$ converging to ${z}$ . By compactness of ${K_{r_*}^1}$ (which contains ${U_1}$ ), we may pass to a subsequence and assume that ${x_n}$ converges to a limit ${x}$ in ${K_{r_*}^1}$ ; then ${f(x)=z}$ . By continuity, there is thus a ball ${B}$ centred at ${x}$ that is mapped to ${B(y,r)}$ for some ${r < r_*}$ ; this implies that ${B}$ lies in ${K_{r_*}}$ and hence in ${K_{r_*}^1}$ (since ${x \in K_{r_*}^1}$ ) and thence in ${U}$ (since ${r}$ is strictly less than ${r_*}$ ). As ${x}$ is adherent to ${U_1}$ and ${B}$ is connected, we conclude that ${B}$ lies in ${U_1}$ . In particular ${x}$ lies in ${U_1}$ and so ${z=f(x)}$ lies in ${f(U_1)}$ , a contradiction.

As ${f(U_1)}$ is equal to ${B(y,r_*)}$ , we thus see that ${U_1}$ contains an element of ${f^{-1}(\{y\})}$ . However, each element ${x}$ of ${f^{-1}(\{y\})}$ must be isolated since ${Df(x)}$ is non-singular. By compactness of ${K_{r_*}^1}$ , the set ${K_{r_*}^1}$ (and hence ${U}$ ) thus contains at most finitely many elements of ${f^{-1}(\{y\})}$ , and so there are finitely many components as claimed. $\Box$

Lemma 10 Every point in ${K_{r_*}^1}$ is adherent to ${U}$ (i.e. ${\overline{U} = K_{r_*}^1}$ ).

Proof: If ${x \in K_{r_*}^1}$ , then ${\|f(x)-y\| \leq r_*}$ . If ${\|f(x)-y\|<r_*}$ then ${x \in U}$ and we are done, so we may assume ${\|f(x)-y\| = r_*}$ . By differentiability, one has

$\displaystyle f(x') = f(x) + Df(x) (x'-x) + o(\|x'-x\|)$

for all ${x'}$ sufficiently close to ${x}$ . If we choose ${x'}$ to lie on a ray emanating from ${x}$ such that ${Df(x)(x'-x)}$ lies on a ray pointing towards ${y}$ from ${f(x)}$ (this is possible as ${Df(x)}$ is non-singular), we conclude that for all ${x'}$ sufficiently close to ${x}$ on this ray, ${\|f(x')-y\| < r_*}$ . Thus all such points ${x'}$ lie in ${K_{r_*}}$ ; since ${x}$ lies in ${K_{r_*}^1}$ and the ray is locally connected, we see that all such points ${x'}$ in fact lie in ${K_{r_*}^1}$ and thence in ${U}$ . The claim follows. $\Box$

Corollary 11 There exists a point ${x_* \in K_{r_*}^1}$ with ${\|f(x_*)-y\| = r_*}$ (i.e. ${x_*}$ lies outside ${U}$ ) which is adherent to at least two connected components of ${U}$ .

Proof: Suppose this were not the case. Then the closures of all the connected components of ${U}$ would be disjoint. (Note that an element of one connected component of ${U}$ cannot lie in the closure of another component.) By Lemma 10, these closures would form a partition of ${K_{r_*}^1}$ by closed sets. By Lemma 8, there are at least two such closed sets, each of which is non-empty; by Lemma 9, the number of such closed sets is finite. But this contradicts the connectedness of ${K_{r_*}^1}$ . $\Box$

Next, we prove

Proposition 12 Let ${x_* \in K_{r_*}^1}$ be such that ${\|f(x_*)-y\|=r_*}$ , and suppose that ${x_*}$ is adherent to a connected component ${U_1}$ of ${U}$ . Let ${\omega}$ be the vector such that

$\displaystyle Df(x_*) \omega = y - f(x_*) \ \ \ \ \ (5)$

(this vector exists and is non-zero since ${Df(x_*)}$ is non-singular). Then ${U_1}$ contains an open ray of the form ${\{ x_* + t \omega: 0 < t < \epsilon \}}$ for some ${\epsilon > 0}$ .

This together with Corollary 11 gives the desired contradiction, since one cannot have two distinct components ${U_1, U_2}$ both contain a ray from ${x_*}$ in the direction ${\omega}$ .

Proof: As ${f}$ is differentiable at ${x_*}$ , we have

$\displaystyle f(x_* + t\omega) = f(x_*) + Df(x_*) t \omega + o(|t|)$

for all sufficiently small ${t}$ ; we rearrange this using (5) as

$\displaystyle f(x_* + t \omega) - y = (1-t) (f(x_*) - y) + o(|t|).$

In particular, ${f(x_*+t\omega) \in B(y,r_*)}$ for all sufficiently small positive ${t}$ . This shows that all sufficiently small open rays ${\{ x_* + t \omega: 0 < t < \epsilon \}}$ lie in ${K_{r_*}}$ , hence in ${K_{r_*}^1}$ (since ${x_* \in K_{r_*}^1}$ ), and hence in ${U}$ . In fact, the same argument shows that there is a cone

$\displaystyle \{ x_* + t \omega': 0 < t < \epsilon; \|\omega' - \omega\| \leq \epsilon \} \ \ \ \ \ (6)$

that will lie in ${U}$ if ${\epsilon}$ is small enough. As this cone is connected, it thus suffices to show that ${U_1}$ intersects this cone.
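The estimate behind the cone claim can be sketched as follows (assuming the Euclidean norm and writing ${\|Df(x_*)\|}$ for the operator norm, notation introduced here). For ${\|\omega'-\omega\| \leq \epsilon}$ , (5) gives ${Df(x_*) \omega' = (y - f(x_*)) + Df(x_*)(\omega'-\omega)}$ , so that

$\displaystyle f(x_* + t \omega') - y = (1-t) (f(x_*) - y) + t Df(x_*)(\omega'-\omega) + o(t)$

uniformly in ${\omega'}$ , and hence

$\displaystyle \|f(x_* + t \omega') - y\| \leq (1-t) r_* + t \epsilon \|Df(x_*)\| + o(t).$

The right-hand side is less than ${r_*}$ for all sufficiently small positive ${t}$ as soon as ${\epsilon \|Df(x_*)\| < r_*}$ .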

Let ${\delta > 0}$ be a small radius to be chosen later. As ${Df(x_*)}$ is non-singular, we see if ${\delta}$ is small enough that ${f(x) \neq f(x_*)}$ whenever ${\|x-x_*\| = \delta}$ . By continuity, we may thus find ${\kappa > 0}$ such that ${\|f(x)-f(x_*)\| > \kappa}$ whenever ${\|x-x_*\| = \delta}$ .

Consider the set

$\displaystyle U' := \{ x \in U_1: \|x-x_*\| \leq \delta; \|f(x)-f(x_*)\| < \kappa \}.$

As ${x_*}$ is adherent to ${U_1}$ , ${U'}$ is non-empty. By construction of ${\kappa}$ , we see that we also have

$\displaystyle U' = \{ x \in U_1: \|x-x_*\| < \delta; \|f(x)-f(x_*)\| < \kappa \}$

and so ${U'}$ is open. By Corollary 4, ${f(U')}$ is then also non-empty and open. By construction, ${f(U')}$ also lies in the set

$\displaystyle D := \{ z \in B(y,r_*): \|z-f(x_*)\| < \kappa \}.$

We claim that ${f(U')}$ is in fact all of ${D}$ . The proof will be a variant of the proof of Lemma 9. Suppose this were not the case. As ${D}$ is connected, this implies that there is an element ${z}$ of ${D}$ which is adherent to ${f(U')}$ , but does not lie in ${f(U')}$ . Thus one may find a sequence ${x_n}$ in ${U'}$ with ${f(x_n)}$ converging to ${z}$ . By compactness of ${K_{r_*}^1}$ (which contains ${U'}$ ), we may pass to a subsequence and assume that ${x_n}$ converges to a limit ${x}$ in ${K_{r_*}^1}$ ; then ${f(x)=z}$ . By continuity, there is thus a ball ${B}$ centred at ${x}$ contained in ${B(x_*,\delta)}$ that is mapped to ${B(y,r) \cap D}$ for some ${r < r_*}$ ; this implies that ${B}$ lies in ${K_{r_*}}$ and hence in ${K_{r_*}^1}$ (since ${x \in K_{r_*}^1}$ ) and thence in ${U}$ (since ${r}$ is strictly less than ${r_*}$ ). As ${x}$ is adherent to ${U_1}$ and ${B}$ is connected, we conclude that ${B}$ lies in ${U_1}$ and thence in ${U'}$ . In particular ${x}$ lies in ${U'}$ and so ${z=f(x)}$ lies in ${f(U')}$ , a contradiction.

As ${f(U') = D}$ , we may thus find a sequence ${t_n > 0}$ converging to zero, and a sequence ${x_n \in U'}$ , such that

$\displaystyle f(x_n) = f(x_*) + t_n (y - f(x_*)).$

However, if ${\delta}$ is small enough, we have ${\|f(x_n)-f(x_*)\|}$ comparable to ${\|x_n-x_*\|}$ (cf. (2)), and so ${x_n}$ converges to ${x_*}$ . By Taylor expansion, we then have

$\displaystyle f(x_n) = f(x_*) + Df(x_*) (x_n-x_*) + o(\|x_n-x_*\|)$

and thus

$\displaystyle (Df(x_*)+o(1)) (x_n-x_*) = t_n Df(x_*) \omega$

for some matrix-valued error ${o(1)}$ . Since ${Df(x_*)}$ is invertible, this implies that

$\displaystyle x_n-x_* = t_n (1+o(1)) \omega = t_n \omega + o(t_n).$

In particular, ${x_n}$ lies in the cone (6) for ${n}$ large enough, and the claim follows. $\Box$
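One way to fill in the passage from the Taylor expansion to the last display is sketched here; the symbols ${v_n}$ , ${e_n}$ and ${E_n}$ are notation introduced for this sketch. Write ${v_n := x_n - x_*}$ , which is non-zero since ${f(x_n) \neq f(x_*)}$ , and let ${e_n = o(\|v_n\|)}$ be the error term in the Taylor expansion. The matrix

$\displaystyle E_n := \frac{e_n v_n^T}{\|v_n\|^2}$

satisfies ${E_n v_n = e_n}$ and has operator norm ${\|e_n\|/\|v_n\| = o(1)}$ , so that ${(Df(x_*) + E_n) v_n = t_n Df(x_*) \omega}$ . For ${n}$ large enough, ${Df(x_*)+E_n}$ is invertible, with

$\displaystyle (Df(x_*) + E_n)^{-1} Df(x_*) = (1 + Df(x_*)^{-1} E_n)^{-1} = 1 + o(1),$

and therefore ${v_n = t_n (1+o(1)) \omega = t_n \omega + o(t_n)}$ .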

39 comments

12 September, 2011 at 4:31 pm

Leandro Cioletti

Thanks for sharing this. I read the mathoverflow question and I liked very much this result, but we don’t have in our local library a copy of the papers cited there.

12 September, 2011 at 4:55 pm

Philippe

Small correction: the person that pointed this proof to you is Julien Melleray (not Malleray).

[Corrected, thanks – T.]

12 September, 2011 at 5:54 pm

vineel567

What tools do you use to write such wonderful technical articles, with all the special characters and mathematical style preserved? Thanks in advance.

15 September, 2011 at 3:17 am

Willie Wong

Take a look at this page: https://terrytao.wordpress.com/about/ especially after the “Some technical remarks” fold.

14 September, 2011 at 7:45 pm

Anonymous

Sorry I mean everywhere invertibility. Is it necessary? If so why is the local invertibility not sufficient?

15 September, 2011 at 3:20 am

Willie Wong

The statement does only require local invertibility, no? It only requires invertibility at every point in the open neighborhood $\Omega$ . Or are you asking whether invertibility at a single point is sufficient?

15 September, 2011 at 5:21 am

Anonymous

This result seems counter-intuitive. In the 1D case, if f'(x)<0 for x0 for x>x_0 then it doesn’t seem possible for the function f to be invertible at x_0. Also, the derivative is invertible in a neighborhood of x_0 so it seems to satisfy the conditions of the theorem.

[You will need to use &lt; and &gt; instead of < and > to avoid your comments being mangled by the HTML parser. In any event, in one dimension the result is precisely the contrapositive of Rolle’s theorem. – T.]

15 September, 2011 at 5:30 am

Anonymous

It should be f'(x) < 0 for x < x0 and f'(x) > 0 for x > x0.

17 September, 2011 at 7:30 pm

Josh Swanson

Thank you for typing this up! Reading through it was a nice review of basic real analysis and topology.

The Brouwer fixed point theorem is false with open balls, so the map you mention in the proof of Proposition 3 should be between closed balls. Even so, the fixed point isn’t on the boundary (from (3) scaled by the magnitude of x) so the result still follows easily.

In the same proposition, I wasn’t able to follow the reasoning in your third argument which gives |x| < r (i.e. |x| is not r) without taking |y| < r/3 rather than |y| < r/2. This doesn’t seem to affect anything adversely, though. I really like that argument since it makes the overall proof much more elementary.

I wasn’t able to follow the reasoning in the last few lines–in particular replacing o(||x_n – x_*||) with o(t_n). I was able to show the result anyway, though I had to pass to a further subsequence and use the invertibility of the derivative to do so.

Minor typos/issues:
“point set” -> “point-set”
“A third argument (…) as follows” -> “A third argument (…) is as follows”
“image of any open neighborhood” -> “the image of any open neighborhood”
In the proof of Lemma 6, strictly speaking the B(y, r) should be closed in each case
In Lemma 7, dividing delta by 2 doesn’t seem to serve a purpose
“then f(U_1) is non-empty contained in” -> “then f(U_1) is non-empty and contained in”
“conpactness” -> “compactness”
“This together with Corollary 11 with” -> “This together with Corollary 11”
In defining U’ (and in the restatement right after), there’s a missing vertical bar

[Corrected, thanks – T.]

19 September, 2011 at 12:47 pm

Anonymous

What about the following proof? Is it related to Cernavskii’s or Vaisala’s?

As above, let’s prove that $f$ is a local homeomorphism at a point $x_0\in\Omega$ . We may assume $f(x_0)=x_0=0$ and $|f(x)-x|<\frac{1}{2}|x|$ for $x\in B(0,1)$ .

Due to Proposition 3 above, for each $y\in \overline{B(0,\frac{1}{3})}$ , there is a point $x\in \overline{B(0,\frac{2}{3})}$ such that $f(x)=y$ . Moreover, $f^{-1}(y)$ is compact. Take, in $f^{-1}(y)$ , the point with maximum first coordinate. If there is more than one such point, take the one with maximum second coordinate, and so forth. Let the chosen point equal $g(y)$ .

The function $g$ is continuous, since

i) Because $f$ is open, for every point near $y$ there is a pre-image near $f(y)$ .
ii) If there is a sequence $y_n\to y$ such that, say, $g(y_n)$ has first coordinate bigger than the first coordinate of $g(y)$ with a non-vanishing difference, we can take a subsequence $y_{n_k}$ which converges to a point $y'$ whose first coordinate is bigger than that of $y$ , a contradiction.

$g$ is obviously injective, and it is defined in a compact set. Thus, $g$ is a homeomorphism.

This function is also a left inverse of $f$ (that is, $g\circ f=id$ in the image of $g$ ). The claim then follows from the fact that injective continuous functions from $n$ -dimensional sets to $n$ -dimensional sets are open.

19 September, 2011 at 1:38 pm

Terence Tao

I don’t think the proof of the continuity of g is complete, even in the one-dimensional case. For instance, what prevents $g(y_n)$ from being significantly below $g(y)$ in one dimension?

Note that there are plenty of continuous proper maps without continuous left-inverses, e.g. $f(x) := x^3 - x$ in one dimension. (Note in this case that the function g as defined above is discontinuous.)
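A minimal Python sketch of this discontinuity, for illustration only: it takes $g(y)$ to be the largest real solution of $x^3 - x = y$ on the whole real line (ignoring the restriction to balls in the comment above), and the helper names `bisect`, `X_CRIT` and the bound `big` are hypothetical.

```python
import math

def f(x):
    return x**3 - x

X_CRIT = 1.0 / math.sqrt(3.0)   # f'(x) = 3x^2 - 1 vanishes at +-X_CRIT

def bisect(y, lo, hi, iters=200):
    """Solve f(x) = y on [lo, hi], assuming f is increasing there."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def g(y, big=10.0):
    """Largest real solution x of f(x) = y, assuming |y| < f(big)."""
    if y >= f(X_CRIT):
        # A root exists on the increasing branch [X_CRIT, big]; it is the largest one.
        return bisect(y, X_CRIT, big)
    # Otherwise the only real root lies on the increasing branch [-big, -X_CRIT].
    return bisect(y, -big, -X_CRIT)

y_jump = f(X_CRIT)              # equals -2 / (3 * sqrt(3))
print(y_jump, g(-0.38), g(-0.39))
```

The two printed values of $g$ lie on different branches of the cubic, even though $-0.38$ and $-0.39$ are close: the largest preimage jumps as $y$ crosses the critical value $-2/(3\sqrt{3})$ .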

19 September, 2011 at 4:09 pm

Anonymous

Errata: above, the line “which converges to a point $y'$ whose first coordinate is bigger than that of $y$ , a contradiction” is wrong. Instead of it, define $x=g(y)$ and read “such that $g(y_{n_k})$ converges to a point $x'$ whose first coordinate is bigger than that of $x$ , a contradiction”.

Dear Tao,

indeed that “proof” is wrong. I still think the case $n=1$ is ok:

Take any neighborhood $N$ of $x$ . If $m$ is sufficiently large, $y_m\in f(N)$ , so we have a candidate $z_m$ for $g(y_m)$ in $N$ . We may suppose $z_m\to x$ (e.g., by choosing $z_m$ as close to $x$ as possible).

In one dimension, by definition, $x_m$ is the biggest element of $f^{-1}(y_m)$ , so that its lim inf is at least $\lim_{m\to\infty}z_m=x$ . Then, as above, if $y_m$ does not go to $x$ , we achieve a contradiction.

However, in more dimensions, I have overlooked something: the coordinates of the $y_m$ ‘s don’t behave in the same way.

For the sake of clarity, given any $a,b\in\mathbb R^n$ , let’s say $a> b$ if and only if $a=b$ or one of the following conditions holds.

$a_1> b_1$ .
$a_1=b_1$ and $a_2> b_2$ .
$a_1=b_1$ , $a_2=b_2$ and $a_3> b_3$ .
Etc.

The failure of the proof is related to the instability of the relation above: if $p$ and $q$ are sequences such that $p_m > q_m$ , $\lim_{m\to\infty} p_m=p$ and $\lim_{m\to\infty} q_m=q$ , it is not necessarily true that $p\geq q$ .

I will try to fix the argument, even though I think success is not probable.

Question: if $g$ were continuous, doesn’t the conclusion follow? For it is known that injective continuous maps (in this case, $g$ ) are local homeomorphisms. So $g$ would be a homeomorphism between two neighborhoods of $0$ , and $f$ should be its inverse.

20 September, 2011 at 11:41 am

Twelth Linkfest

[…] Tao: The inverse function theorem for everywhere differentiable maps, The Brunn-Minkowski inequality for nilpotent […]

21 September, 2011 at 9:50 am

Anonymous

Dear Prof. Tao,

in the proof of lemma 7 you ask the reader to “note that $K_{r} \setminus K_{r}^{1}$ is closed”. Can you give me a hint on how to see this? Components are closed in general (which gives the compactness of $K_{r}^{1}$ ), but I do not see, why $K_{r}^{1}$ is also open (relative to $K_{r}$ ).

Thanks in advance.

22 September, 2011 at 2:12 am

Terence Tao

Oops, the argument is not quite correct as stated; I’ve rewritten it. The key point is that in a compact set, any two points that don’t lie in the same connected component can be separated from each other by clopen sets (but the connected components themselves need not be clopen).

8 October, 2011 at 10:43 am

Implicit function theorem | Mathitself

[…] deep result is that it’s enough that is everywhere differentiable (see Tao’s post), which remains one an other deep result called invariance of […]

9 March, 2012 at 11:26 pm

francescodifusco

Reblogged this on FRANCESCO DI FUSCO.

19 March, 2012 at 10:01 am

Alan Macdonald

The paper “A Strong Inverse Function Theorem” by William J. Knight
(The American Mathematical Monthly, Vol. 95, No. 7, pp. 648-651) gives an improvement of the standard result in a different direction.

9 September, 2012 at 8:26 am

Olaf Zurth

A Question: Is there in the first line of Lemma 6 a typo?
${0,\frac{2}{10} \leq r \leq \frac{4}{10}}$ should read as ${\frac{2}{10} \leq r \leq \frac{4}{10}}$ or did I misunderstand something?

[Corrected, thanks – T.]

8 November, 2012 at 5:35 am

blindman

Dear Professor Tao,

the local inverses $f^{-1}:V\rightarrow U$ are also differentiable at $x_0$

should be replaced by

the local inverses $f^{-1}:V\rightarrow U$ are also differentiable at $f(x_0)$

[Corrected, thanks – T.]

8 November, 2012 at 6:10 am

blindman

Dear Professor Terence Tao,

In the book “Mathematical Analysis on Manifold” of Michael Spivak, the author gave a counterexample to Theorem 1 (Exercise 2.39, page 52) if the assumption on the continuity of the derivative $f^{\prime}$ is violated.
They consider the function $f:\mathbb{R}\rightarrow\mathbb{R}$ given by

+ $f(x)=\frac{x}{2}+x^2\sin\frac{1}{x}$ if $x\ne 0$ ;

+ $f(x)=0$ if $x=0$

I would like to ask for your comments about this situation.

Thank you for your help.

8 November, 2012 at 6:19 am

Terence Tao

This function has non-zero derivative at 0, but has vanishing derivative at many other points (as can be seen for instance from a plot of the function), and so does not contradict Theorem 1.

8 November, 2012 at 4:02 pm

blindman

Dear Sir. Thank you for your comments and helping.

13 December, 2012 at 3:22 pm

pera

And what would be counterexample?

28 March, 2014 at 12:11 pm

Alexander

Why not just use the Mean value theorem on a segment $[x_1,x_2]$ to obtain injectivity immediately instead of all these lemmas?

29 March, 2014 at 3:08 am

Alexander

Ok, I see: because there’s no such theorem… sorry!
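A standard way to see why the mean value theorem (as an equality) is unavailable for vector-valued maps is the planar exponential map $F(x,y) = (e^x \cos y, e^x \sin y)$. This example is not taken from the post; it is a hedged illustration, and the names `F` and `jacobian_det` are hypothetical. Its derivative is invertible everywhere (the determinant is $e^{2x}$), yet it takes the same value at both endpoints of the segment from $(0,0)$ to $(0,2\pi)$, so no identity of the form $F(q)-F(p) = DF(\xi)(q-p)$ can hold on that segment.

```python
import math

def F(x, y):
    """Planar exponential map (x, y) -> (e^x cos y, e^x sin y)."""
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def jacobian_det(x, y):
    """Determinant of DF(x, y), computed from the four partial derivatives."""
    a = math.exp(x) * math.cos(y)    # d/dx of first component
    b = -math.exp(x) * math.sin(y)   # d/dy of first component
    c = math.exp(x) * math.sin(y)    # d/dx of second component
    d = math.exp(x) * math.cos(y)    # d/dy of second component
    return a * d - b * c

p = (0.0, 0.0)
q = (0.0, 2.0 * math.pi)

# Sample the determinant along the segment from p to q.
dets = [jacobian_det(0.0, 2.0 * math.pi * t / 100) for t in range(101)]
endpoint_gap = math.dist(F(*p), F(*q))

print("smallest sampled det DF on the segment:", min(dets))
print("|F(q) - F(p)|:", endpoint_gap)
```

The determinant stays near 1 along the whole segment while the endpoint gap is zero up to rounding, which is consistent with the need for a genuinely local argument.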

14 December, 2015 at 11:02 am

The inverse function theorem | Negro's notes

[…] differentiable conditions on a mapping which ensure that it is a local diffeomorphism. (But see Terry Tao’s blog for a differentiable, non-smooth inverse function theorem). The main point of such theorems is the […]

24 November, 2016 at 7:11 am

Mladen Đalto

Is there a pseudoinverse version of this theorem for $f: \mathbb{R}^n \rightarrow \mathbb{R}^m$ where $n$ is not equal to $m$?
If not, what method of proof or line of research would you recommend?
I need it for machine learning research, so any pointers would be appreciated.
Thank you.

15 April, 2017 at 2:28 pm

Anonymous

In Spivak’s Calculus on manifolds, there is an exercise that the function $f:\mathbb{R}\to\mathbb{R}$ defined by

$\displaystyle f(x)=\frac{x}{2}+x^2\sin\frac{1}{x},\quad x\neq 0$

and $f(0)=0$ shows that continuity of the derivative cannot be eliminated from the hypothesis of the inverse function theorem. Why does this not contradict Theorem 2?

16 April, 2017 at 6:45 pm

Terence Tao

This map $f$ has an infinite number of critical points near the origin, so the hypotheses of Theorem 2 are not satisfied. (Perhaps the formulation of the inverse function theorem in Spivak is slightly different, in that one only demands invertibility of the derivative at a single point, rather than on an entire domain.)

17 April, 2017 at 10:25 am

Anonymous

I’m very curious about the motivation for considering the map ${x \mapsto x-f(x)+y}$ , which is the key step in the proof of Theorem 1. Other than just memorizing it, I have never seen in any textbook an explanation of a possible motivation (algebra? geometry? or something else) for this map.

17 April, 2017 at 12:10 pm

Terence Tao

If $f(0)=0$ and $Df(0)$ is the identity, then Taylor expansion suggests the approximation $f(x) \approx x$ for small $x$ . If this approximation were exact – $f(x) = x$ – then one could easily solve the equation $f(x)=y$ by writing it as $x=y$ . But the approximation is inexact; the best one can do with regards to transforming $f(x)=y$ to an equation that looks like $x=y$ is to add $x-f(x)$ to both sides to obtain $x = x - f(x) + y$ . The error here $x-f(x)$ is not zero, but it is “small”, and in particular (for small $x$ and continuously differentiable $f$ ) the map $x \mapsto x - f(x)+y$ still behaves somewhat like a constant, in that it is a contraction.
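The iteration described above can be tried on a concrete map. This is a minimal sketch under stated assumptions: the map $f(x) = x + x^2$ is a hypothetical one-dimensional example chosen only because $f(0)=0$ and $f'(0)=1$, and `solve_by_contraction` is an illustrative name. For this $f$ the map $x \mapsto x - f(x) + y = y - x^2$ has derivative $-2x$, so it is a contraction on $|x| \leq 1/4$ and maps that interval to itself when $|y| \leq 1/8$.

```python
def f(x):
    """Hypothetical example map with f(0) = 0 and f'(0) = 1."""
    return x + x * x

def solve_by_contraction(y, steps=60):
    """Solve f(x) = y for small y by iterating x -> x - f(x) + y from x = 0."""
    x = 0.0
    for _ in range(steps):
        x = x - f(x) + y
    return x

y = 0.1
x = solve_by_contraction(y)
residual = abs(f(x) - y)

# Closed-form root of x + x^2 = y near 0, used only as a cross-check.
exact = (-1.0 + (1.0 + 4.0 * y) ** 0.5) / 2.0
print("iterate:", x, " exact:", exact, " residual:", residual)
```

The fixed point of the iteration is exactly a solution of $f(x) = y$, which is the content of rewriting the equation as $x = x - f(x) + y$.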

26 March, 2018 at 9:41 pm

Joe Higgins

Can you give a hint about the statement ‘one can recover the differentiability of the local inverses’ please? Much appreciated.

27 March, 2018 at 5:42 pm

Terence Tao

Let’s say $y_0 = f(x_0)$ . If $y_n$ approaches $y_0$ , then by the homeomorphism property we have local inverses $x_n = f^{-1}(y_n)$ that approach $x_0$ . On the other hand, we have the Newton approximation $f(x_n) = f(x_0) + Df(x_0)(x_n-x_0) + o(|x_n-x_0|)$ . One can put these facts together to obtain the usual formula for the derivative of $f^{-1}$ at $y_0$ .

19 December, 2023 at 9:17 am

Anonymous

I’m a little confused about this. Following the hint, I obtain $f^{-1}(y) = f^{-1}(y_0) + (Df(x_0))^{-1}(y-y_0) + o(|f^{-1}(y)-f^{-1}(y_0)|)$, so I have to prove that $o(|f^{-1}(y)-f^{-1}(y_0)|)$ is also $o(|y-y_0|)$, but I can’t find a way without assuming continuity of $Df$ at the point $x_0$. Thank you in advance.
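One possible way to close this gap, sketched here as a reader's derivation rather than as part of the original replies, uses only differentiability at $x_0$, invertibility of $Df(x_0)$, and continuity of the local inverse. Write $x = f^{-1}(y)$ and $x_0 = f^{-1}(y_0)$, and let $c$ denote the (assumed notation) lower bound constant of $Df(x_0)$:

```latex
\begin{align*}
y - y_0 &= Df(x_0)(x - x_0) + o(|x - x_0|), \\
|Df(x_0)(x - x_0)| &\geq c\,|x - x_0|, \qquad c := \|Df(x_0)^{-1}\|^{-1} > 0, \\
|y - y_0| &\geq \tfrac{c}{2}\,|x - x_0| \quad \text{for $y$ sufficiently close to $y_0$}, \\
x - x_0 &= Df(x_0)^{-1}(y - y_0) + o(|x - x_0|)
         = Df(x_0)^{-1}(y - y_0) + o(|y - y_0|).
\end{align*}
```

The third line uses that $x \rightarrow x_0$ as $y \rightarrow y_0$ (the homeomorphism property), so that the error term in the first line is eventually at most $\frac{c}{2}|x-x_0|$. It gives $|x - x_0| \leq \frac{2}{c}|y-y_0|$, which is what converts $o(|x-x_0|)$ into $o(|y-y_0|)$ in the last line; continuity of $Df$ is not needed.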

11 July, 2020 at 7:24 pm

Jaikrishnan Janardhanan

There seems to be a new and elementary proof of the inverse function theorem for just everywhere differentiable maps:
https://www.jstor.org/stable/10.14321/realanalexch.43.2.0429

11 January, 2023 at 6:42 pm

Eduardo Ramos

Hello Terence Tao. Are you still interested in this question? I think I solved the infinite-dimensional case, where I proved that the same is true in Banach spaces as long as F is “locally proper” in some sense, which includes for instance functions of the form I+K, K being compact. Additionally, I think the proof is actually simpler, and generates interesting results in more general spaces. Could you read my proof?

11 January, 2023 at 7:48 pm

Eduardo Ramos

Specifically, I proved that for Banach spaces $X$ and $Y$ , if $F: U\subset X\to Y$ is Fréchet differentiable and $DF(x):X\to Y$ invertible for all $x\in U$ , then $F$ will be a local homeomorphism if and only if $F$ is ‘locally proper’.

Here $F$ being locally proper means that for each $x_0\in U$ there exists a closed neighborhood $V\subset U$ of $x_0$ such that if $(x_n)$ is a sequence in $V$ with $F(x_n)\to c\in Y$ , then $(x_n)$ has a convergent subsequence.

21 January, 2023 at 3:40 pm

Eduardo Ramos

Well, after checking each detail of my proof I found an error. Actually with the above conditions I can only prove that $F$ is an open map. Under the above hypothesis I also found a similar argument for the infinite dimensional case that can follow all steps of your arguments up to Proposition 12, where it sadly fails. Soon I will publish the open mapping result, and will keep working on the local injectivity problem.


39 comments
