An extremely large portion of mathematics is concerned with locating solutions to equations such as
for in some suitable domain space (either finite-dimensional or infinite-dimensional), and various maps or . To solve the fixed point iteration equation (1), the simplest general method available is the fixed point iteration method: one starts with an initial approximate solution to (1), so that , and then recursively constructs the sequence by . If behaves enough like a “contraction”, and the domain is complete, then one can expect the to converge to a limit , which should then be a solution to (1). For instance, if is a map from a metric space to itself, which is a contraction in the sense that
for all and some , then with as above we have
for any , and so the distances between successive elements of the sequence decay at at least a geometric rate. This leads to the contraction mapping theorem, which has many important consequences, such as the inverse function theorem and the Picard existence theorem.
A slightly more complicated instance of this strategy arises when trying to linearise a complex map defined in a neighbourhood of a fixed point. For simplicity we normalise the fixed point to be the origin, thus and . When studying the complex dynamics , , of such a map, it can be useful to try to conjugate to another function , where is a holomorphic function defined and invertible near with , since the dynamics of will be conjguate to that of . Note that if and , then from the chain rule any conjugate of will also have and . Thus, the “simplest” function one can hope to conjugate to is the linear function . Let us say that is linearisable (around ) if it is conjugate to in some neighbourhood of . Equivalently, is linearisable if there is a solution to the Schröder equation
for some defined and invertible in a neighbourhood of with , and all sufficiently close to . (The Schröder equation is normalised somewhat differently in the literature, but this form is equivalent to the usual form, at least when is non-zero.) Note that if solves the above equation, then so does for any non-zero , so we may normalise in addition to , which also ensures local invertibility from the inverse function theorem. (Note from winding number considerations that cannot be invertible near zero if vanishes.)
We have the following basic result of Koenigs:
Theorem 1 (Koenig’s linearisation theorem) Let be a holomorphic function defined near with and . If (attracting case) or (repelling case), then is linearisable near zero.
Let be a sufficiently small radius, and let denote the space of holomorphic functions on the complex disk with and . We can view the Schröder equation (2) as a fixed point equation
where is the partially defined function on that maps a function to the function defined by
assuming that is well-defined on the range of (this is why is only partially defined).
We can solve this equation by the fixed point iteration method, if is small enough. Namely, we start with being the identity map, and set , etc. We equip with the uniform metric . Observe that if , and is small enough, then takes values in , and are well-defined and lie in . Also, since is smooth and has derivative at , we have
if , and is sufficiently small depending on . This is not yet enough to establish the required contraction (thanks to Mario Bonk for pointing this out); but observe that the function is holomorphic on and bounded by on the boundary of this ball (or slightly within this boundary), so by the maximum principle we see that
on all of , and in particular
on . Putting all this together, we see that
since , we thus obtain a contraction on the ball if is small enough (and sufficiently small depending on ). From this (and the completeness of , which follows from Morera’s theorem) we see that the iteration converges (exponentially fast) to a limit which is a fixed point of , and thus solves Schröder’s equation, as required.
Koenig’s linearisation theorem leaves open the indifferent case when . In the rationally indifferent case when for some natural number , there is an obvious obstruction to linearisability, namely that (in particular, linearisation is not possible in this case when is a non-trivial rational function). An obstruction is also present in some irrationally indifferent cases (where but for any natural number ), if is sufficiently close to various roots of unity; the first result of this form is due to Cremer, and the optimal result of this type for quadratic maps was established by Yoccoz. In the other direction, we have the following result of Siegel:
Theorem 2 (Siegel’s linearisation theorem) Let be a holomorphic function defined near with and . If and one has the Diophantine condition for all natural numbers and some constant , then is linearisable at .
The Diophantine condition can be relaxed to a more general condition involving the rational exponents of the phase of ; this was worked out by Brjuno, with the condition matching the one later obtained by Yoccoz. Amusingly, while the set of Diophantine numbers (and hence the set of linearisable ) has full measure on the unit circle, the set of non-linearisable is generic (the complement of countably many nowhere dense sets) due to the above-mentioned work of Cremer, leading to a striking disparity between the measure-theoretic and category notions of “largeness”.
Siegel’s theorem does not seem to be provable using a fixed point iteration method. However, it can be established by modifying another basic method to solve equations, namely Newton’s method. Let us first review how this method works to solve the equation for some smooth function defined on an interval . We suppose we have some initial approximant to this equation, with small but not necessarily zero. To make the analysis more quantitative, let us suppose that the interval lies in for some , and we have the estimates
for some and and all (the factors of are present to make “dimensionless”).
In particular, setting , , and , we have , and
for all .
The crucial point here is that the new error is roughly the square of the previous error . This leads to extremely fast (double-exponential) improvement in the error upon iteration, which is more than enough to absorb the exponential losses coming from the factor.
Proof: If for some absolute constants then we may simply take , so we may assume that for some small and large . Using the Newton approximation we are led to the choice
for . From the hypotheses on and the smallness hypothesis on we certainly have . From Taylor’s theorem with remainder we have
and the claim follows.
We can iterate this procedure; starting with as above, we obtain a sequence of nested intervals with , and with evolving by the recursive equations and estimates
If is sufficiently small depending on , we see that converges rapidly to zero (indeed, we can inductively obtain a bound of the form for some large absolute constant if is small enough), and converges to a limit which then solves the equation by the continuity of .
As I recently learned from Zhiqiang Li, a similar scheme works to prove Siegel’s theorem, as can be found for instance in this text of Carleson and Gamelin. The key is the following analogue of Lemma 3.
for all , and such that
for all and some .
Proof: By scaling we may normalise . If for some constants , then we can simply take to be the identity and , so we may assume that for some small and large .
To motivate the choice of , we write and , with and viewed as small. We would like to have , which expands as
This equation can be solved by Taylor series; the function vanishes to second order at the origin and thus has a Taylor expansion
and then has a Taylor expansion
We take this as our definition of , define , and then define implicitly via (4).
Let us now justify that this choice works. By (3) and the generalised Cauchy integral formula, we have for all ; by the Diophantine assumption on , we thus have . In particular, converges on , and on the disk (say) we have the bounds
In particular, as is so small, we see that maps injectively to and to , and the inverse maps to . From (3) we see that maps to , and so if we set to be the function , then is a holomorphic function obeying (4). Expanding (4) in terms of and as before, and also writing , we have
for , which by (5) simplifies to
From (6), the fundamental theorem of calculus, and the smallness of we have
on , and the claim follows.
If we set , , and to be sufficiently small, then (since vanishes to second order at the origin), the hypotheses of this lemma will be obeyed for some sufficiently small . Iterating the lemma (and halving repeatedly), we can then find sequences , injective holomorphic functions and holomorphic functions such that one has the recursive identities and estimates
for all and . By construction, decreases to a positive radius that is a constant multiple of , while (for small enough) converges double-exponentially to zero, so in particular converges uniformly to on . Also, is close enough to the identity, the compositions are uniformly convergent on with and . From this we have
on , and on taking limits using Morera’s theorem we obtain a holomorphic function defined near with , , and
obtaining the required linearisation.
Remark 5 The idea of using a Newton-type method to obtain error terms that decay double-exponentially, and can therefore absorb exponential losses in the iteration, also occurs in KAM theory and in Nash-Moser iteration, presumably due to Siegel’s influence on Moser. (I discuss Nash-Moser iteration in this note that I wrote back in 2006.)