You are currently browsing Terence Tao’s articles.

I’ve just uploaded to the arXiv my paper “Some remarks on the lonely runner conjecture“, submitted to Contributions to discrete mathematics. I had blogged about the lonely runner conjecture in this previous blog post, and I returned to the problem recently to see if I could obtain anything further. The results obtained were more modest than I had hoped, but they did at least seem to indicate a potential strategy to make further progress on the problem, and also highlight some of the difficulties of the problem.

One can rephrase the lonely runner conjecture as the following covering problem. Given any integer “velocity” {v} and radius {0 < \delta < 1/2}, define the Bohr set {B(v,\delta)} to be the subset of the unit circle {{\bf R}/{\bf Z}} given by the formula

\displaystyle B(v,\delta) := \{ t \in {\bf R}/{\bf Z}: \|vt\| \leq \delta \},

where {\|x\|} denotes the distance of {x} to the nearest integer. Thus, for {v} positive, {B(v,\delta)} is simply the union of the {v} intervals {[\frac{a-\delta}{v}, \frac{a+\delta}{v}]} for {a=0,\dots,v-1}, projected onto the unit circle {{\bf R}/{\bf Z}}; in the language of the usual formulation of the lonely runner conjecture, {B(v,\delta)} represents those times in which a runner moving at speed {v} returns to within {\delta} of his or her starting position. For any non-zero integers {v_1,\dots,v_n}, let {\delta(v_1,\dots,v_n)} be the smallest radius {\delta} such that the {n} Bohr sets {B(v_1,\delta),\dots,B(v_n,\delta)} cover the unit circle:

\displaystyle {\bf R}/{\bf Z} = \bigcup_{i=1}^n B(v_i,\delta). \ \ \ \ \ (1)


Then define {\delta_n} to be the smallest value of {\delta(v_1,\dots,v_n)}, as {v_1,\dots,v_n} ranges over tuples of distinct non-zero integers. The Dirichlet approximation theorem quickly gives that

\displaystyle \delta(1,\dots,n) = \frac{1}{n+1}

and hence

\displaystyle \delta_n \leq \frac{1}{n+1}

for any {n \geq 1}. The lonely runner conjecture is equivalent to the assertion that this bound is in fact optimal:

Conjecture 1 (Lonely runner conjecture) For any {n \geq 1}, one has {\delta_n = \frac{1}{n+1}}.

This conjecture is currently known for {n \leq 6} (see this paper of Barajas and Serra), but remains open for higher {n}.

It is natural to try to attack the problem by establishing lower bounds on the quantity {\delta_n}. We have the following “trivial” bound, that gets within a factor of two of the conjecture:

Proposition 2 (Trivial bound) For any {n \geq 1}, one has {\delta_n \geq \frac{1}{2n}}.

Proof: It is not difficult to see that for any non-zero velocity {v} and any {0 < \delta < 1/2}, the Bohr set {B(v,\delta)} has Lebesgue measure {m(B(v,\delta)) = 2\delta}. In particular, by the union bound

\displaystyle m(\bigcup_{i=1}^n B(v_i,\delta)) \leq \sum_{i=1}^n m(B(v_i,\delta)) \ \ \ \ \ (2)


we see that the covering (1) is only possible if {1 \leq 2 n \delta}, giving the claim. \Box

So, in some sense, all the difficulty is coming from the need to improve upon the trivial union bound (2) by a factor of two.

Despite the crudeness of the union bound (2), it has proven surprisingly hard to make substantial improvements on the trivial bound {\delta_n \geq \frac{1}{2n}}. In 1994, Chen obtained the slight improvement

\displaystyle \delta_n \geq \frac{1}{2n - 1 + \frac{1}{2n-3}}

which was improved a little by Chen and Cusick in 1999 to

\displaystyle \delta_n \geq \frac{1}{2n-3}

when {2n-3} was prime. In a recent paper of Perarnau and Serra, the bound

\displaystyle \delta_n \geq \frac{1}{2n-2+o(1)}

was obtained for arbitrary {n}. These bounds only improve upon the trivial bound by a multiplicative factor of {1+O(1/n)}. Heuristically, one reason for this is as follows. The union bound (2) would of course be sharp if the Bohr sets {B(v_i,\delta)} were all disjoint. Strictly speaking, such disjointness is not possible, because all the Bohr sets {B(v_i,\delta)} have to contain the origin as an interior point. However, it is possible to come up with a large number of Bohr sets {B(v_i,\delta)} which are almost disjoint. For instance, suppose that we had velocities {v_1,\dots,v_s} that were all prime numbers between {n/4} and {n/2}, and that {\delta} was equal to {\delta_n} (and in particular was between {1/2n} and {1/(n+1)}. Then each set {B(v_i,\delta)} can be split into a “kernel” interval {[-\frac{\delta}{v_i}, \frac{\delta}{v_i}]}, together with the “petal” intervals {\bigcup_{a=1}^{v_i-1} [\frac{a-\delta}{v_i}, \frac{a+\delta}{v_i}]}. Roughly speaking, as the prime {v_i} varies, the kernel interval stays more or less fixed, but the petal intervals range over disjoint sets, and from this it is not difficult to show that

\displaystyle m(\bigcup_{i=1}^s B(v_i,\delta)) = (1-O(\frac{1}{n})) \sum_{i=1}^s m(B(v_i,\delta)),

so that the union bound is within a multiplicative factor of {1+O(\frac{1}{n})} of the truth in this case.

This does not imply that {\delta_n} is within a multiplicative factor of {1+O(1/n)} of {\frac{1}{2n}}, though, because there are not enough primes between {n/4} and {n/2} to assign to {n} distinct velocities; indeed, by the prime number theorem, there are only about {\frac{n}{4\log n}} such velocities that could be assigned to a prime. So, while the union bound could be close to tight for up to {\asymp n/\log n} Bohr sets, the above counterexamples don’t exclude improvements to the union bound for larger collections of Bohr sets. Following this train of thought, I was able to obtain a logarithmic improvement to previous lower bounds:

Theorem 3 For sufficiently large {n}, one has {\delta_n \geq \frac{1}{2n} + \frac{c \log n}{n^2 (\log\log n)^2}} for some absolute constant {c>0}.

The factors of {\log\log n} in the denominator are for technical reasons and might perhaps be removable by a more careful argument. However it seems difficult to adapt the methods to improve the {\log n} in the numerator, basically because of the obstruction provided by the near-counterexample discussed above.

Roughly speaking, the idea of the proof of this theorem is as follows. If we have the covering (1) for {\delta} very close to {1/2n}, then the multiplicity function {\sum_{i=1}^n 1_{B(v_i,\delta)}} will then be mostly equal to {1}, but occasionally be larger than {1}. On the other hand, one can compute that the {L^2} norm of this multiplicity function is significantly larger than {1} (in fact it is at least {(3/2-o(1))^{1/2}}). Because of this, the {L^3} norm must be very large, which means that the triple intersections {B(v_i,\delta) \cap B(v_j,\delta) \cap B(v_k,\delta)} must be quite large for many triples {(i,j,k)}. Using some basic Fourier analysis and additive combinatorics, one can deduce from this that the velocities {v_1,\dots,v_n} must have a large structured component, in the sense that there exists an arithmetic progression of length {\asymp n} that contains {\asymp n} of these velocities. For simplicity let us take the arithmetic progression to be {\{1,\dots,n\}}, thus {\asymp n} of the velocities {v_1,\dots,v_n} lie in {\{1,\dots,n\}}. In particular, from the prime number theorem, most of these velocities will not be prime, and will in fact likely have a “medium-sized” prime factor (in the precise form of the argument, “medium-sized” is defined to be “between {\log^{10} n} and {n^{1/10}}“). Using these medium-sized prime factors, one can show that many of the {B(v_i,\delta)} will have quite a large overlap with many of the other {B(v_j,\delta)}, and this can be used after some elementary arguments to obtain a more noticeable improvement on the union bound (2) than was obtained previously.

A modification of the above argument also allows for the improved estimate

\displaystyle \delta(v_1,\dots,v_n) \geq \frac{1+c-o(1)}{2n} \ \ \ \ \ (3)


if one knows that all of the velocities {v_1,\dots,v_n} are of size {O(n)}.

In my previous blog post, I showed that in order to prove the lonely runner conjecture, it suffices to do so under the additional assumption that all of the velocities {v_1,\dots,v_n} are of size {O(n^{O(n^2)})}; I reproduce this argument (slightly cleaned up for publication) in the current preprint. There is unfortunately a huge gap between {O(n)} and {O(n^{O(n^2)})}, so the above bound (3) does not immediately give any new bounds for {\delta_n}. However, one could perhaps try to start attacking the lonely runner conjecture by increasing the range {O(n)} for which one has good results, and by decreasing the range {O(n^{O(n^2)})} that one can reduce to. For instance, in the current preprint I give an elementary argument (using a certain amount of case-checking) that shows that the lonely runner bound

\displaystyle \delta(v_1,\dots,v_n) \geq \frac{1}{n+1} \ \ \ \ \ (4)


holds if all the velocities {v_1,\dots,v_n} are assumed to lie between {1} and {1.2 n}. This upper threshold of {1.2 n} is only a tiny improvement over the trivial threshold of {n}, but it seems to be an interesting sub-problem of the lonely runner conjecture to increase this threshold further. One key target would be to get up to {2n}, as there are actually a number of {n}-tuples {(v_1,\dots,v_n)} in this range for which (4) holds with equality. The Dirichlet approximation theorem of course gives the tuple {(1,2,\dots,n)}, but there is also the double {(2,4,\dots,2n)} of this tuple, and furthermore there is an additional construction of Goddyn and Wong that gives some further examples such as {(1,2,3,4,5,7,12)}, or more generally one can start with the standard tuple {(1,\dots,n)} and accelerate one of the velocities {v} to {2v}; this turns out to work as long as {v} shares a common factor with every integer between {n-v+1} and {2n-2v+1}. There are a few more examples of this type in the paper of Goddyn and Wong, but all of them can be placed in an arithmetic progression of length {O(n \log n)} at most, so if one were very optimistic, one could perhaps envision a strategy in which the upper bound of {O(n^{O(n^2)})} mentioned earlier was reduced all the way to something like {O( n \log n )}, and then a separate argument deployed to treat this remaining case, perhaps isolating the constructions of Goddyn and Wong (and possible variants thereof) as the only extreme cases.

I just learned (from Emmanuel Kowalski’s blog) that the AMS has just started a repository of open-access mathematics lecture notes.  There are only a few such sets of notes there at present, but hopefully it will grow in the future; I just submitted some old lecture notes of mine from an undergraduate linear algebra course I taught in 2002 (with some updating of format and fixing of various typos).


[Update, Dec 22: my own notes are now on the repository.]

I’ve just uploaded to the arXiv my paper Finite time blowup for a supercritical defocusing nonlinear Schrödinger system, submitted to Analysis and PDE. This paper is an analogue of a recent paper of mine in which I constructed a supercritical defocusing nonlinear wave (NLW) system {-\partial_{tt} u + \Delta u = (\nabla F)(u)} which exhibited smooth solutions that developed singularities in finite time. Here, we achieve essentially the same conclusion for the (inhomogeneous) supercritical defocusing nonlinear Schrödinger (NLS) equation

\displaystyle  i \partial_t u + \Delta u = (\nabla F)(u) + G \ \ \ \ \ (1)

where {u: {\bf R} \times {\bf R}^d \rightarrow {\bf C}^m} is now a system of scalar fields, {F: {\bf C}^m \rightarrow {\bf R}} is a potential which is strictly positive and homogeneous of degree {p+1} (and invariant under phase rotations {u \mapsto e^{i\theta} u}), and {G: {\bf R} \times {\bf R}^d \rightarrow {\bf C}^m} is a smooth compactly supported forcing term, needed for technical reasons.

To oversimplify somewhat, the equation (1) is known to be globally regular in the energy-subcritical case when {d \leq 2}, or when {d \geq 3} and {p < 1+\frac{4}{d-2}}; global regularity is also known (but is significantly more difficult to establish) in the energy-critical case when {d \geq 3} and {p = 1 +\frac{4}{d-2}}. (This is an oversimplification for a number of reasons, in particular in higher dimensions one only knows global well-posedness instead of global regularity. See this previous post for some exploration of this issue in the context of nonlinear wave equations.) The main result of this paper is to show that global regularity can break down in the remaining energy-supercritical case when {d \geq 3} and {p > 1 + \frac{4}{d-2}}, at least when the target dimension {m} is allowed to be sufficiently large depending on the spatial dimension {d} (I did not try to achieve the optimal value of {m} here, but the argument gives a value of {m} that grows quadratically in {d}). Unfortunately, this result does not directly impact the most interesting case of the defocusing scalar NLS equation

\displaystyle  i \partial_t u + \Delta u = |u|^{p-1} u \ \ \ \ \ (2)

in which {m=1}; however it does establish a rigorous barrier to any attempt to prove global regularity for the scalar NLS equation, in that such an attempt needs to crucially use some property of the scalar NLS that is not shared by the more general systems in (1). For instance, any approach that is primarily based on the conservation laws of mass, momentum, and energy (which are common to both (1) and (2)) will not be sufficient to establish global regularity of supercritical defocusing scalar NLS.

The method of proof in this paper is broadly similar to that in the previous paper for NLW, but with a number of additional technical complications. Both proofs begin by reducing matters to constructing a discretely self-similar solution. In the case of NLW, this solution lived on a forward light cone {\{ (t,x): |x| \leq t \}} and obeyed a self-similarity

\displaystyle  u(2t, 2x) = 2^{-\frac{2}{p-1}} u(t,x).

The ability to restrict to a light cone arose from the finite speed of propagation properties of NLW. For NLS, the solution will instead live on the domain

\displaystyle  H_d := ([0,+\infty) \times {\bf R}^d) \backslash \{(0,0)\}

and obey a parabolic self-similarity

\displaystyle  u(4t, 2x) = 2^{-\frac{2}{p-1}} u(t,x)

and solve the homogeneous version {G=0} of (1). (The inhomogeneity {G} emerges when one truncates the self-similar solution so that the initial data is compactly supported in space.) A key technical point is that {u} has to be smooth everywhere in {H_d}, including the boundary component {\{ (0,x): x \in {\bf R}^d \backslash \{0\}\}}. This unfortunately rules out many of the existing constructions of self-similar solutions, which typically will have some sort of singularity at the spatial origin.

The remaining steps of the argument can broadly be described as quantifier elimination: one systematically eliminates each of the degrees of freedom of the problem in turn by locating the necessary and sufficient conditions required of the remaining degrees of freedom in order for the constraints of a particular degree of freedom to be satisfiable. The first such degree of freedom to eliminate is the potential function {F}. The task here is to determine what constraints must exist on a putative solution {u} in order for there to exist a (positive, homogeneous, smooth away from origin) potential {F} obeying the homogeneous NLS equation

\displaystyle  i \partial_t u + \Delta u = (\nabla F)(u).

Firstly, the requirement that {F} be homogeneous implies the Euler identity

\displaystyle  \langle (\nabla F)(u), u \rangle = (p+1) F(u)

(where {\langle,\rangle} denotes the standard real inner product on {{\bf C}^m}), while the requirement that {F} be phase invariant similarly yields the variant identity

\displaystyle  \langle (\nabla F)(u), iu \rangle = 0,

so if one defines the potential energy field to be {V = F(u)}, we obtain from the chain rule the equations

\displaystyle  \langle i \partial_t u + \Delta u, u \rangle = (p+1) V

\displaystyle  \langle i \partial_t u + \Delta u, iu \rangle = 0

\displaystyle  \langle i \partial_t u + \Delta u, \partial_t u \rangle = \partial_t V

\displaystyle  \langle i \partial_t u + \Delta u, \partial_{x_j} u \rangle = \partial_{x_j} V.

Conversely, it turns out (roughly speaking) that if one can locate fields {u} and {V} obeying the above equations (as well as some other technical regularity and non-degeneracy conditions), then one can find an {F} with all the required properties. The first of these equations can be thought of as a definition of the potential energy field {V}, and the other three equations are basically disguised versions of the conservation laws of mass, energy, and momentum respectively. The construction of {F} relies on a classical extension theorem of Seeley that is a relative of the Whitney extension theorem.

Now that the potential {F} is eliminated, the next degree of freedom to eliminate is the solution field {u}. One can observe that the above equations involving {u} and {V} can be expressed instead in terms of {V} and the Gram-type matrix {G[u,u]} of {u}, which is a {(2d+4) \times (2d+4)} matrix consisting of the inner products {\langle D_1 u, D_2 u \rangle} where {D_1,D_2} range amongst the {2d+4} differential operators

\displaystyle  D_1,D_2 \in \{ 1, i, \partial_t, i\partial_t, \partial_{x_1},\dots,\partial_{x_d}, i\partial_{x_1}, \dots, i\partial_{x_d}\}.

To eliminate {u}, one thus needs to answer the question of what properties are required of a {(2d+4) \times (2d+4)} matrix {G} for it to be the Gram-type matrix {G = G[u,u]} of a field {u}. Amongst some obvious necessary conditions are that {G} needs to be symmetric and positive semi-definite; there are also additional constraints coming from identities such as

\displaystyle  \partial_t \langle u, u \rangle = 2 \langle u, \partial_t u \rangle

\displaystyle  \langle i u, \partial_t u \rangle = - \langle u, i \partial_t u \rangle


\displaystyle  \partial_{x_j} \langle iu, \partial_{x_k} u \rangle - \partial_{x_k} \langle iu, \partial_{x_j} u \rangle = 2 \langle i \partial_{x_j} u, \partial_{x_k} u \rangle.

Ideally one would like a theorem that asserts (for {m} large enough) that as long as {G} obeys all of the “obvious” constraints, then there exists a suitably non-degenerate map {u} such that {G = G[u,u]}. In the case of NLW, the analogous claim was basically a consequence of the Nash embedding theorem (which can be viewed as a theorem about the solvability of the system of equations {\langle \partial_{x_j} u, \partial_{x_k} u \rangle = g_{jk}} for a given positive definite symmetric set of fields {g_{jk}}). However, the presence of the complex structure in the NLS case poses some significant technical challenges (note for instance that the naive complex version of the Nash embedding theorem is false, due to obstructions such as Liouville’s theorem that prevent a compact complex manifold from being embeddable holomorphically in {{\bf C}^m}). Nevertheless, by adapting the proof of the Nash embedding theorem (in particular, the simplified proof of Gunther that avoids the need to use the Nash-Moser iteration scheme) we were able to obtain a partial complex analogue of the Nash embedding theorem that sufficed for our application; it required an artificial additional “curl-free” hypothesis on the Gram-type matrix {G[u,u]}, but fortunately this hypothesis ends up being automatic in our construction. Also, this version of the Nash embedding theorem is unable to prescribe the component {\langle \partial_t u, \partial_t u \rangle} of the Gram-type matrix {G[u,u]}, but fortunately this component is not used in any of the conservation laws and so the loss of this component does not cause any difficulty.

After applying the above-mentioned Nash-embedding theorem, the task is now to locate a matrix {G} obeying all the hypotheses of that theorem, as well as the conservation laws for mass, momentum, and energy (after defining the potential energy field {V} in terms of {G}). This is quite a lot of fields and constraints, but one can cut down significantly on the degrees of freedom by requiring that {G} is spherically symmetric (in a tensorial sense) and also continuously self-similar (not just discretely self-similar). Note that this hypothesis is weaker than the assertion that the original field {u} is spherically symmetric and continuously self-similar; indeed we do not know if non-trivial solutions of this type actually exist. These symmetry hypotheses reduce the number of independent components of the {(2d+4) \times (2d+4)} matrix {G} to just six: {g_{1,1}, g_{1,i\partial_t}, g_{1,i\partial_r}, g_{\partial_r, \partial_r}, g_{\partial_\omega, \partial_\omega}, g_{\partial_r, \partial_t}}, which now take as their domain the {1+1}-dimensional space

\displaystyle  H_1 := ([0,+\infty) \times {\bf R}) \backslash \{(0,0)\}.

One now has to construct these six fields, together with a potential energy field {v}, that obey a number of constraints, notably some positive definiteness constraints as well as the aforementioned conservation laws for mass, momentum, and energy.

The field {g_{1,i\partial_t}} only arises in the equation for the potential {v} (coming from Euler’s identity) and can easily be eliminated. Similarly, the field {g_{\partial_r,\partial_t}} only makes an appearance in the current of the energy conservation law, and so can also be easily eliminated so long as the total energy is conserved. But in the energy-supercritical case, the total energy is infinite, and so it is relatively easy to eliminate the field {g_{\partial_r, \partial_t}} from the problem also. This leaves us with the task of constructing just five fields {g_{1,1}, g_{1,i\partial_r}, g_{\partial_r,\partial_r}, g_{\partial_\omega,\partial_\omega}, v} obeying a number of positivity conditions, symmetry conditions, regularity conditions, and conservation laws for mass and momentum.

The potential field {v} can effectively be absorbed into the angular stress field {g_{\partial_\omega,\partial_\omega}} (after placing an appropriate counterbalancing term in the radial stress field {g_{\partial_r, \partial_r}} so as not to disrupt the conservation laws), so we can also eliminate this field. The angular stress field {g_{\partial_\omega, \partial_\omega}} is then only constrained through the momentum conservation law and a requirement of positivity; one can then eliminate this field by converting the momentum conservation law from an equality to an inequality. Finally, the radial stress field {g_{\partial_r, \partial_r}} is also only constrained through a positive definiteness constraint and the momentum conservation inequality, so it can also be eliminated from the problem after some further modification of the momentum conservation inequality.

The task then reduces to locating just two fields {g_{1,1}, g_{1,i\partial_r}} that obey a mass conservation law

\displaystyle  \partial_t g_{1,1} = 2 \left(\partial_r + \frac{d-1}{r} \right) g_{1,i\partial r}

together with an additional inequality that is the remnant of the momentum conservation law. One can solve for the mass conservation law in terms of a single scalar field {W} using the ansatz

\displaystyle g_{1,1} = 2 r^{1-d} \partial_r (r^d W)

\displaystyle g_{1,i\partial_r} = r^{1-d} \partial_t (r^d W)

so the problem has finally been simplified to the task of locating a single scalar field {W} with some scaling and homogeneity properties that obeys a certain differential inequality relating to momentum conservation. This turns out to be possible by explicitly writing down a specific scalar field {W} using some asymptotic parameters and cutoff functions.

[This guest post is authored by Caroline Series.]

The Chern Medal is a relatively new prize, awarded once every four years jointly by the IMU
and the Chern Medal Foundation (CMF) to an individual whose accomplishments warrant
the highest level of recognition for outstanding achievements in the field of mathematics.
Funded by the CMF, the Medalist receives a cash prize of US$ 250,000.  In addition, each
Medalist may nominate one or more organizations to receive funding totalling US$ 250,000, for the support of research, education, or other outreach programs in the field of mathematics.

Professor Chern devoted his life to mathematics, both in active research and education, and in nurturing the field whenever the opportunity arose. He obtained fundamental results in all the major aspects of modern geometry and founded the area of global differential geometry. Chern exhibited keen aesthetic tastes in his selection of problems, and the breadth of his work deepened the connections of geometry with different areas of mathematics. He was also generous during his lifetime in his personal support of the field.

Nominations should be sent to the Prize Committee Chair:  Caroline Series, email: by 31st December 2016. Further details and nomination guidelines for this and the other IMU prizes can be found at


I’ve just uploaded to the arXiv my paper “An integration approach to the Toeplitz square peg problem“, submitted to Forum of Mathematics, Sigma. This paper resulted from my attempts recently to solve the Toeplitz square peg problem (also known as the inscribed square problem):

Conjecture 1 (Toeplitz square peg problem) Let {\gamma} be a simple closed curve in the plane. Is it necessarily the case that {\gamma} contains four vertices of a square?

See this recent survey of Matschke in the Notices of the AMS for the latest results on this problem.

The route I took to the results in this paper was somewhat convoluted. I was motivated to look at this problem after lecturing recently on the Jordan curve theorem in my class. The problem is superficially similar to the Jordan curve theorem in that the result is known (and rather easy to prove) if {\gamma} is sufficiently regular (e.g. if it is a polygonal path), but seems to be significantly more difficult when the curve is merely assumed to be continuous. Roughly speaking, all the known positive results on the problem have proceeded using (in some form or another) tools from homology: note for instance that one can view the conjecture as asking whether the four-dimensional subset {\gamma^4} of the eight-dimensional space {({\bf R}^2)^4} necessarily intersects the four-dimensional space {\mathtt{Squares} \subset ({\bf R}^2)^4} consisting of the quadruples {(v_1,v_2,v_3,v_4)} traversing a square in (say) anti-clockwise order; this space is a four-dimensional linear subspace of {({\bf R}^2)^4}, with a two-dimensional subspace of “degenerate” squares {(v,v,v,v)} removed. If one ignores this degenerate subspace, one can use intersection theory to conclude (under reasonable “transversality” hypotheses) that {\gamma^4} intersects {\mathtt{Squares}} an odd number of times (up to the cyclic symmetries of the square), which is basically how Conjecture 1 is proven in the regular case. Unfortunately, if one then takes a limit and considers what happens when {\gamma} is just a continuous curve, the odd number of squares created by these homological arguments could conceivably all degenerate to points, thus blocking one from proving the conjecture in the general case.

Inspired by my previous work on finite time blowup for various PDEs, I first tried looking for a counterexample in the category of (locally) self-similar curves that are smooth (or piecewise linear) away from a single origin where it can oscillate infinitely often; this is basically the smoothest type of curve that was not already covered by previous results. By a rescaling and compactness argument, it is not difficult to see that such a counterexample would exist if there was a counterexample to the following periodic version of the conjecture:

Conjecture 2 (Periodic square peg problem) Let {\gamma_1, \gamma_2} be two disjoint simple closed piecewise linear curves in the cylinder {({\bf R}/{\bf Z}) \times {\bf R}} which have a winding number of one, that is to say they are homologous to the loop {x \mapsto (x,0)} from {{\bf R}/{\bf Z}} to {({\bf R}/{\bf Z}) \times {\bf R}}. Then the union of {\gamma_1} and {\gamma_2} contains the four vertices of a square.

In contrast to Conjecture 1, which is known for polygonal paths, Conjecture 2 is still open even under the hypothesis of polygonal paths; the homological arguments alluded to previously now show that the number of inscribed squares in the periodic setting is even rather than odd, which is not enough to conclude the conjecture. (This flipping of parity from odd to even due to an infinite amount of oscillation is reminiscent of the “Eilenberg-Mazur swindle“, discussed in this previous post.)

I therefore tried to construct counterexamples to Conjecture 2. I began perturbatively, looking at curves {\gamma_1, \gamma_2} that were small perturbations of constant functions. After some initial Taylor expansion, I was blocked from forming such a counterexample because an inspection of the leading Taylor coefficients required one to construct a continuous periodic function of mean zero that never vanished, which of course was impossible by the intermediate value theorem. I kept expanding to higher and higher order to try to evade this obstruction (this, incidentally, was when I discovered this cute application of Lagrange reversion) but no matter how high an accuracy I went (I think I ended up expanding to sixth order in a perturbative parameter {\varepsilon} before figuring out what was going on!), this obstruction kept resurfacing again and again. I eventually figured out that this obstruction was being caused by a “conserved integral of motion” for both Conjecture 2 and Conjecture 1, which can in fact be used to largely rule out perturbative constructions. This yielded a new positive result for both conjectures:

Theorem 3

  • (i) Conjecture 1 holds when {\gamma} is the union {\{ (t,f(t)): t \in [t_0,t_1]\} \cup \{ (t,g(t)): t \in [t_0,t_1]\}} of the graphs of two Lipschitz functions {f,g: [t_0,t_1] \rightarrow {\bf R}} of Lipschitz constant less than one that agree at the endpoints.
  • (ii) Conjecture 2 holds when {\gamma_1, \gamma_2} are graphs of Lipschitz functions {f: {\bf R}/{\bf Z} \rightarrow {\bf R}, g: {\bf R}/{\bf Z} \rightarrow {\bf R}} of Lipschitz constant less than one.

We sketch the proof of Theorem 3(i) as follows (the proof of Theorem 3(ii) is very similar). Let {\gamma_1: [t_0, t_1] \rightarrow {\bf R}} be the curve {\gamma_1(t) := (t,f(t))}, thus {\gamma_1} traverses one of the two graphs that comprise {\gamma}. For each time {t \in [t_0,t_1]}, there is a unique square with first vertex {\gamma_1(t)} (and the other three vertices, traversed in anticlockwise order, denoted {\gamma_2(t), \gamma_3(t), \gamma_4(t)}) such that {\gamma_2(t)} also lies in the graph of {f} and {\gamma_4(t)} also lies in the graph of {g} (actually for technical reasons we have to extend {f,g} by constants to all of {{\bf R}} in order for this claim to be true). To see this, we simply rotate the graph of {g} clockwise by {\frac{\pi}{2}} around {\gamma_1(t)}, where (by the Lipschitz hypotheses) it must hit the graph of {f} in a unique point, which is {\gamma_2(t)}, and which then determines the other two vertices {\gamma_3(t), \gamma_4(t)} of the square. The curve {\gamma_3(t)} has the same starting and ending point as the graph of {f} or {g}; using the Lipschitz hypothesis one can show this graph is simple. If the curve ever hits the graph of {g} other than at the endpoints, we have created an inscribed square, so we may assume for contradiction that {\gamma_3(t)} avoids the graph of {g}, and hence by the Jordan curve theorem the two curves enclose some non-empty bounded open region {\Omega}.

Now for the conserved integral of motion. If we integrate the {1}-form {y\ dx} on each of the four curves {\gamma_1, \gamma_2, \gamma_3, \gamma_4}, we obtain the identity

\displaystyle  \int_{\gamma_1} y\ dx - \int_{\gamma_2} y\ dx + \int_{\gamma_3} y\ dx - \int_{\gamma_4} y\ dx = 0.

This identity can be established by the following calculation: one can parameterise

\displaystyle  \gamma_1(t) = (x(t), y(t))

\displaystyle  \gamma_2(t) = (x(t)+a(t), y(t)+b(t))

\displaystyle  \gamma_3(t) = (x(t)+a(t)-b(t), y(t)+a(t)+b(t))

\displaystyle  \gamma_4(t) = (x(t)-b(t), y(t)+a(t))

for some Lipschitz functions {x,y,a,b: [t_0,t_1] \rightarrow {\bf R}}; thus for instance {\int_{\gamma_1} y\ dx = \int_{t_0}^{t_1} y(t)\ dx(t)}. Inserting these parameterisations and doing some canceling, one can write the above integral as

\displaystyle  \int_{t_0}^{t_1} d \frac{a(t)^2-b(t)^2}{2}

which vanishes because {a(t), b(t)} (which represent the sidelengths of the squares determined by {\gamma_1(t), \gamma_2(t), \gamma_3(t), \gamma_4(t)} vanish at the endpoints {t=t_0,t_1}.

Using this conserved integral of motion, one can show that

\displaystyle  \int_{\gamma_3} y\ dx = \int_{t_0}^{t_1} g(t)\ dt

which by Stokes’ theorem then implies that the bounded open region {\Omega} mentioned previously has zero area, which is absurd.

This argument hinged on the curve {\gamma_3} being simple, so that the Jordan curve theorem could apply. Once one left the perturbative regime of curves of small Lipschitz constant, it became possible for {\gamma_3} to be self-crossing, but nevertheless there still seemed to be some sort of integral obstruction. I eventually isolated the problem in the form of a strengthened version of Conjecture 2:

Conjecture 4 (Area formulation of square peg problem) Let {\gamma_1, \gamma_2, \gamma_3, \gamma_4: {\bf R}/{\bf Z} \rightarrow ({\bf R}/{\bf Z}) \times {\bf R}} be simple closed piecewise linear curves of winding number {1} obeying the area identity

\displaystyle  \int_{\gamma_1} y\ dx - \int_{\gamma_2} y\ dx + \int_{\gamma_3} y\ dx - \int_{\gamma_4} y\ dx = 0

(note the {1}-form {y\ dx} is still well defined on the cylinder {({\bf R}/{\bf Z}) \times {\bf R}}; note also that the curves {\gamma_1,\gamma_2,\gamma_3,\gamma_4} are allowed to cross each other.) Then there exists a (possibly degenerate) square with vertices (traversed in anticlockwise order) lying on {\gamma_1, \gamma_2, \gamma_3, \gamma_4} respectively.

It is not difficult to see that Conjecture 4 implies Conjecture 2. Actually I believe that the converse implication is at least morally true, in that any counterexample to Conjecture 4 can be eventually transformed to a counterexample to Conjecture 2 and Conjecture 1. The conserved integral of motion argument can establish Conjecture 4 in many cases, for instance if {\gamma_2,\gamma_4} are graphs of functions of Lipschitz constant less than one.

Conjecture 4 has a model special case, when one of the {\gamma_i} is assumed to just be a horizontal loop. In this case, the problem collapses to that of producing an intersection between two three-dimensional subsets of a six-dimensional space, rather than to four-dimensional subsets of an eight-dimensional space. More precisely, some elementary transformations reveal that this special case of Conjecture 4 can be formulated in the following fashion in which the geometric notion of a square is replaced by the additive notion of a triple of real numbers summing to zero:

Conjecture 5 (Special case of area formulation) Let {\gamma_1, \gamma_2, \gamma_3: {\bf R}/{\bf Z} \rightarrow ({\bf R}/{\bf Z}) \times {\bf R}} be simple closed piecewise linear curves of winding number {1} obeying the area identity

\displaystyle  \int_{\gamma_1} y\ dx + \int_{\gamma_2} y\ dx + \int_{\gamma_3} y\ dx = 0.

Then there exist {x \in {\bf R}/{\bf Z}} and {y_1,y_2,y_3 \in {\bf R}} with {y_1+y_2+y_3=0} such that {(x,y_i) \in \gamma_i} for {i=1,2,3}.

This conjecture is easy to establish if one of the curves, say {\gamma_3}, is the graph {\{ (t,f(t)): t \in {\bf R}/{\bf Z}\}} of some piecewise linear function {f: {\bf R}/{\bf Z} \rightarrow {\bf R}}, since in that case the curve {\gamma_1} and the curve {\tilde \gamma_2 := \{ (x, -y-f(x)): (x,y) \in \gamma_2 \}} enclose the same area in the sense that {\int_{\gamma_1} y\ dx = \int_{\tilde \gamma_2} y\ dx}, and hence must intersect by the Jordan curve theorem (otherwise they would enclose a non-zero amount of area between them), giving the claim. But when none of the {\gamma_1,\gamma_2,\gamma_3} are graphs, the situation becomes combinatorially more complicated.

Using some elementary homological arguments (e.g. breaking up closed {1}-cycles into closed paths) and working with a generic horizontal slice of the curves, I was able to show that Conjecture 5 was equivalent to a one-dimensional problem that was largely combinatorial in nature, revolving around the sign patterns of various triple sums {y_{1,a} + y_{2,b} + y_{3,c}} with {y_{1,a}, y_{2,b}, y_{3,c}} drawn from various finite sets of reals.

Conjecture 6 (Combinatorial form) Let {k_1,k_2,k_3} be odd natural numbers, and for each {i=1,2,3}, let {y_{i,1},\dots,y_{i,k_i}} be distinct real numbers; we adopt the convention that {y_{i,0}=y_{i,k_i+1}=-\infty}. Assume the following axioms:

  • (i) For any {1 \leq p \leq k_1, 1 \leq q \leq k_2, 1 \leq r \leq k_3}, the sums {y_{1,p} + y_{2,q} + y_{3,r}} are non-zero.
  • (ii) (Non-crossing) For any {i=1,2,3} and {0 \leq p < q \leq k_i} with the same parity, the pairs {\{ y_{i,p}, y_{i,p+1}\}} and {\{y_{i,q}, y_{i,q+1}\}} are non-crossing in the sense that

    \displaystyle  \sum_{a \in \{p,p+1\}} \sum_{b \in \{q,q+1\}} (-1)^{a+b} \mathrm{sgn}( y_{i,a} - y_{i,b} ) = 0.

  • (iii) (Non-crossing sums) For any {0 \leq p \leq k_1}, {0 \leq q \leq k_2}, {0 \leq r \leq k_3} of the same parity, one has

    \displaystyle  \sum_{a \in \{p,p+1\}} \sum_{b \in \{q,q+1\}} \sum_{c \in \{r,r+1\}} (-1)^{a+b+c} \mathrm{sgn}( y_{1,a} + y_{2,b} + y_{3,c} ) = 0.

Then one has

\displaystyle  \sum_{i=1}^3 \sum_{p=1}^{k_i} (-1)^{p-1} y_{i,p} < 0.

Roughly speaking, Conjecture 6 and Conjecture 5 are connected by constructing curves {\gamma_i} to connect {(0, y_{i,p})} to {(0,y_{i,p+1})} for {0 \leq p \leq k+1} by various paths, which either lie to the right of the {y} axis (when {p} is odd) or to the left of the {y} axis (when {p} is even). The axiom (ii) is asserting that the numbers {-\infty, y_{i,1},\dots,y_{i,k_i}} are ordered according to the permutation of a meander (formed by gluing together two non-crossing perfect matchings).

Using various ad hoc arguments involving “winding numbers”, it is possible to prove this conjecture in many cases (e.g. if one of the {k_i} is at most {3}), to the extent that I have now become confident that this conjecture is true (and have now come full circle from trying to disprove Conjecture 1 to now believing that this conjecture holds also). But it seems that there is some non-trivial combinatorial argument to be made if one is to prove this conjecture; purely homological arguments seem to partially resolve the problem, but are not sufficient by themselves.

While I was not able to resolve the square peg problem, I think these results do provide a roadmap to attacking it, first by focusing on the combinatorial conjecture in Conjecture 6 (or its equivalent form in Conjecture 5), then after that is resolved moving on to Conjecture 4, and then finally to Conjecture 1.

By an odd coincidence, I stumbled upon a second question in as many weeks about power series, and once again the only way I know how to prove the result is by complex methods; once again, I am leaving it here as a challenge to any interested readers, and I would be particularly interested in knowing of a proof that was not based on complex analysis (or thinly disguised versions thereof), or for a reference to previous literature where something like this identity has occured. (I suspect for instance that something like this may have shown up before in free probability, based on the answer to part (ii) of the problem.)

Here is a purely algebraic form of the problem:

Problem 1 Let {F = F(z)} be a formal function of one variable {z}. Suppose that {G = G(z)} is the formal function defined by

\displaystyle G := \sum_{n=1}^\infty \left( \frac{F^n}{n!} \right)^{(n-1)}

\displaystyle = F + \left(\frac{F^2}{2}\right)' + \left(\frac{F^3}{6}\right)'' + \dots

\displaystyle = F + FF' + (F (F')^2 + \frac{1}{2} F^2 F'') + \dots,

where we use {f^{(k)}} to denote the {k}-fold derivative of {f} with respect to the variable {z}.

  • (i) Show that {F} can be formally recovered from {G} by the formula

    \displaystyle F = \sum_{n=1}^\infty (-1)^{n-1} \left( \frac{G^n}{n!} \right)^{(n-1)}

    \displaystyle = G - \left(\frac{G^2}{2}\right)' + \left(\frac{G^3}{6}\right)'' - \dots

    \displaystyle = G - GG' + (G (G')^2 + \frac{1}{2} G^2 G'') - \dots.

  • (ii) There is a remarkable further formal identity relating {F(z)} with {G(z)} that does not explicitly involve any infinite summation. What is this identity?

To rigorously formulate part (i) of this problem, one could work in the commutative differential ring of formal infinite series generated by polynomial combinations of {F} and its derivatives (with no constant term). Part (ii) is a bit trickier to formulate in this abstract ring; the identity in question is easier to state if {F, G} are formal power series, or (even better) convergent power series, as it involves operations such as composition or inversion that can be more easily defined in those latter settings.

To illustrate Problem 1(i), let us compute up to third order in {F}, using {{\mathcal O}(F^4)} to denote any quantity involving four or more factors of {F} and its derivatives, and similarly for other exponents than {4}. Then we have

\displaystyle G = F + FF' + (F (F')^2 + \frac{1}{2} F^2 F'') + {\mathcal O}(F^4)

and hence

\displaystyle G' = F' + (F')^2 + FF'' + {\mathcal O}(F^3)

\displaystyle G'' = F'' + {\mathcal O}(F^2);

multiplying, we have

\displaystyle GG' = FF' + F (F')^2 + F^2 F'' + F (F')^2 + {\mathcal O}(F^4)


\displaystyle G (G')^2 + \frac{1}{2} G^2 G'' = F (F')^2 + \frac{1}{2} F^2 F'' + {\mathcal O}(F^4)

and hence after a lot of canceling

\displaystyle G - GG' + (G (G')^2 + \frac{1}{2} G^2 G'') = F + {\mathcal O}(F^4).

Thus Problem 1(i) holds up to errors of {{\mathcal O}(F^4)} at least. In principle one can continue verifying Problem 1(i) to increasingly high order in {F}, but the computations rapidly become quite lengthy, and I do not know of a direct way to ensure that one always obtains the required cancellation at the end of the computation.

Problem 1(i) can also be posed in formal power series: if

\displaystyle F(z) = a_1 z + a_2 z^2 + a_3 z^3 + \dots

is a formal power series with no constant term with complex coefficients {a_1, a_2, \dots} with {|a_1|<1}, then one can verify that the series

\displaystyle G := \sum_{n=1}^\infty \left( \frac{F^n}{n!} \right)^{(n-1)}

makes sense as a formal power series with no constant term, thus

\displaystyle G(z) = b_1 z + b_2 z^2 + b_3 z^3 + \dots.

For instance it is not difficult to show that {b_1 = \frac{a_1}{1-a_1}}. If one further has {|b_1| < 1}, then it turns out that

\displaystyle F = \sum_{n=1}^\infty (-1)^{n-1} \left( \frac{G^n}{n!} \right)^{(n-1)}

as formal power series. Currently the only way I know how to show this is by first proving the claim for power series with a positive radius of convergence using the Cauchy integral formula, but even this is a bit tricky unless one has managed to guess the identity in (ii) first. (In fact, the way I discovered this problem was by first trying to solve (a variant of) the identity in (ii) by Taylor expansion in the course of attacking another problem, and obtaining the transform in Problem 1 as a consequence.)

The transform that takes {F} to {G} resembles both the exponential function

\displaystyle \exp(F) = \sum_{n=0}^\infty \frac{F^n}{n!}

and Taylor’s formula

\displaystyle F(z) = \sum_{n=0}^\infty \frac{F^{(n)}(0)}{n!} z^n

but does not seem to be directly connected to either (this is more apparent once one knows the identity in (ii)).

In the previous set of notes we introduced the notion of a complex diffeomorphism {f: U \rightarrow V} between two open subsets {U,V} of the complex plane {{\bf C}} (or more generally, two Riemann surfaces): an invertible holomorphic map whose inverse was also holomorphic. (Actually, the last part is automatic, thanks to Exercise 40 of Notes 4.) Such maps are also known as biholomorphic maps or conformal maps (although in some literature the notion of “conformal map” is expanded to permit maps such as the complex conjugation map {z \mapsto \overline{z}} that are angle-preserving but not orientation-preserving, as well as maps such as the exponential map {z \mapsto \exp(z)} from {{\bf C}} to {{\bf C} \backslash \{0\}} that are only locally injective rather than globally injective). Such complex diffeomorphisms can be used in complex analysis (or in the analysis of harmonic functions) to change the underlying domain {U} to a domain that may be more convenient for calculations, thanks to the following basic lemma:

Lemma 1 (Holomorphicity and harmonicity are conformal invariants) Let {\phi: U \rightarrow V} be a complex diffeomorphism between two Riemann surfaces {U,V}.

  • (i) If {f: V \rightarrow W} is a function to another Riemann surface {W}, then {f} is holomorphic if and only if {f \circ \phi: U \rightarrow W} is holomorphic.
  • (ii) If {U,V} are open subsets of {{\bf C}} and {u: V \rightarrow {\bf R}} is a function, then {u} is harmonic if and only if {u \circ \phi: U \rightarrow {\bf R}} is harmonic.

Proof: Part (i) is immediate since the composition of two holomorphic functions is holomorphic. For part (ii), observe that if {u: V \rightarrow {\bf R}} is harmonic then on any ball {B(z_0,r)} in {V}, {u} is the real part of some holomorphic function {f: B(z_0,r) \rightarrow {\bf C}} thanks to Exercise 62 of Notes 3. By part (i), {f \circ \phi: B(z_0,r) \rightarrow {\bf C}} is also holomorphic. Taking real parts we see that {u \circ \phi} is harmonic on each ball {B(z_0,r)} in {V}, and hence harmonic on all of {V}, giving one direction of (ii); the other direction is proven similarly. \Box

Exercise 2 Establish Lemma 1(ii) by direct calculation, avoiding the use of holomorphic functions. (Hint: the calculations are cleanest if one uses Wirtinger derivatives, as per Exercise 27 of Notes 1.)

Exercise 3 Let {\phi: U \rightarrow V} be a complex diffeomorphism between two open subsets {U,V} of {{\bf C}}, let {z_0} be a point in {U}, let {m} be a natural number, and let {f: V \rightarrow {\bf C} \cup \{\infty\}} be holomorphic. Show that {f: V \rightarrow {\bf C} \cup \{\infty\}} has a zero (resp. a pole) of order {m} at {\phi(z_0)} if and only if {f \circ \phi: U \rightarrow {\bf C} \cup \{\infty\}} has a zero (resp. a pole) of order {m} at {z_0}.

From Lemma 1(ii) we can now define the notion of a harmonic function {u: M \rightarrow {\bf R}} on a Riemann surface {M}; such a function {u} is harmonic if, for every coordinate chart {\phi_\alpha: U_\alpha \rightarrow V_\alpha} in some atlas, the map {u \circ \phi_\alpha^{-1}: V_\alpha \rightarrow {\bf R}} is harmonic. Lemma 1(ii) ensures that this definition of harmonicity does not depend on the choice of atlas. Similarly, using Exercise 3 one can define what it means for a holomorphic map {f: M \rightarrow {\bf C} \cup \{\infty\}} on a Riemann surface {M} to have a pole or zero of a given order at a point {p_0 \in M}, with the definition being independent of the choice of atlas.

In view of Lemma 1, it is thus natural to ask which Riemann surfaces are complex diffeomorphic to each other, and more generally to understand the space of holomorphic maps from one given Riemann surface to another. We will initially focus attention on three important model Riemann surfaces:

  • (i) (Elliptic model) The Riemann sphere {{\bf C} \cup \{\infty\}};
  • (ii) (Parabolic model) The complex plane {{\bf C}}; and
  • (iii) (Hyperbolic model) The unit disk {D(0,1)}.

The designation of these model Riemann surfaces as elliptic, parabolic, and hyperbolic comes from Riemannian geometry, where it is natural to endow each of these surfaces with a constant curvature Riemannian metric which is positive, zero, or negative in the elliptic, parabolic, and hyperbolic cases respectively. However, we will not discuss Riemannian geometry further here.

All three model Riemann surfaces are simply connected, but none of them are complex diffeomorphic to any other; indeed, there are no non-constant holomorphic maps from the Riemann sphere to the plane or the disk, nor are there any non-constant holomorphic maps from the plane to the disk (although there are plenty of holomorphic maps going in the opposite directions). The complex automorphisms (that is, the complex diffeomorphisms from a surface to itself) of each of the three surfaces can be classified explicitly. The automorphisms of the Riemann sphere turn out to be the Möbius transformations {z \mapsto \frac{az+b}{cz+d}} with {ad-bc \neq 0}, also known as fractional linear transformations. The automorphisms of the complex plane are the linear transformations {z \mapsto az+b} with {a \neq 0}, and the automorphisms of the disk are the fractional linear transformations of the form {z \mapsto e^{i\theta} \frac{\alpha - z}{1 - \overline{\alpha} z}} for {\theta \in {\bf R}} and {\alpha \in D(0,1)}. Holomorphic maps {f: D(0,1) \rightarrow D(0,1)} from the disk {D(0,1)} to itself that fix the origin obey a basic but incredibly important estimate known as the Schwarz lemma: they are “dominated” by the identity function {z \mapsto z} in the sense that {|f(z)| \leq |z|} for all {z \in D(0,1)}. Among other things, this lemma gives guidance to determine when a given Riemann surface is complex diffeomorphic to a disk; we shall discuss this point further below.

It is a beautiful and fundamental fact in complex analysis that these three model Riemann surfaces are in fact an exhaustive list of the simply connected Riemann surfaces, up to complex diffeomorphism. More precisely, we have the Riemann mapping theorem and the uniformisation theorem:

Theorem 4 (Riemann mapping theorem) Let {U} be a simply connected open subset of {{\bf C}} that is not all of {{\bf C}}. Then {U} is complex diffeomorphic to {D(0,1)}.

Theorem 5 (Uniformisation theorem) Let {M} be a simply connected Riemann surface. Then {M} is complex diffeomorphic to {{\bf C} \cup \{\infty\}}, {{\bf C}}, or {D(0,1)}.

As we shall see, every connected Riemann surface can be viewed as the quotient of its simply connected universal cover by a discrete group of automorphisms known as deck transformations. This in principle gives a complete classification of Riemann surfaces up to complex diffeomorphism, although the situation is still somewhat complicated in the hyperbolic case because of the wide variety of discrete groups of automorphisms available in that case.

We will prove the Riemann mapping theorem in these notes, using the elegant argument of Koebe that is based on the Schwarz lemma and Montel’s theorem (Exercise 57 of Notes 4). The uniformisation theorem is however more difficult to establish; we discuss some components of a proof (based on the Perron method of subharmonic functions) here, but stop short of providing a complete proof.

The above theorems show that it is in principle possible to conformally map various domains into model domains such as the unit disk, but the proofs of these theorems do not readily produce explicit conformal maps for this purpose. For some domains we can just write down a suitable such map. For instance:

Exercise 6 (Cayley transform) Let {{\bf H} := \{ z \in {\bf C}: \mathrm{Im} z > 0 \}} be the upper half-plane. Show that the Cayley transform {\phi: {\bf H} \rightarrow D(0,1)}, defined by

\displaystyle  \phi(z) := \frac{z-i}{z+i},

is a complex diffeomorphism from the upper half-plane {{\bf H}} to the disk {D(0,1)}, with inverse map {\phi^{-1}: D(0,1) \rightarrow {\bf H}} given by

\displaystyle  \phi^{-1}(w) := i \frac{1+w}{1-w}.

Exercise 7 Show that for any real numbers {a<b}, the strip {\{ z \in {\bf C}: a < \mathrm{Re}(z) < b \}} is complex diffeomorphic to the disk {D(0,1)}. (Hint: use the complex exponential and a linear transformation to map the strip onto the half-plane {{\bf H}}.)

Exercise 8 Show that for any real numbers {a<b<a+2\pi}, the strip {\{ re^{i\theta}: r>0, a < \theta < b \}} is complex diffeomorphic to the disk {D(0,1)}. (Hint: use a branch of either the complex logarithm, or of a complex power {z \mapsto z^\alpha}.)

We will discuss some other explicit conformal maps in this set of notes, such as the Schwarz-Christoffel maps that transform the upper half-plane {{\bf H}} to polygonal regions. Further examples of conformal mapping can be found in the text of Stein-Shakarchi.

Read the rest of this entry »

My colleague Tom Liggett recently posed to me the following problem about power series in one real variable {x}. Observe that the power series

\displaystyle  \sum_{n=0}^\infty (-1)^n\frac{x^n}{n!}

has very rapidly decaying coefficients (of order {O(1/n!)}), leading to an infinite radius of convergence; also, as the series converges to {e^{-x}}, the series decays very rapidly as {x} approaches {+\infty}. The problem is whether this is essentially the only example of this type. More precisely:

Problem 1 Let {a_0, a_1, \dots} be a bounded sequence of real numbers, and suppose that the power series

\displaystyle  f(x) := \sum_{n=0}^\infty a_n\frac{x^n}{n!}

(which has an infinite radius of convergence) decays like {O(e^{-x})} as {x \rightarrow +\infty}, in the sense that the function {e^x f(x)} remains bounded as {x \rightarrow +\infty}. Must the sequence {a_n} be of the form {a_n = C (-1)^n} for some constant {C}?

As it turns out, the problem has a very nice solution using complex analysis methods, which by coincidence I happen to be teaching right now. I am therefore posing as a challenge to my complex analysis students and to other readers of this blog to answer the above problem by complex methods; feel free to post solutions in the comments below (and in particular, if you don’t want to be spoiled, you should probably refrain from reading the comments). In fact, the only way I know how to solve this problem currently is by complex methods; I would be interested in seeing a purely real-variable solution that is not simply a thinly disguised version of a complex-variable argument.

(To be fair to my students, the complex variable argument does require one additional tool that is not directly covered in my notes. That tool can be found here.)

In the previous set of notes we saw that functions {f: U \rightarrow {\bf C}} that were holomorphic on an open set {U} enjoyed a large number of useful properties, particularly if the domain {U} was simply connected. In many situations, though, we need to consider functions {f} that are only holomorphic (or even well-defined) on most of a domain {U}, thus they are actually functions {f: U \backslash S \rightarrow {\bf C}} outside of some small singular set {S} inside {U}. (In this set of notes we only consider interior singularities; one can also discuss singular behaviour at the boundary of {U}, but this is a whole separate topic and will not be pursued here.) Since we have only defined the notion of holomorphicity on open sets, we will require the singular sets {S} to be closed, so that the domain {U \backslash S} on which {f} remains holomorphic is still open. A typical class of examples are the functions of the form {\frac{f(z)}{z-z_0}} that were already encountered in the Cauchy integral formula; if {f: U \rightarrow {\bf C}} is holomorphic and {z_0 \in U}, such a function would be holomorphic save for a singularity at {z_0}. Another basic class of examples are the rational functions {P(z)/Q(z)}, which are holomorphic outside of the zeroes of the denominator {Q}.

Singularities come in varying levels of “badness” in complex analysis. The least harmful type of singularity is the removable singularity – a point {z_0} which is an isolated singularity (i.e., an isolated point of the singular set {S}) where the function {f} is undefined, but for which one can extend the function across the singularity in such a fashion that the function becomes holomorphic in a neighbourhood of the singularity. A typical example is that of the complex sinc function {\frac{\sin(z)}{z}}, which has a removable singularity at the origin {0}, which can be removed by declaring the sinc function to equal {1} at {0}. The detection of isolated removable singularities can be accomplished by Riemann’s theorem on removable singularities (Exercise 35 from Notes 3): if a holomorphic function {f: U \backslash S \rightarrow {\bf C}} is bounded near an isolated singularity {z_0 \in S}, then the singularity at {z_0} may be removed.

After removable singularities, the mildest form of singularity one can encounter is that of a pole – an isolated singularity {z_0} such that {f(z)} can be factored as {f(z) = \frac{g(z)}{(z-z_0)^m}} for some {m \geq 1} (known as the order of the pole), where {g} has a removable singularity at {z_0} (and is non-zero at {z_0} once the singularity is removed). Such functions have already made a frequent appearance in previous notes, particularly the case of simple poles when {m=1}. The behaviour near {z_0} of function {f} with a pole of order {m} is well understood: for instance, {|f(z)|} goes to infinity as {z} approaches {z_0} (at a rate comparable to {|z-z_0|^{-m}}). These singularities are not, strictly speaking, removable; but if one compactifies the range {{\bf C}} of the holomorphic function {f: U \backslash S \rightarrow {\bf C}} to a slightly larger space {{\bf C} \cup \{\infty\}} known as the Riemann sphere, then the singularity can be removed. In particular, functions {f: U \backslash S \rightarrow {\bf C}} which only have isolated singularities that are either poles or removable can be extended to holomorphic functions {f: U \rightarrow {\bf C} \cup \{\infty\}} to the Riemann sphere. Such functions are known as meromorphic functions, and are nearly as well-behaved as holomorphic functions in many ways. In fact, in one key respect, the family of meromorphic functions is better: the meromorphic functions on {U} turn out to form a field, in particular the quotient of two meromorphic functions is again meromorphic (if the denominator is not identically zero).

Unfortunately, there are isolated singularities that are neither removable or poles, and are known as essential singularities. A typical example is the function {f(z) = e^{1/z}}, which turns out to have an essential singularity at {z=0}. The behaviour of such essential singularities is quite wild; we will show here the Casorati-Weierstrass theorem, which shows that the image of {f} near the essential singularity is dense in the complex plane, as well as the more difficult great Picard theorem which asserts that in fact the image can omit at most one point in the complex plane. Nevertheless, around any isolated singularity (even the essential ones) {z_0}, it is possible to expand {f} as a variant of a Taylor series known as a Laurent series {\sum_{n=-\infty}^\infty a_n (z-z_0)^n}. The {\frac{1}{z-z_0}} coefficient {a_{-1}} of this series is particularly important for contour integration purposes, and is known as the residue of {f} at the isolated singularity {z_0}. These residues play a central role in a common generalisation of Cauchy’s theorem and the Cauchy integral formula known as the residue theorem, which is a particularly useful tool for computing (or at least transforming) contour integrals of meromorphic functions, and has proven to be a particularly popular technique to use in analytic number theory. Within complex analysis, one important consequence of the residue theorem is the argument principle, which gives a topological (and analytical) way to control the zeroes and poles of a meromorphic function.

Finally, there are the non-isolated singularities. Little can be said about these singularities in general (for instance, the residue theorem does not directly apply in the presence of such singularities), but certain types of non-isolated singularities are still relatively easy to understand. One particularly common example of such non-isolated singularity arises when trying to invert a non-injective function, such as the complex exponential {z \mapsto \exp(z)} or a power function {z \mapsto z^n}, leading to branches of multivalued functions such as the complex logarithm {z \mapsto \log(z)} or the {n^{th}} root function {z \mapsto z^{1/n}} respectively. Such branches will typically have a non-isolated singularity along a branch cut; this branch cut can be moved around the complex domain by switching from one branch to another, but usually cannot be eliminated entirely, unless one is willing to lift up the domain {U} to a more general type of domain known as a Riemann surface. As such, one can view branch cuts as being an “artificial” form of singularity, being an artefact of a choice of local coordinates of a Riemann surface, rather than reflecting any intrinsic singularity of the function itself. The further study of Riemann surfaces is an important topic in complex analysis (as well as the related fields of complex geometry and algebraic geometry), but unfortunately this topic will probably be postponed to the next course in this sequence (which I will not be teaching).

Read the rest of this entry »

We now come to perhaps the most central theorem in complex analysis (save possibly for the fundamental theorem of calculus), namely Cauchy’s theorem, which allows one to compute (or at least transform) a large number of contour integrals {\int_\gamma f(z)\ dz} even without knowing any explicit antiderivative of {f}. There are many forms and variants of Cauchy’s theorem. To give one such version, we need the basic topological notion of a homotopy:

Definition 1 (Homotopy) Let {U} be an open subset of {{\bf C}}, and let {\gamma_0: [a,b] \rightarrow U}, {\gamma_1: [a,b] \rightarrow U} be two curves in {U}.

  • (i) If {\gamma_0, \gamma_1} have the same initial point {z_0} and final point {z_1}, we say that {\gamma_0} and {\gamma_1} are homotopic with fixed endpoints in {U} if there exists a continuous map {\gamma: [0,1] \times [a,b] \rightarrow U} such that {\gamma(0,t) = \gamma_0(t)} and {\gamma(1,t) = \gamma_1(t)} for all {t \in [a,b]}, and such that {\gamma(s,a) = z_0} and {\gamma(s,b) = z_1} for all {s \in [0,1]}.
  • (ii) If {\gamma_0, \gamma_1} are closed (but possibly with different initial points), we say that {\gamma_0} and {\gamma_1} are homotopic as closed curves in {U} if there exists a continuous map {\gamma: [0,1] \times [a,b] \rightarrow U} such that {\gamma(0,t) = \gamma_0(t)} and {\gamma(1,t) = \gamma_1(t)} for all {t \in [a,b]}, and such that {\gamma(s,a) = \gamma(s,b)} for all {s \in [0,1]}.
  • (iii) If {\gamma_2: [c,d] \rightarrow U} and {\gamma_3: [e,f] \rightarrow U} are curves with the same initial point and same final point, we say that {\gamma_2} and {\gamma_3} are homotopic with fixed endpoints up to reparameterisation in {U} if there is a reparameterisation {\tilde \gamma_2: [a,b] \rightarrow U} of {\gamma_2} which is homotopic with fixed endpoints in {U} to a reparameterisation {\tilde \gamma_3: [a,b] \rightarrow U} of {\gamma_3}.
  • (iv) If {\gamma_2: [c,d] \rightarrow U} and {\gamma_3: [e,f] \rightarrow U} are closed curves, we say that {\gamma_2} and {\gamma_3} are homotopic as closed curves up to reparameterisation in {U} if there is a reparameterisation {\tilde \gamma_2: [a,b] \rightarrow U} of {\gamma_2} which is homotopic as closed curves in {U} to a reparameterisation {\tilde \gamma_3: [a,b] \rightarrow U} of {\gamma_3}.

In the first two cases, the map {\gamma} will be referred to as a homotopy from {\gamma_0} to {\gamma_1}, and we will also say that {\gamma_0} can be continously deformed to {\gamma_1} (either with fixed endpoints, or as closed curves).

Example 2 If {U} is a convex set, that is to say that {(1-s) z_0 + s z_1 \in U} whenever {z_0,z_1 \in U} and {0 \leq s \leq 1}, then any two curves {\gamma_0, \gamma_1: [0,1] \rightarrow U} from one point {z_0} to another {z_1} are homotopic, by using the homotopy

\displaystyle \gamma(s,t) := (1-s) \gamma_0(t) + s \gamma_1(t).

For a similar reason, in a convex open set {U}, any two closed curves will be homotopic to each other as closed curves.

Exercise 3 Let {U} be an open subset of {{\bf C}}.

  • (i) Prove that the property of being homotopic with fixed endpoints in {U} is an equivalence relation.
  • (ii) Prove that the property of being homotopic as closed curves in {U} is an equivalence relation.
  • (iii) If {\gamma_0, \gamma_1: [a,b] \rightarrow U} are closed curves with the same initial point, show that {\gamma_0} is homotopic to {\gamma_1} as closed curves if and only if {\gamma_0} is homotopic to {\gamma_2 + \gamma_1 + (-\gamma_2)} with fixed endpoints for some closed curve {\gamma_2} with the same initial point as {\gamma_0} or {\gamma_1}.
  • (iv) Define a point in {U} to be a curve {\gamma_1: [a,b] \rightarrow U} of the form {\gamma_1(t) = z_0} for some {z_0 \in U} and all {t \in [a,b]}. Let {\gamma_0: [a,b] \rightarrow U} be a closed curve in {U}. Show that {\gamma_0} is homotopic with fixed endpoints to a point in {U} if and only if {\gamma_0} is homotopic as a closed curve to a point in {U}. (In either case, we will call {\gamma_0} homotopic to a point, null-homotopic, or contractible to a point in {U}.)
  • (v) If {\gamma_0, \gamma_1: [a,b] \rightarrow U} are curves with the same initial point and the same terminal point, show that {\gamma_0} is homotopic to {\gamma_1} with fixed endpoints in {U} if and only if {\gamma_0 + (-\gamma_1)} is homotopic to a point in {U}.
  • (vi) If {U} is connected, and {\gamma_0, \gamma_1: [a,b] \rightarrow U} are any two curves in {U}, show that there exists a continuous map {\gamma: [0,1] \times [a,b] \rightarrow U} such that {\gamma(0,t) = \gamma_0(t)} and {\gamma(1,t) = \gamma_1(t)} for all {t \in [a,b]}. Thus the notion of homotopy becomes rather trivial if one does not fix the endpoints or require the curve to be closed.
  • (vii) Show that if {\gamma_1: [a,b] \rightarrow U} is a reparameterisation of {\gamma_0: [a,b] \rightarrow U}, then {\gamma_0} and {\gamma_1} are homotopic with fixed endpoints in U.
  • (viii) Prove that the property of being homotopic with fixed endpoints in {U} up to reparameterisation is an equivalence relation.
  • (ix) Prove that the property of being homotopic as closed curves in {U} up to reparameterisation is an equivalence relation.

We can then phrase Cauchy’s theorem as an assertion that contour integration on holomorphic functions is a homotopy invariant. More precisely:

Theorem 4 (Cauchy’s theorem) Let {U} be an open subset of {{\bf C}}, and let {f: U \rightarrow {\bf C}} be holomorphic.

  • (i) If {\gamma_0: [a,b] \rightarrow U} and {\gamma_1: [c,d] \rightarrow U} are rectifiable curves that are homotopic in {U} with fixed endpoints up to reparameterisation, then

    \displaystyle \int_{\gamma_0} f(z)\ dz = \int_{\gamma_1} f(z)\ dz.

  • (ii) If {\gamma_0: [a,b] \rightarrow U} and {\gamma_1: [c,d] \rightarrow U} are closed rectifiable curves that are homotopic in {U} as closed curves up to reparameterisation, then

    \displaystyle \int_{\gamma_0} f(z)\ dz = \int_{\gamma_1} f(z)\ dz.

This version of Cauchy’s theorem is particularly useful for applications, as it explicitly brings into play the powerful technique of contour shifting, which allows one to compute a contour integral by replacing the contour with a homotopic contour on which the integral is easier to either compute or integrate. This formulation of Cauchy’s theorem also highlights the close relationship between contour integrals and the algebraic topology of the complex plane (and open subsets {U} thereof). Setting {\gamma_1} to be a point, we obtain an important special case of Cauchy’s theorem (which is in fact equivalent to the full theorem):

Corollary 5 (Cauchy’s theorem, again) Let {U} be an open subset of {{\bf C}}, and let {f: U \rightarrow {\bf C}} be holomorphic. Then for any closed rectifiable curve {\gamma} in {U} that is contractible in {U} to a point, one has {\int_\gamma f(z)\ dz = 0}.

Exercise 6 Show that Theorem 4 and Corollary 5 are logically equivalent.

An important feature to note about Cauchy’s theorem is the global nature of its hypothesis on {f}. The conclusion of Cauchy’s theorem only involves the values of a function {f} on the images of the two curves {\gamma_0, \gamma_1}. However, in order for the hypotheses of Cauchy’s theorem to apply, the function {f} must be holomorphic not only on the images on {\gamma_0, \gamma_1}, but on an open set {U} that is large enough (and sufficiently free of “holes”) to support a homotopy between the two curves. This point can be emphasised through the following fundamental near-counterexample to Cauchy’s theorem:

Example 7 Let {U := {\bf C} \backslash \{0\}}, and let {f: U \rightarrow {\bf C}} be the holomorphic function {f(z) := \frac{1}{z}}. Let {\gamma_{0,1,\circlearrowleft}: [0,2\pi] \rightarrow {\bf C}} be the closed unit circle contour {\gamma_{0,1,\circlearrowleft}(t) := e^{it}}. Direct calculation shows that

\displaystyle \int_{\gamma_{0,1,\circlearrowleft}} f(z)\ dz = 2\pi i \neq 0.

As a consequence of this and Cauchy’s theorem, we conclude that the contour {\gamma_{0,1,\circlearrowleft}} is not contractible to a point in {U}; note that this does not contradict Example 2 because {U} is not convex. Thus we see that the lack of holomorphicity (or singularity) of {f} at the origin can be “blamed” for the non-vanishing of the integral of {f} on the closed contour {\gamma_{0,1,\circlearrowleft}}, even though this contour does not come anywhere near the origin. Thus we see that the global behaviour of {f}, not just the behaviour in the local neighbourhood of {\gamma_{0,1,\circlearrowleft}}, has an impact on the contour integral.

One can of course rewrite this example to involve non-closed contours instead of closed ones. For instance, if we let {\gamma_0, \gamma_1: [0,\pi] \rightarrow U} denote the half-circle contours {\gamma_0(t) := e^{it}} and {\gamma_1(t) := e^{-it}}, then {\gamma_0,\gamma_1} are both contours in {U} from {+1} to {-1}, but one has

\displaystyle \int_{\gamma_0} f(z)\ dz = +\pi i


\displaystyle \int_{\gamma_1} f(z)\ dz = -\pi i.

In order for this to be consistent with Cauchy’s theorem, we conclude that {\gamma_0} and {\gamma_1} are not homotopic in {U} (even after reparameterisation).

In the specific case of functions of the form {\frac{1}{z}}, or more generally {\frac{f(z)}{z-z_0}} for some point {z_0} and some {f} that is holomorphic in some neighbourhood of {z_0}, we can quantify the precise failure of Cauchy’s theorem through the Cauchy integral formula, and through the concept of a winding number. These turn out to be extremely powerful tools for understanding both the nature of holomorphic functions and the topology of open subsets of the complex plane, as we shall see in this and later notes.

Read the rest of this entry »


RSS Google+ feed

  • An error has occurred; the feed is probably down. Try again later.