You are currently browsing the category archive for the ‘math.CV’ category.

Given a function {f: X \rightarrow Y} between two sets {X, Y}, we can form the graph

\displaystyle  \Sigma := \{ (x,f(x)): x\in X \},

which is a subset of the Cartesian product {X \times Y}.

There are a number of “closed graph theorems” in mathematics which relate the regularity properties of the function {f} with the closure properties of the graph {\Sigma}, assuming some “completeness” properties of the domain {X} and range {Y}. The most famous of these is the closed graph theorem from functional analysis, which I phrase as follows:

Theorem 1 (Closed graph theorem (functional analysis)) Let {X, Y} be complete normed vector spaces over the reals (i.e. Banach spaces). Then a function {f: X \rightarrow Y} is a continuous linear transformation if and only if the graph {\Sigma := \{ (x,f(x)): x \in X \}} is both linearly closed (i.e. it is a linear subspace of {X \times Y}) and topologically closed (i.e. closed in the product topology of {X \times Y}).

I like to think of this theorem as linking together qualitative and quantitative notions of regularity preservation properties of an operator {f}; see this blog post for further discussion.

The theorem is equivalent to the assertion that any continuous linear bijection {f: X \rightarrow Y} from one Banach space to another is necessarily an isomorphism in the sense that the inverse map is also continuous and linear. Indeed, to see that this claim implies the closed graph theorem, one applies it to the projection from {\Sigma} to {X}, which is a continuous linear bijection; conversely, to deduce this claim from the closed graph theorem, observe that the graph of the inverse {f^{-1}} is the reflection of the graph of {f}. As such, the closed graph theorem is a corollary of the open mapping theorem, which asserts that any continuous linear surjection from one Banach space to another is open. (Conversely, one can deduce the open mapping theorem from the closed graph theorem by quotienting out the kernel of the continuous surjection to get a bijection.)

It turns out that there is a closed graph theorem (or equivalent reformulations of that theorem, such as an assertion that bijective morphisms between sufficiently “complete” objects are necessarily isomorphisms, or as an open mapping theorem) in many other categories in mathematics as well. Here are some easy ones:

Theorem 2 (Closed graph theorem (linear algebra)) Let {X, Y} be vector spaces over a field {k}. Then a function {f: X \rightarrow Y} is a linear transformation if and only if the graph {\Sigma := \{ (x,f(x)): x \in X \}} is linearly closed.

Theorem 3 (Closed graph theorem (group theory)) Let {X, Y} be groups. Then a function {f: X \rightarrow Y} is a group homomorphism if and only if the graph {\Sigma := \{ (x,f(x)): x \in X \}} is closed under the group operations (i.e. it is a subgroup of {X \times Y}).

Theorem 4 (Closed graph theorem (order theory)) Let {X, Y} be totally ordered sets. Then a function {f: X \rightarrow Y} is monotone increasing if and only if the graph {\Sigma := \{ (x,f(x)): x \in X \}} is totally ordered (using the product order on {X \times Y}).

Remark 1 Similar results to the above three theorems (with similarly easy proofs) hold for other algebraic structures, such as rings (using the usual product of rings), modules, algebras, or Lie algebras, groupoids, or even categories (a map between categories is a functor iff its graph is again a category). (ADDED IN VIEW OF COMMENTS: further examples include affine spaces and {G}-sets (sets with an action of a given group {G}).) There are also various approximate versions of this theorem that are useful in arithmetic combinatorics, that relate the property of a map {f} being an “approximate homomorphism” in some sense with its graph being an “approximate group” in some sense. This is particularly useful for this subfield of mathematics because there are currently more theorems about approximate groups than about approximate homomorphisms, so that one can profitably use closed graph theorems to transfer results about the former to results about the latter.

A slightly more sophisticated result in the same vein:

Theorem 5 (Closed graph theorem (point set topology)) Let {X, Y} be compact Hausdorff spaces. Then a function {f: X \rightarrow Y} is continuous if and only if the graph {\Sigma := \{ (x,f(x)): x \in X \}} is topologically closed.

Indeed, the “only if” direction is easy, while for the “if” direction, note that if {\Sigma} is a closed subset of {X \times Y}, then it is compact Hausdorff, and the projection map from {\Sigma} to {X} is then a bijective continuous map between compact Hausdorff spaces, which is then closed, thus open, and hence a homeomorphism, giving the claim.

Note that the compactness hypothesis is necessary: for instance, the function {f: {\bf R} \rightarrow {\bf R}} defined by {f(x) := 1/x} for {x \neq 0} and {f(0)=0} for {x=0} is a function which has a closed graph, but is discontinuous.

A similar result (but relying on a much deeper theorem) is available in algebraic geometry, as I learned after asking this MathOverflow question:

Theorem 6 (Closed graph theorem (algebraic geometry)) Let {X, Y} be normal projective varieties over an algebraically closed field {k} of characteristic zero. Then a function {f: X \rightarrow Y} is a regular map if and only if the graph {\Sigma := \{ (x,f(x)): x \in X \}} is Zariski-closed.

Proof: (Sketch) For the only if direction, note that the map {x \mapsto (x,f(x))} is a regular map from the projective variety {X} to the projective variety {X \times Y} and is thus a projective morphism, hence is proper. In particular, the image {\Sigma} of {X} under this map is Zariski-closed.

Conversely, if {\Sigma} is Zariski-closed, then it is also a projective variety, and the projection {(x,y) \mapsto x} is a projective morphism from {\Sigma} to {X}, which is clearly quasi-finite; by the characteristic zero hypothesis, it is also separated. Applying (Grothendieck’s form of) Zariski’s main theorem, this projection is the composition of an open immersion and a finite map. As projective varieties are complete, the open immersion is an isomorphism, and so the projection from {\Sigma} to {X} is finite. Being injective and separable, the degree of this finite map must be one, and hence {k(\Sigma)} and {k(X)} are isomorphic, hence (by normality of {X}) {k[\Sigma]} is contained in (the image of) {k[X]}, which makes the map from {X} to {\Sigma} regular, which makes {f} regular. \Box

The counterexample of the map {f: k \rightarrow k} given by {f(x) := 1/x} for {x \neq 0} and {f(0) := 0} demonstrates why the projective hypothesis is necessary. The necessity of the normality condition (or more precisely, a weak normality condition) is demonstrated by (the projective version of) the map {(t^2,t^3) \mapsto t} from the cusipdal curve {\{ (t^2,t^3): t \in k \}} to {k}. (If one restricts attention to smooth varieties, though, normality becomes automatic.) The necessity of characteristic zero is demonstrated by (the projective version of) the inverse of the Frobenius map {x \mapsto x^p} on a field {k} of characteristic {p}.

There are also a number of closed graph theorems for topological groups, of which the following is typical (see Exercise 3 of these previous blog notes):

Theorem 7 (Closed graph theorem (topological group theory)) Let {X, Y} be {\sigma}-compact, locally compact Hausdorff groups. Then a function {X \rightarrow Y} is a continuous homomorphism if and only if the graph {\Sigma := \{ (x,f(x)): x \in X \}} is both group-theoretically closed and topologically closed.

The hypotheses of being {\sigma}-compact, locally compact, and Hausdorff can be relaxed somewhat, but I doubt that they can be eliminated entirely (though I do not have a ready counterexample for this).

In several complex variables, it is a classical theorem (see e.g. Lemma 4 of this blog post) that a holomorphic function from a domain in {{\bf C}^n} to {{\bf C}^n} is locally injective if and only if it is a local diffeomorphism (i.e. its derivative is everywhere non-singular). This leads to a closed graph theorem for complex manifolds:

Theorem 8 (Closed graph theorem (complex manifolds)) Let {X, Y} be complex manifolds. Then a function {f: X \rightarrow Y} is holomorphic if and only if the graph {\Sigma := \{ (x,f(x)): x \in X \}} is a complex manifold (using the complex structure inherited from {X \times Y}) of the same dimension as {X}.

Indeed, one applies the previous observation to the projection from {\Sigma} to {X}. The dimension requirement is needed, as can be seen from the example of the map {f: {\bf C} \rightarrow {\bf C}} defined by {f(z) =1/z} for {z \neq 0} and {f(0)=0}.

(ADDED LATER:) There is a real analogue to the above theorem:

Theorem 9 (Closed graph theorem (real manifolds)) Let {X, Y} be real manifolds. Then a function {f: X \rightarrow Y} is continuous if and only if the graph {\Sigma := \{ (x,f(x)): x \in X \}} is a real manifold of the same dimension as {X}.

This theorem can be proven by applying invariance of domain (discussed in this previous post) to the projection of {\Sigma} to {X}, to show that it is open if {\Sigma} has the same dimension as {X}.

Note though that the analogous claim for smooth real manifolds fails: the function {f: {\bf R} \rightarrow {\bf R}} defined by {f(x) := x^{1/3}} has a smooth graph, but is not itself smooth.

(ADDED YET LATER:) Here is an easy closed graph theorem in the symplectic category:

Theorem 10 (Closed graph theorem (symplectic geometry)) Let {X = (X,\omega_X)} and {Y = (Y,\omega_Y)} be smooth symplectic manifolds of the same dimension. Then a smooth map {f: X \rightarrow Y} is a symplectic morphism (i.e. {f^* \omega_Y = \omega_X}) if and only if the graph {\Sigma := \{(x,f(x)): x \in X \}} is a Lagrangian submanifold of {X \times Y} with the symplectic form {\omega_X \oplus -\omega_Y}.

In view of the symplectic rigidity phenomenon, it is likely that the smoothness hypotheses on {f,X,Y} can be relaxed substantially, but I will not try to formulate such a result here.

There are presumably many further examples of closed graph theorems (or closely related theorems, such as criteria for inverting a morphism, or open mapping type theorems) throughout mathematics; I would be interested to know of further examples.


One of the most well known problems from ancient Greek mathematics was that of trisecting an angle by straightedge and compass, which was eventually proven impossible in 1837 by Pierre Wantzel, using methods from Galois theory.

Formally, one can set up the problem as follows. Define a configuration to be a finite collection {{\mathcal C}} of points, lines, and circles in the Euclidean plane. Define a construction step to be one of the following operations to enlarge the collection {{\mathcal C}}:

  • (Straightedge) Given two distinct points {A, B} in {{\mathcal C}}, form the line {\overline{AB}} that connects {A} and {B}, and add it to {{\mathcal C}}.
  • (Compass) Given two distinct points {A, B} in {{\mathcal C}}, and given a third point {O} in {{\mathcal C}} (which may or may not equal {A} or {B}), form the circle with centre {O} and radius equal to the length {|AB|} of the line segment joining {A} and {B}, and add it to {{\mathcal C}}.
  • (Intersection) Given two distinct curves {\gamma, \gamma'} in {{\mathcal C}} (thus {\gamma} is either a line or a circle in {{\mathcal C}}, and similarly for {\gamma'}), select a point {P} that is common to both {\gamma} and {\gamma'} (there are at most two such points), and add it to {{\mathcal C}}.

We say that a point, line, or circle is constructible by straightedge and compass from a configuration {{\mathcal C}} if it can be obtained from {{\mathcal C}} after applying a finite number of construction steps.

Problem 1 (Angle trisection) Let {A, B, C} be distinct points in the plane. Is it always possible to construct by straightedge and compass from {A,B,C} a line {\ell} through {A} that trisects the angle {\angle BAC}, in the sense that the angle between {\ell} and {BA} is one third of the angle of {\angle BAC}?

Thanks to Wantzel’s result, the answer to this problem is known to be “no” in general; a generic angle {\angle BAC} cannot be trisected by straightedge and compass. (On the other hand, some special angles can certainly be trisected by straightedge and compass, such as a right angle. Also, one can certainly trisect generic angles using other methods than straightedge and compass; see the Wikipedia page on angle trisection for some examples of this.)

The impossibility of angle trisection stands in sharp contrast to the easy construction of angle bisection via straightedge and compass, which we briefly review as follows:

  1. Start with three points {A, B, C}.
  2. Form the circle {c_0} with centre {A} and radius {AB}, and intersect it with the line {\overline{AC}}. Let {D} be the point in this intersection that lies on the same side of {A} as {C}. ({D} may well be equal to {C}).
  3. Form the circle {c_1} with centre {B} and radius {AB}, and the circle {c_2} with centre {D} and radius {AB}. Let {E} be the point of intersection of {c_1} and {c_2} that is not {A}.
  4. The line {\ell := \overline{AE}} will then bisect the angle {\angle BAC}.

The key difference between angle trisection and angle bisection ultimately boils down to the following trivial number-theoretic fact:

Lemma 2 There is no power of {2} that is evenly divisible by {3}.

Proof: Obvious by modular arithmetic, by induction, or by the fundamental theorem of arithmetic. \Box

In contrast, there are of course plenty of powers of {2} that are evenly divisible by {2}, and this is ultimately why angle bisection is easy while angle trisection is hard.

The standard way in which Lemma 2 is used to demonstrate the impossibility of angle trisection is via Galois theory. The implication is quite short if one knows this theory, but quite opaque otherwise. We briefly sketch the proof of this implication here, though we will not need it in the rest of the discussion. Firstly, Lemma 2 implies the following fact about field extensions.

Corollary 3 Let {F} be a field, and let {E} be an extension of {F} that can be constructed out of {F} by a finite sequence of quadratic extensions. Then {E} does not contain any cubic extensions {K} of {F}.

Proof: If E contained a cubic extension K of F, then the dimension of E over F would be a multiple of three. On the other hand, if E is obtained from F by a tower of quadratic extensions, then the dimension of E over F is a power of two. The claim then follows from Lemma 2. \Box

To conclude the proof, one then notes that any point, line, or circle that can be constructed from a configuration {{\mathcal C}} is definable in a field obtained from the coefficients of all the objects in {{\mathcal C}} after taking a finite number of quadratic extensions, whereas a trisection of an angle {\angle ABC} will generically only be definable in a cubic extension of the field generated by the coordinates of {A,B,C}.

The Galois theory method also allows one to obtain many other impossibility results of this type, most famously the Abel-Ruffini theorem on the insolvability of the quintic equation by radicals. For this reason (and also because of the many applications of Galois theory to number theory and other branches of mathematics), the Galois theory argument is the “right” way to prove the impossibility of angle trisection within the broader framework of modern mathematics. However, this argument has the drawback that it requires one to first understand Galois theory (or at least field theory), which is usually not presented until an advanced undergraduate algebra or number theory course, whilst the angle trisection problem requires only high-school level mathematics to formulate. Even if one is allowed to “cheat” and sweep several technicalities under the rug, one still needs to possess a fair amount of solid intuition about advanced algebra in order to appreciate the proof. (This was undoubtedly be one reason why, even after Wantzel’s impossibility result was published, a large amount of effort was still expended by amateur mathematicians to try to trisect a general angle.)

In this post I would therefore like to present a different proof (or perhaps more accurately, a disguised version of the standard proof) of the impossibility of angle trisection by straightedge and compass, that avoids explicit mention of Galois theory (though it is never far beneath the surface). With “cheats”, the proof is actually quite simple and geometric (except for Lemma 2, which is still used at a crucial juncture), based on the basic geometric concept of monodromy; unfortunately, some technical work is needed however to remove these cheats.

To describe the intuitive idea of the proof, let us return to the angle bisection construction, that takes a triple {A, B, C} of points as input and returns a bisecting line {\ell} as output. We iterate the construction to create a quadrisecting line {m}, via the following sequence of steps that extend the original bisection construction:

  1. Start with three points {A, B, C}.
  2. Form the circle {c_0} with centre {A} and radius {AB}, and intersect it with the line {\overline{AC}}. Let {D} be the point in this intersection that lies on the same side of {A} as {C}. ({D} may well be equal to {C}).
  3. Form the circle {c_1} with centre {B} and radius {AB}, and the circle {c_2} with centre {D} and radius {AB}. Let {E} be the point of intersection of {c_1} and {c_2} that is not {A}.
  4. Let {F} be the point on the line {\ell := \overline{AE}} which lies on {c_0}, and is on the same side of {A} as {E}.
  5. Form the circle {c_3} with centre {F} and radius {AB}. Let {G} be the point of intersection of {c_1} and {c_3} that is not {A}.
  6. The line {m := \overline{AG}} will then quadrisect the angle {\angle BAC}.

Let us fix the points {A} and {B}, but not {C}, and view {m} (as well as intermediate objects such as {D}, {c_2}, {E}, {\ell}, {F}, {c_3}, {G}) as a function of {C}.

Let us now do the following: we begin rotating {C} counterclockwise around {A}, which drags around the other objects {D}, {c_2}, {E}, {\ell}, {F}, {c_3}, {G} that were constructed by {C} accordingly. For instance, here is an early stage of this rotation process, when the angle {\angle BAC} has become obtuse:

Now for the slightly tricky bit. We are going to keep rotating {C} beyond a half-rotation of {180^\circ}, so that {\angle BAC} now becomes a reflex angle. At this point, a singularity occurs; the point {E} collides into {A}, and so there is an instant in which the line {\ell = \overline{AE}} is not well-defined. However, this turns out to be a removable singularity (and the easiest way to demonstrate this will be to tap the power of complex analysis, as complex numbers can easily route around such a singularity), and we can blast through it to the other side, giving a picture like this:

Note that we have now deviated from the original construction in that {F} and {E} are no longer on the same side of {A}; we are thus now working in a continuation of that construction rather than with the construction itself. Nevertheless, we can still work with this continuation (much as, say, one works with analytic continuations of infinite series such as {\sum_{n=1}^\infty \frac{1}{n^s}} beyond their original domain of definition).

We now keep rotating {C} around {A}. Here, {\angle BAC} is approaching a full rotation of {360^\circ}:

When {\angle BAC} reaches a full rotation, a different singularity occurs: {c_1} and {c_2} coincide. Nevertheless, this is also a removable singularity, and we blast through to beyond a full rotation:

And now {C} is back where it started, as are {D}, {c_2}, {E}, and {\ell}… but the point {F} has moved, from one intersection point of {\ell \cap c_3} to the other. As a consequence, {c_3}, {G}, and {m} have also changed, with {m} being at right angles to where it was before. (In the jargon of modern mathematics, the quadrisection construction has a non-trivial monodromy.)

But nothing stops us from rotating {C} some more. If we continue this procedure, we see that after two full rotations of {C} around {A}, all points, lines, and circles constructed from {A, B, C} have returned to their original positions. Because of this, we shall say that the quadrisection construction described above is periodic with period {2}.

Similarly, if one performs an octisection of the angle {\angle BAC} by bisecting the quadrisection, one can verify that this octisection is periodic with period {4}; it takes four full rotations of {C} around {A} before the configuration returns to where it started. More generally, one can show

Proposition 4 Any construction of straightedge and compass from the points {A,B,C} is periodic with period equal to a power of {2}.

The reason for this, ultimately, is because any two circles or lines will intersect each other in at most two points, and so at each step of a straightedge-and-compass construction there is an ambiguity of at most {2! = 2}. Each rotation of {C} around {A} can potentially flip one of these points to the other, but then if one rotates again, the point returns to its original position, and then one can analyse the next point in the construction in the same fashion until one obtains the proposition.

But now consider a putative trisection operation, that starts with an arbitrary angle {\angle BAC} and somehow uses some sequence of straightedge and compass constructions to end up with a trisecting line {\ell}:

What is the period of this construction? If we continuously rotate {C} around {A}, we observe that a full rotations of {C} only causes the trisecting line {\ell} to rotate by a third of a full rotation (i.e. by {120^\circ}):

Because of this, we see that the period of any construction that contains {\ell} must be a multiple of {3}. But this contradicts Proposition 4 and Lemma 2.

Below the fold, I will make the above proof rigorous. Unfortunately, in doing so, I had to again leave the world of high-school mathematics, as one needs a little bit of algebraic geometry and complex analysis to resolve the issues with singularities that we saw in the above sketch. Still, I feel that at an intuitive level at least, this argument is more geometric and accessible than the Galois-theoretic argument (though anyone familiar with Galois theory will note that there is really not that much difference between the proofs, ultimately, as one has simply replaced the Galois group with a closely related monodromy group instead).

Read the rest of this entry »

The Riemann zeta function {\zeta(s)} is defined in the region {\hbox{Re}(s)>1} by the absolutely convergent series

\displaystyle  \zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s} = 1 + \frac{1}{2^s} + \frac{1}{3^s} + \ldots. \ \ \ \ \ (1)

Thus, for instance, it is known that {\zeta(2)=\pi^2/6}, and thus

\displaystyle  \sum_{n=1}^\infty \frac{1}{n^2} = 1 + \frac{1}{4} + \frac{1}{9} + \ldots = \frac{\pi^2}{6}. \ \ \ \ \ (2)

For {\hbox{Re}(s) \leq 1}, the series on the right-hand side of (1) is no longer absolutely convergent, or even conditionally convergent. Nevertheless, the {\zeta} function can be extended to this region (with a pole at {s=1}) by analytic continuation. For instance, it can be shown that after analytic continuation, one has {\zeta(0) = -1/2}, {\zeta(-1) = -1/12}, and {\zeta(-2)=0}, and more generally

\displaystyle  \zeta(-s) = - \frac{B_{s+1}}{s+1} \ \ \ \ \ (3)

for {s=1,2,\ldots}, where {B_n} are the Bernoulli numbers. If one formally applies (1) at these values of {s}, one obtains the somewhat bizarre formulae

\displaystyle  \sum_{n=1}^\infty 1 = 1 + 1 + 1 + \ldots = -1/2 \ \ \ \ \ (4)

\displaystyle  \sum_{n=1}^\infty n = 1 + 2 + 3 + \ldots = -1/12 \ \ \ \ \ (5)

\displaystyle  \sum_{n=1}^\infty n^2 = 1 + 4 + 9 + \ldots = 0 \ \ \ \ \ (6)


\displaystyle  \sum_{n=1}^\infty n^s = 1 + 2^s + 3^s + \ldots = -\frac{B_{s+1}}{s+1}. \ \ \ \ \ (7)

Clearly, these formulae do not make sense if one stays within the traditional way to evaluate infinite series, and so it seems that one is forced to use the somewhat unintuitive analytic continuation interpretation of such sums to make these formulae rigorous. But as it stands, the formulae look “wrong” for several reasons. Most obviously, the summands on the left are all positive, but the right-hand sides can be zero or negative. A little more subtly, the identities do not appear to be consistent with each other. For instance, if one adds (4) to (5), one obtains

\displaystyle  \sum_{n=1}^\infty (n+1) = 2 + 3 + 4 + \ldots = -7/12 \ \ \ \ \ (8)

whereas if one subtracts {1} from (5) one obtains instead

\displaystyle  \sum_{n=2}^\infty n = 0 + 2 + 3 + 4 + \ldots = -13/12 \ \ \ \ \ (9)

and the two equations seem inconsistent with each other.

However, it is possible to interpret (4), (5), (6) by purely real-variable methods, without recourse to complex analysis methods such as analytic continuation, thus giving an “elementary” interpretation of these sums that only requires undergraduate calculus; we will later also explain how this interpretation deals with the apparent inconsistencies pointed out above.

To see this, let us first consider a convergent sum such as (2). The classical interpretation of this formula is the assertion that the partial sums

\displaystyle \sum_{n=1}^N \frac{1}{n^2} = 1 + \frac{1}{4} + \frac{1}{9} + \ldots + \frac{1}{N^2}

converge to {\pi^2/6} as {N \rightarrow \infty}, or in other words that

\displaystyle  \sum_{n=1}^N \frac{1}{n^2} = \frac{\pi^2}{6} + o(1)

where {o(1)} denotes a quantity that goes to zero as {N \rightarrow \infty}. Actually, by using the integral test estimate

\displaystyle  \sum_{n=N+1}^\infty \frac{1}{n^2} \leq \int_N^\infty \frac{dx}{x^2} = \frac{1}{N}

we have the sharper result

\displaystyle  \sum_{n=1}^N \frac{1}{n^2} = \frac{\pi^2}{6} + O(\frac{1}{N}).

Thus we can view {\frac{\pi^2}{6}} as the leading coefficient of the asymptotic expansion of the partial sums of {\sum_{n=1}^\infty 1/n^2}.

One can then try to inspect the partial sums of the expressions in (4), (5), (6), but the coefficients bear no obvious relationship to the right-hand sides:

\displaystyle  \sum_{n=1}^N 1 = N

\displaystyle  \sum_{n=1}^N n = \frac{1}{2} N^2 + \frac{1}{2} N

\displaystyle  \sum_{n=1}^N n^2 = \frac{1}{3} N^3 + \frac{1}{2} N^2 + \frac{1}{6} N.

For (7), the classical Faulhaber formula (or Bernoulli formula) gives

\displaystyle  \sum_{n=1}^N n^s = \frac{1}{s+1} \sum_{j=0}^s \binom{s+1}{j} B_j N^{s+1-j} \ \ \ \ \ (10)

\displaystyle  = \frac{1}{s+1} N^{s+1} + \frac{1}{2} N^s + \frac{s}{12} N^{s-1} + \ldots + B_s N

for {s \geq 2}, which has a vague resemblance to (7), but again the connection is not particularly clear.

The problem here is the discrete nature of the partial sum

\displaystyle  \sum_{n=1}^N n^s = \sum_{n \leq N} n^s,

which (if {N} is viewed as a real number) has jump discontinuities at each positive integer value of {N}. These discontinuities yield various artefacts when trying to approximate this sum by a polynomial in {N}. (These artefacts also occur in (2), but happen in that case to be obscured in the error term {O(1/N)}; but for the divergent sums (4), (5), (6), (7), they are large enough to cause real trouble.)

However, these issues can be resolved by replacing the abruptly truncated partial sums {\sum_{n=1}^N n^s} with smoothed sums {\sum_{n=1}^\infty \eta(n/N) n^s}, where {\eta: {\bf R}^+ \rightarrow {\bf R}} is a cutoff function, or more precisely a compactly supported bounded function that equals {1} at {0}. The case when {\eta} is the indicator function {1_{[0,1]}} then corresponds to the traditional partial sums, with all the attendant discretisation artefacts; but if one chooses a smoother cutoff, then these artefacts begin to disappear (or at least become lower order), and the true asymptotic expansion becomes more manifest.

Note that smoothing does not affect the asymptotic value of sums that were already absolutely convergent, thanks to the dominated convergence theorem. For instance, we have

\displaystyle  \sum_{n=1}^\infty \eta(n/N) \frac{1}{n^2} = \frac{\pi^2}{6} + o(1)

whenever {\eta} is a cutoff function (since {\eta(n/N) \rightarrow 1} pointwise as {N \rightarrow \infty} and is uniformly bounded). If {\eta} is equal to {1} on a neighbourhood of the origin, then the integral test argument then recovers the {O(1/N)} decay rate:

\displaystyle  \sum_{n=1}^\infty \eta(n/N) \frac{1}{n^2} = \frac{\pi^2}{6} + O(\frac{1}{N}).

However, smoothing can greatly improve the convergence properties of a divergent sum. The simplest example is Grandi’s series

\displaystyle  \sum_{n=1}^\infty (-1)^{n-1} = 1 - 1 + 1 - \ldots.

The partial sums

\displaystyle  \sum_{n=1}^N (-1)^{n-1} = \frac{1}{2} + \frac{1}{2} (-1)^{N-1}

oscillate between {1} and {0}, and so this series is not conditionally convergent (and certainly not absolutely convergent). However, if one performs analytic continuation on the series

\displaystyle  \sum_{n=1}^\infty \frac{(-1)^{n-1}}{n^s} = 1 - \frac{1}{2^s} + \frac{1}{3^s} - \ldots

and sets {s = 0}, one obtains a formal value of {1/2} for this series. This value can also be obtained by smooth summation. Indeed, for any cutoff function {\eta}, we can regroup

\displaystyle  \sum_{n=1}^\infty (-1)^{n-1} \eta(n/N) =

\displaystyle \frac{\eta(1/N)}{2} + \sum_{m=1}^\infty \frac{\eta((2m-1)/N) - 2\eta(2m/N) + \eta((2m+1)/N)}{2}.

If {\eta} is twice continuously differentiable (i.e. {\eta \in C^2}), then from Taylor expansion we see that the summand has size {O(1/N^2)}, and also (from the compact support of {\eta}) is only non-zero when {m=O(N)}. This leads to the asymptotic

\displaystyle  \sum_{n=1}^\infty (-1)^{n-1} \eta(n/N) = \frac{1}{2} + O( \frac{1}{N} )

and so we recover the value of {1/2} as the leading term of the asymptotic expansion.

Exercise 1 Show that if {\eta} is merely once continuously differentiable (i.e. {\eta \in C^1}), then we have a similar asymptotic, but with an error term of {o(1)} instead of {O(1/N)}. This is an instance of a more general principle that smoother cutoffs lead to better error terms, though the improvement sometimes stops after some degree of regularity.

Remark 1 The most famous instance of smoothed summation is Cesáro summation, which corresponds to the cutoff function {\eta(x) := (1-x)_+}. Unsurprisingly, when Cesáro summation is applied to Grandi’s series, one again recovers the value of {1/2}.

If we now revisit the divergent series (4), (5), (6), (7) with smooth summation in mind, we finally begin to see the origin of the right-hand sides. Indeed, for any fixed smooth cutoff function {\eta}, we will shortly show that

\displaystyle  \sum_{n=1}^\infty \eta(n/N) = -\frac{1}{2} + C_{\eta,0} N + O(\frac{1}{N}) \ \ \ \ \ (11)

\displaystyle  \sum_{n=1}^\infty n \eta(n/N) = -\frac{1}{12} + C_{\eta,1} N^2 + O(\frac{1}{N}) \ \ \ \ \ (12)

\displaystyle  \sum_{n=1}^\infty n^2 \eta(n/N) = C_{\eta,2} N^3 + O(\frac{1}{N}) \ \ \ \ \ (13)

and more generally

\displaystyle  \sum_{n=1}^\infty n^s \eta(n/N) = -\frac{B_{s+1}}{s+1} + C_{\eta,s} N^{s+1} + O(\frac{1}{N}) \ \ \ \ \ (14)

for any fixed {s=1,2,3,\ldots} where {C_{\eta,s}} is the Archimedean factor

\displaystyle  C_{\eta,s} := \int_0^\infty x^s \eta(x)\ dx \ \ \ \ \ (15)

(which is also essentially the Mellin transform of {\eta}). Thus we see that the values (4), (5), (6), (7) obtained by analytic continuation are nothing more than the constant terms of the asymptotic expansion of the smoothed partial sums. This is not a coincidence; we will explain the equivalence of these two interpretations of such sums (in the model case when the analytic continuation has only finitely many poles and does not grow too fast at infinity) below the fold.

This interpretation clears up the apparent inconsistencies alluded to earlier. For instance, the sum {\sum_{n=1}^\infty n = 1 + 2 + 3 + \ldots} consists only of non-negative terms, as does its smoothed partial sums {\sum_{n=1}^\infty n \eta(n/N)} (if {\eta} is non-negative). Comparing this with (13), we see that this forces the highest-order term {C_{\eta,1} N^2} to be non-negative (as indeed it is), but does not prohibit the lower-order constant term {-\frac{1}{12}} from being negative (which of course it is).

Similarly, if we add together (12) and (11) we obtain

\displaystyle  \sum_{n=1}^\infty (n+1) \eta(n/N) = -\frac{7}{12} + C_{\eta,1} N^2 + C_{\eta,0} N + O(\frac{1}{N}) \ \ \ \ \ (16)

while if we subtract {1} from (12) we obtain

\displaystyle  \sum_{n=2}^\infty n \eta(n/N) = -\frac{13}{12} + C_{\eta,1} N^2 + O(\frac{1}{N}). \ \ \ \ \ (17)

These two asymptotics are not inconsistent with each other; indeed, if we shift the index of summation in (17), we can write

\displaystyle  \sum_{n=2}^\infty n \eta(n/N) = \sum_{n=1}^\infty (n+1) \eta((n+1)/N) \ \ \ \ \ (18)

and so we now see that the discrepancy between the two sums in (8), (9) come from the shifting of the cutoff {\eta(n/N)}, which is invisible in the formal expressions in (8), (9) but become manifestly present in the smoothed sum formulation.

Exercise 2 By Taylor expanding {\eta(n+1/N)} and using (11), (18) show that (16) and (17) are indeed consistent with each other, and in particular one can deduce the latter from the former.

Read the rest of this entry »

Jean-Pierre Serre (whose papers are, of course, always worth reading) recently posted a lovely lecture on the arXiv entitled “How to use finite fields for problems concerning infinite fields”. In it, he describes several ways in which algebraic statements over fields of zero characteristic, such as {{\mathbb C}}, can be deduced from their positive characteristic counterparts such as {F_{p^m}}, despite the fact that there is no non-trivial field homomorphism between the two types of fields. In particular finitary tools, including such basic concepts as cardinality, can now be deployed to establish infinitary results. This leads to some simple and elegant proofs of non-trivial algebraic results which are not easy to establish by other means.

One deduction of this type is based on the idea that positive characteristic fields can partially model zero characteristic fields, and proceeds like this: if a certain algebraic statement failed over (say) {{\mathbb C}}, then there should be a “finitary algebraic” obstruction that “witnesses” this failure over {{\mathbb C}}. Because this obstruction is both finitary and algebraic, it must also be definable in some (large) finite characteristic, thus leading to a comparable failure over a finite characteristic field. Taking contrapositives, one obtains the claim.

Algebra is definitely not my own field of expertise, but it is interesting to note that similar themes have also come up in my own area of additive combinatorics (and more generally arithmetic combinatorics), because the combinatorics of addition and multiplication on finite sets is definitely of a “finitary algebraic” nature. For instance, a recent paper of Vu, Wood, and Wood establishes a finitary “Freiman-type” homomorphism from (finite subsets of) the complex numbers to large finite fields that allows them to pull back many results in arithmetic combinatorics in finite fields (e.g. the sum-product theorem) to the complex plane. (Van Vu and I also used a similar trick to control the singularity property of random sign matrices by first mapping them into finite fields in which cardinality arguments became available.) And I have a particular fondness for correspondences between finitary and infinitary mathematics; the correspondence Serre discusses is slightly different from the one I discuss for instance in here or here, although there seems to be a common theme of “compactness” (or of model theory) tying these correspondences together.

As one of his examples, Serre cites one of my own favourite results in algebra, discovered independently by Ax and by Grothendieck (and then rediscovered many times since). Here is a special case of that theorem:

Theorem 1 (Ax-Grothendieck theorem, special case) Let {P: {\mathbb C}^n \rightarrow {\mathbb C}^n} be a polynomial map from a complex vector space to itself. If {P} is injective, then {P} is bijective.

The full version of the theorem allows one to replace {{\mathbb C}^n} by an algebraic variety {X} over any algebraically closed field, and for {P} to be an morphism from the algebraic variety {X} to itself, but for simplicity I will just discuss the above special case. This theorem is not at all obvious; it is not too difficult (see Lemma 4 below) to show that the Jacobian of {P} is non-degenerate, but this does not come close to solving the problem since one would then be faced with the notorious Jacobian conjecture. Also, the claim fails if “polynomial” is replaced by “holomorphic”, due to the existence of Fatou-Bieberbach domains.

In this post I would like to give the proof of Theorem 1 based on finite fields as mentioned by Serre, as well as another elegant proof of Rudin that combines algebra with some elementary complex variable methods. (There are several other proofs of this theorem and its generalisations, for instance a topological proof by Borel, which I will not discuss here.)

Update, March 8: Some corrections to the finite field proof. Thanks to Matthias Aschenbrenner also for clarifying the relationship with Tarski’s theorem and some further references.

Read the rest of this entry »

[This post was typeset using a LaTeX to WordPress-HTML converter kindly provided to me by Luca Trevisan.]

Many properties of a (sufficiently nice) function {f: {\mathbb R} \rightarrow {\mathbb C}} are reflected in its Fourier transform {\hat f: {\mathbb R} \rightarrow {\mathbb C}}, defined by the formula

\displaystyle  \hat f(\xi) := \int_{-\infty}^\infty f(x) e^{-2\pi i x \xi}\ dx. \ \ \ \ \ (1)

For instance, decay properties of {f} are reflected in smoothness properties of {\hat f}, as the following table shows:

If {f} is… then {\hat f} is… and this relates to…
Square-integrable square-integrable Plancherel’s theorem
Absolutely integrable continuous Riemann-Lebesgue lemma
Rapidly decreasing smooth theory of Schwartz functions
Exponentially decreasing analytic in a strip
Compactly supported entire and at most exponential growth Paley-Wiener theorem

Another important relationship between a function {f} and its Fourier transform {\hat f} is the uncertainty principle, which roughly asserts that if a function {f} is highly localised in space, then its Fourier transform {\hat f} must be widely dispersed in space, or to put it another way, {f} and {\hat f} cannot both decay too strongly at infinity (except of course in the degenerate case {f=0}). There are many ways to make this intuition precise. One of them is the Heisenberg uncertainty principle, which asserts that if we normalise

\displaystyle  \int_{{\mathbb R}} |f(x)|^2\ dx = \int_{\mathbb R} |\hat f(\xi)|^2\ d\xi = 1

then we must have

\displaystyle  (\int_{\mathbb R} |x|^2 |f(x)|^2\ dx) \cdot (\int_{\mathbb R} |\xi|^2 |\hat f(\xi)|^2\ dx)\geq \frac{1}{(4\pi)^2}

thus forcing at least one of {f} or {\hat f} to not be too concentrated near the origin. This principle can be proven (for sufficiently nice {f}, initially) by observing the integration by parts identity

\displaystyle  \langle xf, f' \rangle = \int_{\mathbb R} x f(x) \overline{f'(x)}\ dx = - \frac{1}{2} \int_{\mathbb R} |f(x)|^2\ dx

and then using Cauchy-Schwarz and the Plancherel identity.

Another well known manifestation of the uncertainty principle is the fact that it is not possible for {f} and {\hat f} to both be compactly supported (unless of course they vanish entirely). This can be in fact be seen from the above table: if {f} is compactly supported, then {\hat f} is an entire function; but the zeroes of a non-zero entire function are isolated, yielding a contradiction unless {f} vanishes. (Indeed, the table also shows that if one of {f} and {\hat f} is compactly supported, then the other cannot have exponential decay.)

On the other hand, we have the example of the Gaussian functions {f(x) = e^{-\pi a x^2}}, {\hat f(\xi) = \frac{1}{\sqrt{a}} e^{-\pi \xi^2/a }}, which both decay faster than exponentially. The classical Hardy uncertainty principle asserts, roughly speaking, that this is the fastest that {f} and {\hat f} can simultaneously decay:

Theorem 1 (Hardy uncertainty principle) Suppose that {f} is a (measurable) function such that {|f(x)| \leq C e^{-\pi a x^2 }} and {|\hat f(\xi)| \leq C' e^{-\pi \xi^2/a}} for all {x, \xi} and some {C, C', a > 0}. Then {f(x)} is a scalar multiple of the gaussian {e^{-\pi ax^2}}.

This theorem is proven by complex-analytic methods, in particular the Phragmén-Lindelöf principle; for sake of completeness we give that proof below. But I was curious to see if there was a real-variable proof of the same theorem, avoiding the use of complex analysis. I was able to find the proof of a slightly weaker theorem:

Theorem 2 (Weak Hardy uncertainty principle) Suppose that {f} is a non-zero (measurable) function such that {|f(x)| \leq C e^{-\pi a x^2 }} and {|\hat f(\xi)| \leq C' e^{-\pi b \xi^2}} for all {x, \xi} and some {C, C', a, b > 0}. Then {ab \leq C_0} for some absolute constant {C_0}.

Note that the correct value of {C_0} should be {1}, as is implied by the true Hardy uncertainty principle. Despite the weaker statement, I thought the proof might still might be of interest as it is a little less “magical” than the complex-variable one, and so I am giving it below.

Read the rest of this entry »

In the second of the Distinguished Lecture Series given by Eli Stein here at UCLA, Eli expanded on the themes in the first lecture, in particular providing more details as to the recent (not yet published) results of Lanzani and Stein on the boundedness of the Cauchy integral on domains in several complex variables.

Read the rest of this entry »

The first Distinguished Lecture Series at UCLA for this academic year is given by Elias Stein (who, incidentally, was my graduate student advisor), who is lecturing on “Singular Integrals and Several Complex Variables: Some New Perspectives“.  The first lecture was a historical (and non-technical) survey of modern harmonic analysis (which, amazingly, was compressed into half an hour), followed by an introduction as to how this theory is currently in the process of being adapted to handle the basic analytical issues in several complex variables, a topic which in many ways is still only now being developed.  The second and third lectures will focus on these issues in greater depth.

As usual, any errors here are due to my transcription and interpretation of the lecture.

[Update, Oct 27: The slides from the talk are now available here.]

Read the rest of this entry »

I am very saddened (and stunned) to learn that Oded Schramm, who made fundamental contributions to conformal geometry, probability theory, and mathematical physics, died in a hiking accident this Monday, aged 46.  (I knew him as a fellow editor of the Journal of the American Mathematical Society, as well as for his mathematical research, of course.)  It is a loss of both a great mathematician and a great person.

One of Schramm’s most fundamental contributions to mathematics is the introduction of the stochastic Loewner equation (now sometimes called the Schramm-Loewner equation in his honour), together with his subsequent development of the theory of this equation with Greg Lawler and Wendelin Werner.  (This work has been recognised by a number of awards, including the Fields Medal in 2006 to Wendelin.)  This equation (which I state after the jump) describes, for each choice of a parameter \kappa > 0, a random (fractal) curve SLE(\kappa) in the plane; this random curve can be viewed as a nonlinear variant of Brownian motion, although the SLE curves tend to cross themselves much less frequently than Brownian paths do.  By the nature of their construction, the SLE(\kappa) curves are conformally invariant: any conformal transformation of an SLE(\kappa) curve (respecting the boundary) gives another curve which has the same distribution as the original curve.  (Brownian motion is also conformally invariant; given the close connections between Brownian motion and harmonic functions, it is not surprising that this fact is closely related to the fact that the property of a function being harmonic in the plane is preserved under conformal transformations.) Conversely, one can show that any conformally invariant random curve distribution which obeys some additional regularity and locality axioms must be of the form SLE(\kappa) for some \kappa.

The amazing fact is that many other natural processes for generating random curves in the plane – e.g. loop-erased random walk, the boundary of Brownian motion (also known as the “Brownian frontier”), or the limit of percolation on the triangular lattice – are known or conjectured to be distributed according to SLE(\kappa) for some specific \kappa (in the above three examples, \kappa is 2, 8/3, and 6 respectively).  In particular, this implies that the highly non-trivial fact that such distributions are conformally invariant, a phenomenon that had been conjectured by physicists but which only obtained rigorous mathematical proof following the work of Schramm and his coauthors.

[Update, Sep 6: A memorial blog to Oded has been set up by his Microsoft Research group here.  See also these posts by Gil Kalai, Yuval Peres, and Luca Trevisan.]

Read the rest of this entry »

The final Distinguished Lecture Series for this academic year at UCLA was started on Tuesday by Shing-Tung Yau. (We’ve had a remarkably high-quality array of visitors this year; for instance, in addition to those already mentioned in this blog, mathematicians such as Peter Lax and Michael Freedman have come here and given lectures earlier this year.) Yau’s chosen topic is “Geometric Structures on Manifolds”, and the first talk was an introduction and overview of his later two, titled “What is a Geometric Structure.” Once again, I found this a great opportunity to learn about a field adjacent to my own areas of expertise, in this case geometric analysis (which is adjacent to nonlinear PDE).

As usual, all inaccuracies in these notes are due to myself and not to Yau, and I welcome corrections or comments. Yau’s slides for the talk are available here. Read the rest of this entry »


RSS Google+ feed

  • An error has occurred; the feed is probably down. Try again later.

Get every new post delivered to your Inbox.

Join 3,310 other followers