You are currently browsing the monthly archive for October 2012.

If is a connected topological manifold, and is a point in , the (topological) fundamental group of at is traditionally defined as the space of equivalence classes of loops starting and ending at , with two loops considered equivalent if they are homotopic to each other. (One can of course define the fundamental group for more general classes of topological spaces, such as locally path connected spaces, but we will stick with topological manifolds in order to avoid pathologies.) As the name suggests, it is one of the most basic topological invariants of a manifold, which among other things can be used to classify the covering spaces of that manifold. Indeed, given any such covering , the fundamental group acts (on the right) by monodromy on the fibre , and conversely given any discrete set with a right action of , one can find a covering space with that monodromy action (this can be done by “tensoring” the universal cover with the given action, as illustrated below the fold). In more category-theoretic terms: monodromy produces an equivalence of categories between the category of covers of , and the category of discrete -sets.

One of the basic tools used to compute fundamental groups is van Kampen’s theorem:

Theorem 1 (van Kampen’s theorem)Let be connected open sets covering a connected topological manifold with also connected, and let be an element of . Then is isomorphic to the amalgamated free product .

Since the topological fundamental group is customarily defined using loops, it is not surprising that many proofs of van Kampen’s theorem (e.g. the one in Hatcher’s text) proceed by an analysis of the loops in , carefully deforming them into combinations of loops in or in and using the combinatorial description of the amalgamated free product (which was discussed in this previous blog post). But I recently learned (thanks to the responses to this recent MathOverflow question of mine) that by using the above-mentioned equivalence of categories, one can convert statements about fundamental groups to statements about coverings. In particular, van Kampen’s theorem turns out to be equivalent to a basic statement about how to glue a cover of and a cover of together to give a cover of , and the amalgamated free product emerges through its categorical definition as a coproduct, rather than through its combinatorial description. One advantage of this alternate proof is that it can be extended to other contexts (such as the étale fundamental groups of varieties or schemes) in which the concept of a path or loop is no longer useful, but for which the notion of a covering is still important. I am thus recording (mostly for my own benefit) the covering-based proof of van Kampen’s theorem in the topological setting below the fold.

Garth Gaudry, who made many contributions to harmonic analysis and to Australian mathematics, and was also both my undergradaute and masters advisor as well as the head of school during one of my first academic jobs, died yesterday after a long battle with cancer, aged 71.

Garth worked on the interface between real-variable harmonic analysis and abstract harmonic analysis (which, despite their names, are actually two distinct fields, though certainly related to each other). He was one of the first to realise the central importance of Littlewood-Paley theory as a general foundation for both abstract and real-variable harmonic analysis, writing an influential text with Robert Edwards on the topic. He also made contributions to Clifford analysis, which was also the topic of my masters thesis.

But, amongst Australian mathematicians at least, Garth will be remembered for his tireless service to the field, most notably for his pivotal role in founding the Australian Mathematical Sciences Institute (AMSI) and then serving as AMSI’s first director, and then in directing the International Centre of Excellence for Education in Mathematics (ICE-EM), the educational arm of AMSI which, among other things, developed a full suite of maths textbooks and related educational materials covering Years 5-10 (which I reviewed here back in 2008).

I knew Garth ever since I was an undergraduate at Flinders University. He was head of school then (a position roughly equivalent to department chair in the US), but still was able to spare an hour a week to meet with me to discuss real analysis, as I worked my way through Rudin’s “Real and complex analysis” and then Stein’s “Singular integrals”, and then eventually completed a masters thesis under his supervision on Clifford-valued singular integrals. When Princeton accepted my application for graduate study, he convinced me to take the opportunity without hesitation. Without Garth, I certainly wouldn’t be where I am at today, and I will always be very grateful for his advisorship. He was a good person, and he will be missed very much by me and by many others.

One of the basic general problems in analytic number theory is to understand as much as possible the fluctuations of the Möbius function , defined as when is the product of distinct primes, and zero otherwise. For instance, as takes values in , we have the trivial bound

and the seemingly slight improvement

is already equivalent to the prime number theorem, as observed by Landau (see e.g. this previous blog post for a proof), while the much stronger (and still open) improvement

is equivalent to the notorious Riemann hypothesis.

There is a general *Möbius pseudorandomness heuristic* that suggests that the sign pattern behaves so randomly (or pseudorandomly) that one should expect a substantial amount of cancellation in sums that involve the sign fluctuation of the Möbius function in a nontrivial fashion, with the amount of cancellation present comparable to the amount that an analogous random sum would provide; cf. the probabilistic heuristic discussed in this recent blog post. There are a number of ways to make this heuristic precise. As already mentioned, the Riemann hypothesis can be considered one such manifestation of the heuristic. Another manifestation is the following old conjecture of Chowla:

Conjecture 1 (Chowla’s conjecture)For any fixed integer and exponents , with at least one of the odd (so as not to completely destroy the sign cancellation), we have

Note that as for any , we can reduce to the case when the take values in here. When only one of the are odd, this is essentially the prime number theorem in arithmetic progressions (after some elementary sieving), but with two or more of the are odd, the problem becomes completely open. For instance, the estimate

is morally very close to the conjectured asymptotic

for the von Mangoldt function , where is the twin prime constant; this asymptotic in turn implies the twin prime conjecture. (To formally deduce estimates for von Mangoldt from estimates for Möbius, though, typically requires some better control on the error terms than , in particular gains of some power of are usually needed. See this previous blog post for more discussion.)

Remark 2The Chowla conjecture resembles an assertion that, for chosen randomly and uniformly from to , the random variables become asymptotically independent of each other (in the probabilistic sense) as . However, this is not quite accurate, because some moments (namely those with all exponents even) have the “wrong” asymptotic value, leading to some unwanted correlation between the two variables. For instance, the events and have a strong correlation with each other, basically because they are both strongly correlated with the event of being divisible by . A more accurate interpretation of the Chowla conjecture is that the random variables are asymptoticallyconditionallyindependent of each other, after conditioning on the zero pattern ; thus, it is thesignof the Möbius function that fluctuates like random noise, rather than the zero pattern. (The situation is a bit cleaner if one works instead with the Liouville function instead of the Möbius function , as this function never vanishes, but we will stick to the traditional Möbius function formalism here.)

A more recent formulation of the Möbius randomness heuristic is the following conjecture of Sarnak. Given a bounded sequence , define the *topological entropy* of the sequence to be the least exponent with the property that for any fixed , and for going to infinity the set of can be covered by balls of radius (in the metric). (If arises from a minimal topological dynamical system by , the above notion is equivalent to the usual notion of the topological entropy of a dynamical system.) For instance, if the sequence is a bit sequence (i.e. it takes values in ), then there are only -bit patterns that can appear as blocks of consecutive bits in this sequence. As a special case, a Turing machine with bounded memory that had access to a random number generator at the rate of one random bit produced every units of time, but otherwise evolved deterministically, would have an output sequence that had a topological entropy of at most . A bounded sequence is said to be *deterministic* if its topological entropy is zero. A typical example is a polynomial sequence such as for some fixed ; the -blocks of such polynomials sequence have covering numbers that only grow polynomially in , rather than exponentially, thus yielding the zero entropy. Unipotent flows, such as the horocycle flow on a compact hyperbolic surface, are another good source of deterministic sequences.

Conjecture 3 (Sarnak’s conjecture)Let be a deterministic bounded sequence. Then

This conjecture in general is still quite far from being solved. However, special cases are known:

- For constant sequences, this is essentially the prime number theorem (1).
- For periodic sequences, this is essentially the prime number theorem in arithmetic progressions.
- For quasiperiodic sequences such as for some continuous , this follows from the work of Davenport.
- For nilsequences, this is a result of Ben Green and myself.
- For horocycle flows, this is a result of Bourgain, Sarnak, and Ziegler.
- For the Thue-Morse sequence, this is a result of Dartyge-Tenenbaum (with a stronger error term obtained by Maduit-Rivat). A subsequent result of Bourgain handles all bounded rank one sequences (though the Thue-Morse sequence is actually of rank two), and a related result of Green establishes asymptotic orthogonality of the Möbius function to bounded depth circuits, although such functions are not necessarily deterministic in nature.
- For the Rudin-Shapiro sequence, I sketched out an argument at this MathOverflow post.
- The Möbius function is known to itself be non-deterministic, because its square (i.e. the indicator of the square-free functions) is known to be non-deterministic (indeed, its topological entropy is ). (The corresponding question for the Liouville function , however, remains open, as the square has zero entropy.)
- In the converse direction, it is easy to construct sequences of arbitrarily small positive entropy that correlate with the Möbius function (a rather silly example is for some fixed large (squarefree) , which has topological entropy at most but clearly correlates with ).

See this survey of Sarnak for further discussion of this and related topics.

In this post I wanted to give a very nice argument of Sarnak that links the above two conjectures:

Proposition 4The Chowla conjecture implies the Sarnak conjecture.

The argument does not use any number-theoretic properties of the Möbius function; one could replace in both conjectures by any other function from the natural numbers to and obtain the same implication. The argument consists of the following ingredients:

- To show that , it suffices to show that the expectation of the random variable , where is drawn uniformly at random from to , can be made arbitrary small by making large (and even larger).
- By the union bound and the zero topological entropy of , it suffices to show that for any bounded
*deterministic*coefficients , the random variable concentrates with exponentially high probability. - Finally, this exponentially high concentration can be achieved by the moment method, using a slight variant of the moment method proof of the large deviation estimates such as the Chernoff inequality or Hoeffding inequality (as discussed in this blog post).

As is often the case, though, while the “top-down” order of steps presented above is perhaps the clearest way to think *conceptually* about the argument, in order to *present* the argument formally it is more convenient to present the arguments in the reverse (or “bottom-up”) order. This is the approach taken below the fold.

One of the basic problems in the field of operator algebras is to develop a functional calculus for either a single operator , or a collection of operators. These operators could in principle act on any function space, but typically one either considers complex matrices (which act on a complex finite dimensional space), or operators (either bounded or unbounded) on a complex Hilbert space. (One can of course also obtain analogous results for real operators, but we will work throughout with complex operators in this post.)

Roughly speaking, a functional calculus is a way to assign an operator or to any function in a suitable function space, which is linear over the complex numbers, preserve the scalars (i.e. when ), and should be either an exact or approximate homomorphism in the sense that

should hold either exactly or approximately. In the case when the are self-adjoint operators acting on a Hilbert space (or Hermitian matrices), one often also desires the identity

to also hold either exactly or approximately. (Note that one cannot reasonably expect (1) and (2) to hold exactly for all if the and their adjoints do not commute with each other, so in those cases one has to be willing to allow some error terms in the above wish list of properties of the calculus.) Ideally, one should also be able to relate the operator norm of or with something like the uniform norm on . In principle, the existence of a good functional calculus allows one to manipulate operators as if they were scalars (or at least approximately as if they were scalars), which is very helpful for a number of applications, such as partial differential equations, spectral theory, noncommutative probability, and semiclassical mechanics. A functional calculus for multiple operators can be particularly valuable as it allows one to treat as being exact or approximate scalars *simultaneously*. For instance, if one is trying to solve a linear differential equation that can (formally at least) be expressed in the form

for some data , unknown function , some differential operators , and some nice function , then if one’s functional calculus is good enough (and is suitably “elliptic” in the sense that it does not vanish or otherwise degenerate too often), one should be able to solve this equation either exactly or approximately by the formula

which is of course how one would solve this equation if one pretended that the operators were in fact scalars. Formalising this calculus rigorously leads to the theory of pseudodifferential operators, which allows one to (approximately) solve or at least simplify a much wider range of differential equations than one what can achieve with more elementary algebraic transformations (e.g. integrating factors, change of variables, variation of parameters, etc.). In quantum mechanics, a functional calculus that allows one to treat operators as if they were approximately scalar can be used to rigorously justify the correspondence principle in physics, namely that the predictions of quantum mechanics approximate that of classical mechanics in the *semiclassical limit* .

There is no universal functional calculus that works in all situations; the strongest functional calculi, which are close to being an exact *-homomorphisms on very large class of functions, tend to only work for under very restrictive hypotheses on or (in particular, when , one needs the to commute either exactly, or very close to exactly), while there are weaker functional calculi which have fewer nice properties and only work for a very small class of functions, but can be applied to quite general operators or . In some cases the functional calculus is only formal, in the sense that or has to be interpreted as an infinite formal series that does not converge in a traditional sense. Also, when one wishes to select a functional calculus on non-commuting operators , there is a certain amount of non-uniqueness: one generally has a number of slightly different functional calculi to choose from, which generally have the same properties but differ in some minor technical details (particularly with regards to the behaviour of “lower order” components of the calculus). This is a similar to how one has a variety of slightly different coordinate systems available to parameterise a Riemannian manifold or Lie group. This is on contrast to the case when the underlying operator is (essentially) normal (so that commutes with ); in this special case (which includes the important subcases when is unitary or (essentially) self-adjoint), spectral theory gives us a canonical and very powerful functional calculus which can be used without further modification in applications.

Despite this lack of uniqueness, there is one standard choice for a functional calculus available for general operators , namely the Weyl functional calculus; it is analogous in some ways to normal coordinates for Riemannian manifolds, or *exponential coordinates of the first kind* for Lie groups, in that it treats lower order terms in a reasonably nice fashion. (But it is important to keep in mind that, like its analogues in Riemannian geometry or Lie theory, there will be some instances in which the Weyl calculus is not the optimal calculus to use for the application at hand.)

I decided to write some notes on the Weyl functional calculus (also known as Weyl quantisation), and to sketch the applications of this calculus both to the theory of pseudodifferential operators. They are mostly for my own benefit (so that I won’t have to redo these particular calculations again), but perhaps they will also be of interest to some readers here. (Of course, this material is also covered in many other places. e.g. Folland’s “harmonic analysis in phase space“.)

Way back in 2007, I wrote a blog post giving Einstein’s derivation of his famous equation for the rest energy of a body with mass . (Throughout this post, mass is used to refer to the invariant mass (also known as *rest mass*) of an object.) This derivation used a number of physical assumptions, including the following:

- The two postulates of special relativity: firstly, that the laws of physics are the same in every inertial reference frame, and secondly that the speed of light in vacuum is equal in every such inertial frame.
- Planck’s relation and de Broglie’s law for photons, relating the frequency, energy, and momentum of such photons together.
- The law of conservation of energy, and the law of conservation of momentum, as well as the additivity of these quantities (i.e. the energy of a system is the sum of the energy of its components, and similarly for momentum).
- The Newtonian approximations , to energy and momentum at low velocities.

The argument was one-dimensional in nature, in the sense that only one of the three spatial dimensions was actually used in the proof.

As was pointed out in comments in the previous post by Laurens Gunnarsen, this derivation has the curious feature of needing some laws from quantum mechanics (specifically, the Planck and de Broglie laws) in order to derive an equation in special relativity (which does not ostensibly require quantum mechanics). One can then ask whether one can give a derivation that does not require such laws. As pointed out in previous comments, one can use the representation theory of the Lorentz group to give a nice derivation that avoids any quantum mechanics, but it now needs at least two spatial dimensions instead of just one. I decided to work out this derivation in a way that does not explicitly use representation theory (although it is certainly lurking beneath the surface). The concept of momentum is only barely used in this derivation, and the main ingredients are now reduced to the following:

- The two postulates of special relativity;
- The law of conservation of energy (and the additivity of energy);
- The Newtonian approximation at low velocities.

The argument (which uses a little bit of calculus, but is otherwise elementary) is given below the fold. Whereas Einstein’s original argument considers a mass emitting two photons in several different reference frames, the argument here considers a large mass breaking up into two equal smaller masses. Viewing this situation in different reference frames gives a functional equation for the relationship between energy, mass, and velocity, which can then be solved using some calculus, using the Newtonian approximation as a boundary condition, to give the famous formula.

*Disclaimer:* As with the previous post, the arguments here are physical arguments rather than purely mathematical ones, and thus do not really qualify as a rigorous mathematical argument, due to the implicit use of a number of physical and metaphysical hypotheses beyond the ones explicitly listed above. (But it would be difficult to say anything non-tautological at all about the physical world if one could rely solely on rigorous mathematical reasoning.)

## Recent Comments