You are currently browsing the category archive for the 'expository' category.

The Riemann zeta function \zeta(s), defined for \hbox{Re}(s) > 1 by the formula

\displaystyle \zeta(s) := \sum_{n \in {\Bbb N}} \frac{1}{n^s} (1)

where {\Bbb N} = \{1,2,\ldots\} are the natural numbers, and extended meromorphically to other values of s by analytic continuation, obeys the remarkable functional equation

 \displaystyle \Xi(s) = \Xi(1-s) (2)

where

 \displaystyle \Xi(s) := \Gamma_\infty(s) \zeta(s) (3)

is the Riemann Xi function,

 \displaystyle \Gamma_\infty(s) := \pi^{-s/2} \Gamma(s/2) (4)

is the Gamma factor at infinity, and the Gamma function \Gamma(s) is defined for \hbox{Re}(s) > 1 by

 \displaystyle \Gamma(s) := \int_0^\infty e^{-t} t^s\ \frac{dt}{t} (5)

and extended meromorphically to other values of s by analytic continuation.

There are many proofs known of the functional equation (2).  One of them (dating back to Riemann himself) relies on the Poisson summation formula

 \displaystyle \sum_{a \in {\Bbb Z}} f_\infty(a t_\infty) = \frac{1}{|t|_\infty} \sum_{a \in {\Bbb Z}} \hat f_\infty(a/t_\infty) (6)

for the reals k_\infty := {\Bbb R} and t \in k_\infty^*, where f is a Schwartz function, |t|_\infty := |t| is the usual Archimedean absolute value on k_\infty, and

 \displaystyle \hat f_\infty(\xi_\infty) := \int_{k_\infty} e_\infty(-x_\infty \xi_\infty) f_\infty(x_\infty)\ dx_\infty (7)

is the Fourier transform on k_\infty, with e_\infty(x_\infty) := e^{2\pi i x_\infty} being the standard character e_\infty: k_\infty \to S^1 on k_\infty.  (The reason for this rather strange notation for the real line and its associated structures will be made clearer shortly.)  Applying this formula to the (Archimedean) Gaussian function

 \displaystyle g_\infty(x_\infty) := e^{-\pi |x_\infty|^2}, (8)

which is its own (additive) Fourier transform, and then applying the multiplicative Fourier transform (i.e. the Mellin transform), one soon obtains (2).  (Riemann also had another proof of the functional equation relying primarily on contour integration, which I will not discuss here.)  One can “clean up” this proof a bit by replacing the Gaussian by a Dirac delta function, although one now has to work formally and “renormalise” by throwing away some infinite terms.  (One can use the theory of distributions to make this latter approach rigorous, but I will not discuss this here.)  Note how this proof combines the additive Fourier transform with the multiplicative Fourier transform.  [Continuing with this theme, the Gamma function (5) is an inner product between an additive character e^{-t} and a multiplicative character t^s, and the zeta function (1) can be viewed both additively, as a sum over n, or multiplicatively, as an Euler product.]

In the famous thesis of Tate, the above argument was reinterpreted using the language of the adele ring {\Bbb A}, with the Poisson summation formula (4) on k_\infty replaced by the Poisson summation formula

\displaystyle \sum_{a \in k} f(a t) = \sum_{a \in k} \hat f(t/a) (9)

on {\Bbb A}, where k = {\Bbb Q} is the rationals, t \in {\Bbb A}, and f is now a Schwartz-Bruhat function on {\Bbb A}.  Applying this formula to the adelic (or global) Gaussian function g(x) := g_\infty(x_\infty) \prod_p 1_{{\mathbb Z}_p}(x_p), which is its own Fourier transform, and then using the adelic Mellin transform, one again obtains (2).  Again, the proof can be cleaned up by replacing the Gaussian with a Dirac mass, at the cost of making the computations formal (or requiring the theory of distributions).

In this post I will write down both Riemann’s proof and Tate’s proof together (but omitting some technical details), to emphasise the fact that they are, in some sense, the same proof.  However, Tate’s proof gives a high-level clarity to the situation (in particular, explaining more adequately why the Gamma factor at infinity (4) fits seamlessly with the Riemann zeta function (1) to form the Xi function (2)), and allows one to generalise the functional equation relatively painlessly to other zeta-functions and L-functions, such as Dedekind zeta functions and Hecke L-functions.

[Note: the material here is very standard in modern algebraic number theory; the post here is partially for my own benefit, as most treatments of this topic in the literature tend to operate in far higher levels of generality than I would prefer.]

Read the rest of this entry »

Last year, as part of my “open problem of the week” series (now long since on hiatus), I featured one of my favorite problems, namely that of establishing scarring for the Bunimovich stadium.  I’m now happy to say that this problem has been solved (for generic stadiums, at least, and for phase space scarring rather than physical space scarring) by my old friend (and fellow Aussie), Andrew Hassell, in a recent preprint.  Congrats Andrew!

Actually, the argument is beautifully simple and short (the paper is a mere 9 pages), though it of course uses the basic theory of eigenfunctions on domains, such as Weyl’s law, and I can give the gist of it here (suppressing all technical details).

Read the rest of this entry »

Some time ago, I wrote a short unpublished note (mostly for my own benefit) when I was trying to understand the derivation of the Black-Scholes equation in financial mathematics, which computes the price of various options under some assumptions on the underlying financial model.  In order to avoid issues relating to stochastic calculus, Itō’s formula, etc. I only considered a discrete model rather than a continuous one, which makes the mathematics much more elementary.  I was recently asked about this note, and decided that it would be worthwhile to expand it into a blog article here.  The emphasis here will be on the simplest models rather than the most realistic models, in order to emphasise the beautifully simple basic idea behind the derivation of this formula.

The basic type of problem that the Black-Scholes equation solves (in particular models) is the following.  One has an underlying financial instrument S, which represents some asset which can be bought and sold at various times t, with the per-unit price S_t of the instrument varying with t.  (For the mathematical model, it is not relevant what type of asset S actually is, but one could imagine for instance that S is a stock, a commodity, a currency, or a bond.)  Given such an underlying instrument S, one can create options based on S and on some future time t_1, which give the buyer and seller of the options certain rights and obligations regarding S at an expiration time t_1.  For instance,

  1. A call option for S at time t_1 and at a strike price P gives the buyer of the option the right (but not the obligation) to buy a unit of S from the seller of the option at price P at time t_1 (conversely, the seller of the option has the obligation but not the right to sell a unit of S to the buyer of the option at time t_1, if the buyer so requests).
  2. A put option for S at time t_1 and at a strike price P gives the buyer of the option the right (but not the obligation) to sell a unit of S to the seller of the option at price P at time t_1 (and conversely, the seller of the option has the obligation but not the right to buy a unit of S from the buyer of the option at time t_1, if the buyer so requests).
  3. More complicated options, such as straddles and collars, can be formed by taking linear combinations of call and put options, e.g. simultaneously buying or selling a call and a put option.  One can also consider “American options” which offer rights and obligations for an interval of time, rather than the “European options” described above which only apply at a fixed time t_1.  The Black-Scholes formula applies only to European options, though extensions of this theory have been applied to American options.

The problem is this: what is the “correct” price, at time t_0, to assign to an European option (such as a put or call option) at a future expiration time t_1?  Of course, due to the volatility of the underlying instrument S, the future price S_{t_1} of this instrument is not known at time t_0.  Nevertheless - and this is really quite a remarkable fact - it is still possible to compute deterministically, at time t_0, the price of an option that depends on that unknown price S_{t_1}, under certain assumptions (one of which is that one knows exactly how volatile the underlying instrument is).

Read the rest of this entry »

Let X be a real-valued random variable, and let X_1, X_2, X_3, ... be an infinite sequence of independent and identically distributed copies of X. Let \overline{X}_n := \frac{1}{n}(X_1 + \ldots + X_n) be the empirical averages of this sequence. A fundamental theorem in probability theory is the law of large numbers, which comes in both a weak and a strong form:

Weak law of large numbers. Suppose that the first moment {\Bbb E} |X| of X is finite. Then \overline{X}_n converges in probability to {\Bbb E} X, thus \lim_{n \to \infty} {\Bbb P}( |\overline{X}_n - {\Bbb E} X| \geq \varepsilon ) = 0 for every \varepsilon > 0.

Strong law of large numbers. Suppose that the first moment {\Bbb E} |X| of X is finite. Then \overline{X}_n converges almost surely to {\Bbb E} X, thus {\Bbb P}( \lim_{n \to \infty} \overline{X}_n = {\Bbb E} X ) = 1.

[The concepts of convergence in probability and almost sure convergence in probability theory are specialisations of the concepts of convergence in measure and pointwise convergence almost everywhere in measure theory.]

(If one strengthens the first moment assumption to that of finiteness of the second moment {\Bbb E}|X|^2, then we of course have a more precise statement than the (weak) law of large numbers, namely the central limit theorem, but I will not discuss that theorem here.  With even more hypotheses on X, one similarly has more precise versions of the strong law of large numbers, such as the Chernoff inequality, which I will again not discuss here.)

The weak law is easy to prove, but the strong law (which of course implies the weak law, by the dominated convergence theorem) is more subtle, and in fact the proof of this law (assuming just finiteness of the first moment) usually only appears in advanced graduate texts. So I thought I would present a proof here of both laws, which proceeds by the standard techniques of the moment method and truncation. The emphasis in this exposition will be on motivation and methods rather than brevity and strength of results; there do exist proofs of the strong law in the literature that have been compressed down to the size of one page or less, but this is not my goal here.

Read the rest of this entry »

This week I was in Columbus, Ohio, attending a conference on equidistribution on manifolds. I talked about my recent paper with Ben Green on the quantitative behaviour of polynomial sequences in nilmanifolds, which I have blogged about previously. During my talk (and inspired by the immediately preceding talk of Vitaly Bergelson), I stated explicitly for the first time a generalisation of the van der Corput trick which morally underlies our paper, though it is somewhat buried there as we specialised it to our application at hand (and also had to deal with various quantitative issues that made the presentation more complicated). After the talk, several people asked me for a more precise statement of this trick, so I am presenting it here, and as an application reproving an old theorem of Leon Green that gives a necessary and sufficient condition as to whether a linear sequence (g^n x)_{n=1}^\infty on a nilmanifold G/\Gamma is equidistributed, which generalises the famous theorem of Weyl on equidistribution of polynomials.

Read the rest of this entry »

One of my favourite unsolved problems in mathematics is the Kakeya conjecture in geometric measure theory. This conjecture is descended from the

Kakeya needle problem. (1917) What is the least area in the plane required to continuously rotate a needle of unit length and zero thickness around completely (i.e. by 360^\circ)?

For instance, one can rotate a unit needle inside a unit disk, which has area \pi/4. By using a deltoid one requires only \pi/8 area.

In 1928, Besicovitch showed that that in fact one could rotate a unit needle using an arbitrarily small amount of positive area. This unintuitive fact was a corollary of two observations. The first, which is easy, is that one can translate a needle using arbitrarily small area, by sliding the needle along the direction it points in for a long distance (which costs zero area), turning it slightly (costing a small amount of area), sliding back, and then undoing the turn. The second fact, which is less obvious, can be phrased as follows. Define a Kakeya set in {\Bbb R}^2 to be any set which contains a unit line segment in each direction. (See this Java applet of mine, or the wikipedia page, for some pictures of such sets.)

Theorem. (Besicovitch, 1919) There exists Kakeya sets {\Bbb R}^2 of arbitrarily small area (or more precisely, Lebesgue measure).

In fact, one can construct such sets with zero Lebesgue measure. On the other hand, it was shown by Davies that even though these sets had zero area, they were still necessarily two-dimensional (in the sense of either Hausdorff dimension or Minkowski dimension). This led to an analogous conjecture in higher dimensions:

Kakeya conjecture. A Besicovitch set in {\Bbb R}^n (i.e. a subset of {\Bbb R}^n that contains a unit line segment in every direction) has Minkowski and Hausdorff dimension equal to n.

This conjecture remains open in dimensions three and higher (and gets more difficult as the dimension increases), although many partial results are known. For instance, when n=3, it is known that Besicovitch sets have Hausdorff dimension at least 5/2 and (upper) Minkowski dimension at least 5/2 + 10^{-10}. See my Notices article for a general survey of this problem (and its connections with Fourier analysis, additive combinatorics, and PDE), my paper with Katz for a more technical survey, and Wolff’s survey for a systematic treatment of the field (up to about 1998 or so).

In 1999, Wolff proposed a simpler finite field analogue of the Kakeya conjecture as a model problem that avoided all the technical issues involving Minkowski and Hausdorff dimension. If F^n is a vector space over a finite field F, define a Kakeya set to be a subset of F^n which contains a line in every direction.

Finite field Kakeya conjecture. Let E \subset F^n be a Kakeya set. Then E has cardinality at least c_n |F|^n, where c_n > 0 depends only on n.

This conjecture has had a significant influence in the subject, in particular inspiring work on the sum-product phenomenon in finite fields, which has since proven to have many applications in number theory and computer science. Modulo minor technicalities, the progress on the finite field Kakeya conjecture was, until very recently, essentially the same as that of the original “Euclidean” Kakeya conjecture.

Last week, the finite field Kakeya conjecture was proven using a beautifully simple argument by Zeev Dvir, using the polynomial method in algebraic extremal combinatorics. The proof is so short that I can present it in full here.

Read the rest of this entry »

Einstein’s equation E=mc^2 describing the equivalence of mass and energy is arguably the most famous equation in physics. But his beautifully elegant derivation of this formula (here is the English translation) from previously understood laws of physics is considerably less famous. (There is an amusing Far Side cartoon in this regard, with the punchline “squared away”, which you can find on-line by searching hard enough, though I will not link to it directly.)

This topic had come up in recent discussion on this blog, so I thought I would present Einstein’s derivation here. Actually, to be precise, in the paper mentioned above, Einstein uses the postulates of special relativity and other known laws of physics to show the following:

Proposition. (Mass-energy equivalence) If a body at rest emits a total energy of E while remaining at rest, then the mass of that body decreases by E/c^2.

Assuming that bodies at rest with zero mass necessarily have zero energy, this implies the famous formula E = mc^2 - but only for bodies which are at rest. For moving bodies, there is a similar formula, but one has to first decide what the correct definition of mass is for moving bodies; I will not discuss this issue here, but see for instance the Wikipedia entry on this topic.

Broadly speaking, the derivation of the above proposition proceeds via the following five steps:

  1. Using the postulates of special relativity, determine how space and time coordinates transform under changes of reference frame (i.e. derive the Lorentz transformations).
  2. Using 1., determine how the temporal frequency \nu (and wave number k) of photons transform under changes of reference frame (i.e. derive the formulae for relativistic Doppler shift).
  3. Using Planck’s law E = h\nu (and de Broglie’s law p = \hbar k) and 2., determine how the energy E (and momentum p) of photons transform under changes of reference frame.
  4. Using the law of conservation of energy (and momentum) and 3., determine how the energy (and momentum) of bodies transform under changes of reference frame.
  5. Comparing the results of 4. with the classical Newtonian approximations KE \approx \frac{1}{2} m|v|^2 (and p \approx mv), deduce the relativistic relationship between mass and energy for bodies at rest (and more generally between mass, velocity, energy, and momentum for moving bodies).

Actually, as it turns out, Einstein’s analysis for bodies at rest only needs to understand changes of reference frame at infinitesimally low velocity, |v| \ll c. However, in order to see enough relativistic effects to deduce the mass-energy equivalence, one needs to obtain formulae which are accurate to second order in v (or more precisely, v/c), as opposed to those in Newtonian physics which are accurate to first order in v. Also, to understand the relationship between mass, velocity, energy, and momentum for moving bodies rather than bodies at rest, one needs to consider non-infinitesimal changes of reference frame.

Important note: Einstein’s argument is, of course, a physical argument rather than a mathematical one. While I will use the language and formalism of pure mathematics here, it should be emphasised that I am not exactly giving a formal proof of the above Proposition in the sense of modern mathematics; these arguments are instead more like the classical proofs of Euclid, in that numerous “self evident” assumptions about space, time, velocity, etc. will be made along the way. (Indeed, there is a very strong analogy between Euclidean geometry and the Minkowskian geometry of special relativity.) One can of course make these assumptions more explicit, and this has been done in many other places, but I will avoid doing so here in order not to overly obscure Einstein’s original argument. Read the rest of this entry »

In the previous post, I discussed how an induction on dimension approach could establish Hilbert’s nullstellensatz, which we interpreted as a result describing all the obstructions to solving a system of polynomial equations and inequations over an algebraically closed field. Today, I want to point out that exactly the same approach also gives the Hahn-Banach theorem (at least in finite dimensions), which we interpret as a result describing all the obstructions to solving a system of linear inequalities over the reals (or in other words, a linear programming problem); this formulation of the Hahn-Banach theorem is sometimes known as Farkas’ lemma. Then I would like to discuss some standard applications of the Hahn-Banach theorem, such as the separation theorem of Dieudonné, the minimax theorem of von Neumann, Menger’s theorem, and Helly’s theorem (which was mentioned recently in an earlier post).

Read the rest of this entry »

I had occasion recently to look up the proof of Hilbert’s nullstellensatz, which I haven’t studied since cramming for my algebra qualifying exam as a graduate student. I was a little unsatisfied with the proofs I was able to locate - they were fairly abstract and used a certain amount of algebraic machinery, which I was terribly rusty on - so, as an exercise, I tried to find a more computational proof that avoided as much abstract machinery as possible. I found a proof which used only the extended Euclidean algorithm and high school algebra, together with an induction on dimension and the obvious observation that any non-zero polynomial of one variable on an algebraically closed field has at least one non-root. It probably isn’t new (in particular, it might be related to the standard model-theoretic proof of the nullstellensatz, with the Euclidean algorithm and high school algebra taking the place of quantifier elimination), but I thought I’d share it here anyway.

Throughout this post, F is going to be a fixed algebraically closed field (e.g. the complex numbers {\Bbb C}). I’d like to phrase the nullstellensatz in a fairly concrete fashion, in terms of the problem of solving a set of simultaneous polynomial equations P_1(x) = \ldots = P_m(x) = 0 in several variables x = (x_1,\ldots,x_d) \in F^d over F, thus P_1,\ldots,P_m \in F[x] are polynomials in d variables. One obvious obstruction to solvability of this system is if the equations one is trying to solve are inconsistent in the sense that they can be used to imply 1=0. In particular, if one can find polynomials Q_1,\ldots,Q_m \in F[x] such that P_1 Q_1 + \ldots + P_m Q_m = 1, then clearly one cannot solve P_1(x)=\ldots=P_m(x)=0. The weak nullstellensatz asserts that this is, in fact, the only obstruction:

Weak nullstellensatz. Let P_1,\ldots,P_m \in F[x] be polynomials. Then exactly one of the following statements holds:

  1. The system of equations P_1(x)=\ldots=P_m(x)=0 has a solution x \in F^d.
  2. There exist polynomials Q_1,\ldots,Q_m \in F[x] such that P_1 Q_1 + \ldots + P_m Q_m = 1.

Note that the hypothesis that F is algebraically closed is crucial; for instance, if F is the reals, then the equation x^2+1=0 has no solution, but there is no polynomial Q(x) such that (x^2+1) Q(x) = 1.

Like many results of the “The only obstructions are the obvious obstructions” type, the power of the nullstellensatz lies in the ability to take a hypothesis about non-existence (in this case, non-existence of solutions to P_1(x)=\ldots=P_m(x)=0) and deduce a conclusion about existence (in this case, existence of Q_1,\ldots,Q_m such that P_1 Q_1 + \ldots + P_m Q_m = 1). The ability to get “something from nothing” is clearly going to be both non-trivial and useful. In particular, the nullstellensatz offers an important correspondence between algebraic geometry (the conclusion 1 is an assertion that a certain algebraic variety is empty) and commutative algebra (the conclusion 2 is an assertion that a certain ideal is non-proper).

Now suppose one is trying to solve the more complicated system P_1(x)=\ldots=P_d(x)=0; R(x) \neq 0 for some polynomials P_1,\ldots,P_d, R. Again, any identity of the form P_1 Q_1 + \ldots + P_m Q_m = 1 will be an obstruction to solvability, but now more obstructions are possible: any identity of the form P_1 Q_1 + \ldots + P_m Q_m = R^r for some non-negative integer r will also obstruct solvability. The strong nullstellensatz asserts that this is the only obstruction:

Strong nullstellensatz. Let P_1,\ldots,P_m, R \in F[x] be polynomials. Then exactly one of the following statements holds:

  1. The system of equations P_1(x)=\ldots=P_m(x)=0, R(x) \neq 0 has a solution x \in F^d.
  2. There exist polynomials Q_1,\ldots,Q_m \in F[x] and a non-negative integer r such that P_1 Q_1 + \ldots + P_m Q_m = R^r.

Of course, the weak nullstellensatz corresponds to the special case in which R=1. The strong nullstellensatz is usually phrased instead in terms of ideals and radicals, but the above formulation is easily shown to be equivalent to the usual version (modulo Hilbert’s basis theorem).

One could consider generalising the nullstellensatz a little further by considering systems of the form P_1(x)=\ldots=P_m(x)=0, R_1(x),\ldots,R_n(x) \neq 0, but this is not a significant generalisation, since all the inequations R_1(x) \neq 0, \ldots, R_n(x) \neq 0 can be concatenated into a single inequation R_1(x) \ldots R_n(x) \neq 0. The presence of the exponent r in conclusion (2) is a little annoying; to get rid of it, one needs to generalise the notion of an algebraic variety to that of a scheme (which is worth doing for several other reasons too, in particular one can now work over much more general objects than just algebraically closed fields), but that is a whole story in itself (and one that I am not really qualified to tell).

[Update, Nov 26: It turns out that my approach is more complicated than I first thought, and so I had to revise the proof quite a bit to fix a certain gap, in particular making it significantly messier than my first version. On the plus side, I was able to at least eliminate any appeal to the Hilbert's basis theorem, so in particular the proof is now manifestly effective (but with terrible bounds). In any case, I am keeping the argument here in case it has some interest.]

Read the rest of this entry »

Today I’d like to discuss (part of) a cute and surprising theorem of Fritz John in the area of non-linear wave equations, and specifically for the equation

\partial_{tt} u - \Delta u = |u|^p (1)

where u: {\Bbb R} \times {\Bbb R}^3 \to {\Bbb R} is a scalar function of one time and three spatial dimensions.

The evolution of this type of non-linear wave equation can be viewed as a “race” between the dispersive tendency of the linear wave equation

\partial_{tt} u - \Delta u = 0 (2)

and the positive feedback tendencies of the nonlinear ODE

\partial_{tt} u = |u|^p. (3)

More precisely, solutions to (2) tend to decay in time as t \to +\infty, as can be seen from the presence of the \frac{1}{t} term in the explicit formula

u(t,x) =  \frac{1}{4\pi t} \int_{|y-x|=t} \partial_t u(0,y)\ dS(y) + \partial_t[\frac{1}{4\pi t} \int_{|y-x|=t} u(0,y)\ dS(y)], (4)

for such solutions in terms of the initial position u(0,y) and initial velocity \partial_t u(0,y), where t > 0, x \in {\Bbb R}^3, and dS is the area element of the sphere \{ y \in {\Bbb R}^3: |y-x|=t \}. (For this post I will ignore the technical issues regarding how smooth the solution has to be in order for the above formula to be valid.) On the other hand, solutions to (3) tend to blow up in finite time from data with positive initial position and initial velocity, even if this data is very small, as can be seen by the family of solutions

u_T(t,x) := c (T-t)^{-2/(p-1)}

for T > 0, 0 < t < T, and x \in {\Bbb R}^3, where c is the positive constant c := (\frac{2(p+1)}{(p-1)^2})^{1/(p-1)}. For T large, this gives a family of solutions which starts out very small at time zero, but still manages to go to infinity in finite time.

The equation (1) can be viewed as a combination of equations (2) and (3) and should thus inherit a mix of the behaviours of both its “parents”. As a general rule, when the initial data u(0,\cdot), \partial_t u(0,\cdot) of solution is small, one expects the dispersion to “win” and send the solution to zero as t \to \infty, because the nonlinear effects are weak; conversely, when the initial data is large, one expects the nonlinear effects to “win” and cause blowup, or at least large amounts of instability. This division is particularly pronounced when p is large (since then the nonlinearity is very strong for large data and very weak for small data), but not so much for p small (for instance, when p=1, the equation becomes essentially linear, and one can easily show that blowup does not occur from reasonable data.)

The theorem of John formalises this intuition, with a remarkable threshold value for p:

Theorem. Let 1 < p < \infty.

  1. If p < 1+\sqrt{2}, then there exist solutions which are arbitrarily small (both in size and in support) and smooth at time zero, but which blow up in finite time.
  2. If p > 1+\sqrt{2}, then for every initial data which is sufficiently small in size and support, and sufficiently smooth, one has a global solution (which goes to zero uniformly as t \to \infty).

[At the critical threshold p = 1 + \sqrt{2} one also has blowup from arbitrarily small data, as was shown subsequently by Schaeffer.]

The ostensible purpose of this post is to try to explain why the curious exponent 1+\sqrt{2} should make an appearance here, by sketching out the proof of part 1 of John’s theorem (I will not discuss part 2 here); but another reason I am writing this post is to illustrate how to make quick “back-of-the-envelope” calculations in harmonic analysis and PDE which can obtain the correct numerology for such a problem much faster than a fully rigorous approach. These calculations can be a little tricky to handle properly at first, but with practice they can be done very swiftly.

Read the rest of this entry »

I’ve just come back from the 48th Annual IEEE Symposium on the Foundations of Computer science, better known as FOCS; this year it was held at Providence, near Brown University. (This conference is also being officially reported on by the blog posts of Nicole Immorlica, Luca Trevisan, and Scott Aaronson.) I was there to give a tutorial on some of the tools used these days in additive combinatorics and graph theory to distinguish structure and randomness. In a previous blog post, I had already mentioned that my lecture notes for this were available on the arXiv; now the slides for my tutorial are available too (it covers much the same ground as the lecture notes, and also incorporates some material from my ICM slides, but in a slightly different format).

In the slides, I am tentatively announcing some very recent (and not yet fully written up) work of Ben Green and myself establishing the Gowers inverse conjecture in finite fields in the special case when the function f is a bounded degree polynomial (this is a case which already has some theoretical computer science applications). I hope to expand upon this in a future post. But I will describe here a neat trick I learned at the conference (from the FOCS submission of Bogdanov and Viola) which uses majority voting to enhance a large number of small independent correlations into a much stronger single correlation. This application of majority voting is widespread in computer science (and, of course, in real-world democracies), but I had not previously been aware of its utility to the type of structure/randomness problems I am interested in (in particular, it seems to significantly simplify some of the arguments in the proof of my result with Ben mentioned above); thanks to this conference, I now know to add majority voting to my “toolbox”.

Read the rest of this entry »

In one of my recent posts, I used the Jordan normal form for a matrix in order to justify a couple of arguments. As a student, I learned the derivation of this form twice: firstly (as an undergraduate) by using the minimal polynomial, and secondly (as a graduate) by using the structure theorem for finitely generated modules over a principal ideal domain. I found though that the former proof was too concrete and the latter proof too abstract, and so I never really got a good intuition on how the theorem really worked. So I went back and tried to synthesise a proof that I was happy with, by taking the best bits of both arguments that I knew. I ended up with something which wasn’t too different from the standard proofs (relying primarily on the (extended) Euclidean algorithm and the fundamental theorem of algebra), but seems to get at the heart of the matter fairly quickly, so I thought I’d put it up on this blog anyway.

Before we begin, though, let us recall what the Jordan normal form theorem is. For this post, I’ll take the perspective of abstract linear transformations rather than of concrete matrices. Let T: V \to V be a linear transformation on a finite dimensional complex vector space V, with no preferred coordinate system. We are interested in asking what possible “kinds” of linear transformations V can support (more technically, we want to classify the conjugacy classes of \hbox{End}(V), the ring of linear endomorphisms of V to itself). Here are some simple examples of linear transformations.

  1. The right shift. Here, V = {\Bbb R}^n is a standard vector space, and the right shift U: V \to V is defined as U(x_1,\ldots,x_n) = (0,x_1,\ldots,x_{n-1}), thus all elements are shifted right by one position. (For instance, the 1-dimensional right shift is just the zero operator.)
  2. The right shift plus a constant. Here we consider an operator U + \lambda I, where U: V \to V is a right shift, I is the identity on V, and \lambda \in {\Bbb C} is a complex number.
  3. Direct sums. Given two linear transformations T: V \to V and S: W \to W, we can form their direct sum T \oplus S: V \oplus W \to V \oplus W by the formula (T \oplus S)(v,w) := (Tv, Sw).

Our objective is then to prove the

Jordan normal form theorem. Every linear transformation T: V \to V on a finite dimensional complex vector space V is similar to a direct sum of transformations, each of which is a right shift plus a constant.

(Of course, the same theorem also holds with left shifts instead of right shifts.)

Read the rest of this entry »

In my discussion of the Oppenheim conjecture in my recent post on Ratner’s theorems, I mentioned in passing the simple but crucial fact that the (orthochronous) special orthogonal group SO(Q)^+ of an indefinite quadratic form on {\Bbb R}^3 can be generated by unipotent elements. This is not a difficult fact to prove, as one can simply diagonalise Q and then explicitly write down some unipotent elements (the magic words here are “null rotations“). But this is a purely algebraic approach; I thought it would also be instructive to show the geometric (or dynamic) reason for why unipotent elements appear in the orthogonal group of indefinite quadratic forms in three dimensions. (I’ll give away the punch line right away: it’s because the parabola is a conic section.) This is not a particularly deep or significant observation, and will not be surprising to the experts, but I would like to record it anyway, as it allows me to review some useful bits and pieces of elementary linear algebra.

Read the rest of this entry »

While working on my recent paper with Ben Green, I was introduced to the beautiful theorems of Marina Ratner on unipotent flows on homogeneous spaces, and their application to questions in number theory, such as the Oppenheim conjecture (first solved by Margulis, by establishing what can retrospectively be viewed as a special case of Ratner’s theorems). This is a subject that I am still only just beginning to learn, but hope to understand better in the future, especially given that quantitative analogues of Ratner’s theorems should exist, and should have even more applications to number theory (see for instance this recent paper of Einsiedler, Margulis, and Venkatesh). In this post, I will try to describe some of the background for this theorem and its connection with the Oppenheim conjecture; I will not discuss the proof at all, largely because I have not fully understood it myself yet. For a nice introduction to these issues, I recommend Dave Morris’ recent book on the subject (and this post here is drawn in large part from that book).

Ratner’s theorem takes place on a homogeneous space. Informally, a homogeneous space is a space X which looks “the same” when viewed from any point on that space. For instance, a sphere S^2 is a homogeneous space, but the surface of a cube is not (the cube looks different when viewed from a corner than from a point on an edge or on a face). More formally, a homogeneous space is a space X equipped with an action (g,x) \mapsto gx of a group G of symmetries which is transitive: given any two points x, y on the space, there is at least one symmetry g that moves x to y, thus y=gx. (For instance the cube has several symmetries, but not enough to be transitive; in contrast, the sphere S^2 has the transitive action of the special orthogonal group SO(3) as its symmetry group.) It is not hard to see that a homogeneous space X can always be identified (as a set with an action of G) with a quotient G/\Gamma := \{ g\Gamma: g \in G \}, where \Gamma is a subgroup of G; indeed, one can take \Gamma to be the stabiliser \Gamma := \{ g \in G: gx = x \} of an arbitrarily chosen point x in X, and then identify g\Gamma with g\Gamma x = gx. For instance, the sphere S^2 has an obvious action of the special orthogonal group SO(3), and the stabiliser of (say) the north pole can be identified with SO(2), so that the sphere can be identified with SO(3)/SO(2). More generally, any Riemannian manifold of constant curvature is a homogeneous space; for instance, an m-dimensional torus can be identified with {\Bbb R}^m/{\Bbb Z}^m, while a surface X of constant negative curvature can be identified with SL(2,{\Bbb R})/\Gamma for some subgroup \Gamma of SL(2,{\Bbb R}) (e.g. the hyperbolic plane {\Bbb H} is isomorphic to SL(2,{\Bbb R})/SO(2)). Furthermore, the cosphere bundle S^* X of X - the space of unit (co)tangent vectors on X - is also a homogeneous space with structure group SL(2,{\Bbb R}). (For instance, the cosphere bundle S^* {\Bbb H} of the hyperbolic plane {\Bbb H} is isomorphic to SL(2,{\Bbb R}) / \{ +1, -1 \}.)

For the purposes of Ratner’s theorem, we only consider homogeneous spaces X in which the symmetry group G is a connected finite-dimensional Lie group, and X is finite volume (or more precisely, it has a finite non-trivial G-invariant measure). Every compact homogeneous space is finite volume, but not conversely; for instance the modular curve SL(2,{\Bbb R})/SL(2,{\Bbb Z}) is finite volume but not compact (it has a cusp). (The modular curve has two real dimensions, but just one complex dimension, hence the term “curve”; rather confusingly, it is also referred to as the “modular surface”. As for the term “modular”, observe that the moduli space of unimodular lattices in {\Bbb R}^2 has an obvious action of SL(2,{\Bbb R}), with the stabiliser of {\Bbb Z}^2 being SL(2,{\Bbb Z}), and so this moduli space can be identified with the modular curve.)

Read the rest of this entry »

Today I’d like to discuss a beautiful inequality in graph theory, namely the crossing number inequality. This inequality gives a useful bound on how far a given graph is from being planar, and has a number of applications, for instance to sum-product estimates. Its proof is also an excellent example of the amplification trick in action; here the main source of amplification is the freedom to pass to subobjects, which is a freedom which I didn’t touch upon in the previous post on amplification. The crossing number inequality (and its proof) is well known among graph theorists but perhaps not among the wider mathematical community, so I thought I would publicise it here.

In this post, when I talk about a graph, I mean an abstract collection of vertices V, together with some abstract edges E joining pairs of vertices together. We will assume that the graph is undirected (the edges do not have a preferred orientation), loop-free (an edge cannot begin and start at the same vertex), and multiplicity-free (any pair of vertices is joined by at most one edge). More formally, we can model all this by viewing E as a subset of \binom{V}{2} := \{ e \subset V: |e|=2 \}, the set of 2-element subsets of V, and we view the graph G as an ordered pair G = (V,E). (The notation is set up so that |\binom{V}{2}| = \binom{|V|}{2}.)

Now one of the great features of graphs, as opposed to some other abstract maths concepts, is that they are easy to draw: the abstract vertices become dots on a plane, while the edges become line segments or curves connecting these dots. [To avoid some technicalities we do not allow these curves to pass through the dots, except if the curve is terminating at that dot.] Let us informally refer to such a concrete representation D of a graph G as a drawing of that graph. Clearly, any non-trivial graph is going to have an infinite number of possible drawings. In some of these drawings, a pair of edges might cross each other; in other drawings, all edges might be disjoint (except of course at the vertices, where edges with a common endpoint are obliged to meet). If G has a drawing D of the latter type, we say that the graph G is planar.

Given an abstract graph G, or a drawing thereof, it is not always obvious as to whether that graph is planar; just because the drawing that you currently possess of G contains crossings, does not necessarily mean that all drawings of G do. The wonderful little web game “Planarity” illustrates this point excellently. Nevertheless, there are definitely graphs which are not planar; in particular the complete graph K_5 on five vertices, and the complete bipartite graph K_{3,3} on two sets of three vertices, are non-planar.

There is in fact a famous theorem of Kuratowski that says that these two graphs are the only “source” of non-planarity, in the sense that any non-planar graph contains (a subdivision of) one of these graphs as a subgraph. (There is of course the even more famous four-colour theorem that asserts that every planar graphs is four-colourable, but this is not the topic of my post today.)

Intuitively, if we fix the number of vertices |V|, and increase the number of edges |E|, then the graph should become “increasingly non-planar”; conversely, if we keep the same number of edges |E| but spread them amongst a greater number of vertices |V|, then the graph should become “increasingly planar”. Is there a quantitative way to measure the “non-planarity” of a graph, and to formalise the above intuition as some sort of inequality?
Read the rest of this entry »

My colleague Ricardo Pérez-Marco showed me a very cute proof of Pythagoras’ theorem, which I thought I would share here; it’s not particularly earth-shattering, but it is perhaps the most intuitive proof of the theorem that I have seen yet.

In the above diagram, a, b, c are the lengths BC, CA, and AB of the right-angled triangle ACB, while x and y are the areas of the right-angled triangles ADC and CDB respectively. Thus the whole triangle ACB has area x+y.

Now observe that the right-angled triangles ADC, CDB, and ACB are all similar (because of all the common angles), and thus their areas are proportional to the square of their respective hypotenuses. In other words, (x,y,x+y) is proportional to (a^2, b^2, c^2). Pythagoras’ theorem follows.

Read the rest of this entry »

It occurred to me recently that the mathematical blog medium may be a good venue not just for expository “short stories” on mathematical concepts or results, but also for more technical discussions of individual mathematical “tricks”, which would otherwise not be significant enough to warrant a publication-length (and publication-quality) article. So I thought today that I would discuss the amplification trick in harmonic analysis and combinatorics (and in particular, in the study of estimates); this trick takes an established estimate involving an arbitrary object (such as a function f), and obtains a stronger (or amplified) estimate by transforming the object in a well-chosen manner (often involving some new parameters) into a new object, applying the estimate to that new object, and seeing what that estimate says about the original object (after optimising the parameters or taking a limit). The amplification trick works particularly well for estimates which enjoy some sort of symmetry on one side of the estimate that is not represented on the other side; indeed, it can be viewed as a way to “arbitrage” differing amounts of symmetry between the left- and right-hand sides of an estimate. It can also be used in the contrapositive, amplifying a weak counterexample to an estimate into a strong counterexample. This trick also sheds some light as to why dimensional analysis works; an estimate which is not dimensionally consistent can often be amplified into a stronger estimate which is dimensionally consistent; in many cases, this new estimate is so strong that it cannot in fact be true, and thus dimensionally inconsistent inequalities tend to be either false or inefficient, which is why we rarely see them. (More generally, any inequality on which a group acts on either the left or right-hand side can often be “decomposed” into the “isotypic components” of the group action, either by the amplification trick or by other related tools, such as Fourier analysis.)

The amplification trick is a deceptively simple one, but it can become particularly powerful when one is arbitraging an unintuitive symmetry, such as symmetry under tensor powers. Indeed, the “tensor power trick”, which can eliminate constants and even logarithms in an almost magical manner, can lead to some interesting proofs of sharp inequalities, which are difficult to establish by more direct means.

The most familiar example of the amplification trick in action is probably the textbook proof of the Cauchy-Schwarz inequality

|< v, w >| \leq \|v\| \|w\| (1)

for vectors v, w in a complex Hilbert space. To prove this inequality, one might start by exploiting the obvious inequality

\|v-w\|^2 \geq 0 (2)

but after expanding everything out, one only gets the weaker inequality

\hbox{Re} < v, w > \leq \frac{1}{2} \|v\|^2 + \frac{1}{2} \|w\|^2. (3)

Now (3) is weaker than (1) for two reasons; the left-hand side is smaller, and the right-hand side is larger (thanks to the arithmetic mean-geometric mean inequality). However, we can amplify (3) by arbitraging some symmetry imbalances. Firstly, observe that the phase rotation symmetry v \mapsto e^{i\theta} v preserves the RHS of (3) but not the LHS. We exploit this by replacing v by e^{i\theta} v in (3) for some phase \theta to be chosen later, to obtain

\hbox{Re} e^{i\theta} < v, w > \leq \frac{1}{2} \|v\|^2 + \frac{1}{2} \|w\|^2.

Now we are free to choose \theta at will (as long as it is real, of course), so it is natural to choose \theta to optimise the inequality, which in this case means to make the left-hand side as large as possible. This is achieved by choosing e^{i\theta} to cancel the phase of < v, w >, and we obtain

|< v, w >| \leq \frac{1}{2} \|v\|^2 + \frac{1}{2} \|w\|^2 (4)

This is closer to (1); we have fixed the left-hand side, but the right-hand side is still too weak. But we can amplify further, by exploiting an imbalance in a different symmetry, namely the homogenisation symmetry (v,w) \mapsto (\lambda v, \frac{1}{\lambda} w) for a scalar \lambda > 0, which preserves the left-hand side but not the right. Inserting this transform into (4) we conclude that

|< v, w >| \leq \frac{\lambda^2}{2} \|v\|^2 + \frac{1}{2\lambda^2} \|w\|^2

where \lambda > 0 is at our disposal to choose. We can optimise in \lambda by minimising the right-hand side, and indeed one easily sees that the minimum (or infimum, if one of v and w vanishes) is \|v\| \|w\| (which is achieved when