My colleague Tom Liggett recently posed to me the following problem about power series in one real variable $x$. Observe that the power series
$$\sum_{n=0}^\infty (-1)^n \frac{x^n}{n!}$$
has very rapidly decaying coefficients (of order $O(1/n!)$), leading to an infinite radius of convergence; also, as the series converges to $e^{-x}$, the series decays very rapidly as $x$ approaches $+\infty$. The problem is whether this is essentially the only example of this type. More precisely:
Problem 1 Let $a_0, a_1, \dots$ be a bounded sequence of real numbers, and suppose that the power series
$$f(x) := \sum_{n=0}^\infty a_n \frac{x^n}{n!}$$
(which has an infinite radius of convergence) decays like $O(e^{-x})$ as $x \to +\infty$, in the sense that the function $e^x f(x)$ remains bounded as $x \to +\infty$. Must the sequence $a_n$ be of the form $a_n = C (-1)^n$ for some constant $C$?
As it turns out, the problem has a very nice solution using complex analysis methods, which by coincidence I happen to be teaching right now. I am therefore posing it as a challenge to my complex analysis students, and to other readers of this blog, to answer the above problem by complex methods; feel free to post solutions in the comments below (and in particular, if you don’t want to be spoiled, you should probably refrain from reading the comments). In fact, the only way I currently know how to solve this problem is by complex methods; I would be interested in seeing a purely real-variable solution that is not simply a thinly disguised version of a complex-variable argument.
(To be fair to my students, the complex variable argument does require one additional tool that is not directly covered in my notes. That tool can be found here.)
In the previous set of notes we saw that functions $f: U \to \mathbf{C}$ that were holomorphic on an open set $U$ enjoyed a large number of useful properties, particularly if the domain $U$ was simply connected. In many situations, though, we need to consider functions $f$ that are only holomorphic (or even well-defined) on most of a domain $U$, thus they are actually functions $f: U \backslash S \to \mathbf{C}$ outside of some small singular set $S$ inside $U$. (In this set of notes we only consider interior singularities; one can also discuss singular behaviour at the boundary of $U$, but this is a whole separate topic and will not be pursued here.) Since we have only defined the notion of holomorphicity on open sets, we will require the singular sets $S$ to be closed, so that the domain $U \backslash S$ on which $f$ remains holomorphic is still open. A typical class of examples are the functions of the form $\frac{f(z)}{z-z_0}$ that were already encountered in the Cauchy integral formula; if $f: U \to \mathbf{C}$ is holomorphic and $z_0 \in U$, such a function would be holomorphic save for a singularity at $z_0$. Another basic class of examples are the rational functions $P(z)/Q(z)$, which are holomorphic outside of the zeroes of the denominator $Q$.
Singularities come in varying levels of “badness” in complex analysis. The least harmful type of singularity is the removable singularity – a point $z_0$ which is an isolated singularity (i.e., an isolated point of the singular set $S$) where the function $f$ is undefined, but for which one can extend the function across the singularity in such a fashion that the function becomes holomorphic in a neighbourhood of the singularity. A typical example is that of the complex sinc function $\frac{\sin(z)}{z}$, which has a removable singularity at the origin $0$, which can be removed by declaring the sinc function to equal $1$ at $0$. The detection of isolated removable singularities can be accomplished by Riemann’s theorem on removable singularities (Exercise 35 from Notes 3): if a holomorphic function $f: U \backslash S \to \mathbf{C}$ is bounded near an isolated singularity $z_0 \in S$, then the singularity at $z_0$ may be removed.
After removable singularities, the mildest form of singularity one can encounter is that of a pole – an isolated singularity $z_0$ such that $f(z)$ can be factored as $f(z) = \frac{g(z)}{(z-z_0)^m}$ for some $m \geq 1$ (known as the order of the pole), where $g$ has a removable singularity at $z_0$ (and is non-zero at $z_0$ once the singularity is removed). Such functions have already made a frequent appearance in previous notes, particularly the case of simple poles when $m=1$. The behaviour near $z_0$ of a function $f$ with a pole of order $m$ is well understood: for instance, $|f(z)|$ goes to infinity as $z$ approaches $z_0$ (at a rate comparable to $|z-z_0|^{-m}$). These singularities are not, strictly speaking, removable; but if one compactifies the range $\mathbf{C}$ of the holomorphic function $f$ to a slightly larger space $\mathbf{C} \cup \{\infty\}$ known as the Riemann sphere, then the singularity can be removed. In particular, functions $f: U \backslash S \to \mathbf{C}$ which only have isolated singularities that are either poles or removable can be extended to holomorphic functions $f: U \to \mathbf{C} \cup \{\infty\}$ to the Riemann sphere. Such functions are known as meromorphic functions, and are nearly as well-behaved as holomorphic functions in many ways. In fact, in one key respect, the family of meromorphic functions is better: the meromorphic functions on $U$ turn out to form a field, in particular the quotient of two meromorphic functions is again meromorphic (if the denominator is not identically zero).
Unfortunately, there are isolated singularities that are neither removable nor poles, and are known as essential singularities. A typical example is the function $f(z) = e^{1/z}$, which turns out to have an essential singularity at $z=0$. The behaviour of such essential singularities is quite wild; we will show here the Casorati-Weierstrass theorem, which shows that the image of $f$ near the essential singularity is dense in the complex plane, as well as the more difficult great Picard theorem which asserts that in fact the image can omit at most one point in the complex plane. Nevertheless, around any isolated singularity (even the essential ones) $z_0$, it is possible to expand $f$ as a variant of a Taylor series known as a Laurent series $\sum_{n=-\infty}^\infty a_n (z-z_0)^n$. The $\frac{1}{z-z_0}$ coefficient $a_{-1}$ of this series is particularly important for contour integration purposes, and is known as the residue of $f$ at the isolated singularity $z_0$. These residues play a central role in a common generalisation of Cauchy’s theorem and the Cauchy integral formula known as the residue theorem, which is a particularly useful tool for computing (or at least transforming) contour integrals of meromorphic functions, and has proven to be a particularly popular technique to use in analytic number theory. Within complex analysis, one important consequence of the residue theorem is the argument principle, which gives a topological (and analytical) way to control the zeroes and poles of a meromorphic function.
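To make the link between residues and contour integrals concrete, here is a minimal numerical sketch. It assumes the standard fact that the $\frac{1}{z-z_0}$ Laurent coefficient equals $\frac{1}{2\pi i}$ times the integral of the function over a small anticlockwise circle around the isolated singularity; the helper names `circle_integral` and `residue_estimate` are illustrative only and are not taken from these notes.

```python
import cmath
import math


def circle_integral(f, center=0j, radius=1.0, n=2000):
    """Approximate the integral of f over the anticlockwise circle
    |z - center| = radius, using n equally spaced sample points."""
    total = 0j
    dtheta = 2 * math.pi / n
    for k in range(n):
        w = cmath.exp(1j * k * dtheta)   # point on the unit circle
        z = center + radius * w          # sample point on the contour
        dz = 1j * radius * w * dtheta    # gamma'(theta) * dtheta
        total += f(z) * dz
    return total


def residue_estimate(f, center=0j, radius=1.0, n=2000):
    """Estimate the residue as 1 / (2 pi i) times the circle integral."""
    return circle_integral(f, center, radius, n) / (2j * math.pi)


# Essential singularity at 0: exp(1/z) = sum_k z^(-k) / k!, so a_{-1} = 1.
res_exp = residue_estimate(lambda z: cmath.exp(1 / z))
# Simple pole at 0: 1/z has residue 1, whereas 1/z^2 has residue 0.
res_pole = residue_estimate(lambda z: 1 / z)
res_double = residue_estimate(lambda z: 1 / z**2)
```

Even though $e^{1/z}$ behaves wildly near the origin, the estimate only samples the function on the unit circle, which stays away from the singularity.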
Finally, there are the non-isolated singularities. Little can be said about these singularities in general (for instance, the residue theorem does not directly apply in the presence of such singularities), but certain types of non-isolated singularities are still relatively easy to understand. One particularly common example of such a non-isolated singularity arises when trying to invert a non-injective function, such as the complex exponential $z \mapsto \exp(z)$ or a power function $z \mapsto z^n$, leading to branches of multivalued functions such as the complex logarithm $z \mapsto \log(z)$ or the $n^{\mathrm{th}}$ root function $z \mapsto z^{1/n}$ respectively. Such branches will typically have a non-isolated singularity along a branch cut; this branch cut can be moved around the complex domain by switching from one branch to another, but usually cannot be eliminated entirely, unless one is willing to lift up the domain $U$ to a more general type of domain known as a Riemann surface. As such, one can view branch cuts as being an “artificial” form of singularity, being an artefact of a choice of local coordinates of a Riemann surface, rather than reflecting any intrinsic singularity of the function itself. The further study of Riemann surfaces is an important topic in complex analysis (as well as the related fields of complex geometry and algebraic geometry), but unfortunately this topic will probably be postponed to the next course in this sequence (which I will not be teaching).
We now come to perhaps the most central theorem in complex analysis (save possibly for the fundamental theorem of calculus), namely Cauchy’s theorem, which allows one to compute (or at least transform) a large number of contour integrals $\int_\gamma f(z)\ dz$ even without knowing any explicit antiderivative of $f$. There are many forms and variants of Cauchy’s theorem. To give one such version, we need the basic topological notion of a homotopy:
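Definition 1 (Homotopy) Let $U$ be an open subset of $\mathbf{C}$, and let $\gamma_0: [a,b] \to U$, $\gamma_1: [a,b] \to U$ be two curves in $U$.
- (i) If $\gamma_0, \gamma_1$ have the same initial point $z_0$ and final point $z_1$, we say that $\gamma_0$ and $\gamma_1$ are homotopic with fixed endpoints in $U$ if there exists a continuous map $\gamma: [0,1] \times [a,b] \to U$ such that $\gamma(0,t) = \gamma_0(t)$ and $\gamma(1,t) = \gamma_1(t)$ for all $t \in [a,b]$, and such that $\gamma(s,a) = z_0$ and $\gamma(s,b) = z_1$ for all $s \in [0,1]$.
- (ii) If $\gamma_0, \gamma_1$ are closed (but possibly with different initial points), we say that $\gamma_0$ and $\gamma_1$ are homotopic as closed curves in $U$ if there exists a continuous map $\gamma: [0,1] \times [a,b] \to U$ such that $\gamma(0,t) = \gamma_0(t)$ and $\gamma(1,t) = \gamma_1(t)$ for all $t \in [a,b]$, and such that $\gamma(s,a) = \gamma(s,b)$ for all $s \in [0,1]$.
- (iii) If $\gamma_2: [a_2,b_2] \to U$ and $\gamma_3: [a_3,b_3] \to U$ are curves with the same initial point and same final point, we say that $\gamma_2$ and $\gamma_3$ are homotopic with fixed endpoints up to reparameterisation in $U$ if there is a reparameterisation $\tilde \gamma_2: [a,b] \to U$ of $\gamma_2$ which is homotopic with fixed endpoints in $U$ to a reparameterisation $\tilde \gamma_3: [a,b] \to U$ of $\gamma_3$.
- (iv) If $\gamma_2: [a_2,b_2] \to U$ and $\gamma_3: [a_3,b_3] \to U$ are closed curves, we say that $\gamma_2$ and $\gamma_3$ are homotopic as closed curves up to reparameterisation in $U$ if there is a reparameterisation $\tilde \gamma_2: [a,b] \to U$ of $\gamma_2$ which is homotopic as closed curves in $U$ to a reparameterisation $\tilde \gamma_3: [a,b] \to U$ of $\gamma_3$.
In the first two cases, the map $\gamma$ will be referred to as a homotopy from $\gamma_0$ to $\gamma_1$, and we will also say that $\gamma_0$ can be continuously deformed to $\gamma_1$ (either with fixed endpoints, or as closed curves).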
For a similar reason, in a convex open set $U$, any two closed curves will be homotopic to each other as closed curves.
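The convex case can be illustrated with the usual straight-line interpolation between two curves. The sketch below is only an illustration under stated assumptions: the two sample curves, the choice of the open unit disk as the convex set, and the name `straight_line_homotopy` are all hypothetical and not taken from the notes.

```python
import cmath
import math


def straight_line_homotopy(gamma0, gamma1):
    """Return gamma(s, t) = (1 - s) * gamma0(t) + s * gamma1(t)."""
    def gamma(s, t):
        return (1 - s) * gamma0(t) + s * gamma1(t)
    return gamma


# Two closed curves on [0, 1] inside the open unit disk (a convex set).
def circle(t):
    return 0.5 * cmath.exp(2j * math.pi * t)


def ellipse(t):
    angle = 2 * math.pi * t
    return 0.2 + 0.3 * math.cos(angle) + 0.6j * math.sin(angle)


H = straight_line_homotopy(circle, ellipse)

# Largest modulus attained on a sample grid: the homotopy should stay
# inside the unit disk because the disk is convex.
max_modulus = max(abs(H(i / 20, j / 200)) for i in range(21) for j in range(201))
# Each intermediate curve should still be closed: gamma(s, 0) = gamma(s, 1).
closing_gap = max(abs(H(i / 20, 0.0) - H(i / 20, 1.0)) for i in range(21))
```

Convexity is exactly what guarantees that every interpolated point lies in the set; in a non-convex set the same formula can leave the domain.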
- (i) Prove that the property of being homotopic with fixed endpoints in $U$ is an equivalence relation.
- (ii) Prove that the property of being homotopic as closed curves in $U$ is an equivalence relation.
- (iii) If $\gamma_0, \gamma_1: [a,b] \to U$ are closed curves with the same initial point, show that $\gamma_0$ is homotopic to $\gamma_1$ as closed curves if and only if $\gamma_0$ is homotopic to $\gamma_2 + \gamma_1 + (-\gamma_2)$ with fixed endpoints for some closed curve $\gamma_2$ with the same initial point as $\gamma_0$ or $\gamma_1$.
- (iv) Define a point in $U$ to be a curve $\gamma_1: [a,b] \to U$ of the form $\gamma_1(t) = z_0$ for some $z_0 \in U$ and all $t \in [a,b]$. Let $\gamma_0: [a,b] \to U$ be a closed curve in $U$. Show that $\gamma_0$ is homotopic with fixed endpoints to a point in $U$ if and only if $\gamma_0$ is homotopic as a closed curve to a point in $U$. (In either case, we will call $\gamma_0$ homotopic to a point, null-homotopic, or contractible to a point in $U$.)
- (v) If $\gamma_0, \gamma_1: [a,b] \to U$ are curves with the same initial point and the same terminal point, show that $\gamma_0$ is homotopic to $\gamma_1$ with fixed endpoints in $U$ if and only if $\gamma_0 + (-\gamma_1)$ is homotopic to a point in $U$.
- (vi) If $U$ is connected, and $\gamma_0, \gamma_1: [a,b] \to U$ are any two curves in $U$, show that there exists a continuous map $\gamma: [0,1] \times [a,b] \to U$ such that $\gamma(0,t) = \gamma_0(t)$ and $\gamma(1,t) = \gamma_1(t)$ for all $t \in [a,b]$. Thus the notion of homotopy becomes rather trivial if one does not fix the endpoints or require the curve to be closed.
- (vii) Show that if $\gamma_1: [a,b] \to U$ is a reparameterisation of $\gamma_0: [a,b] \to U$, then $\gamma_0$ and $\gamma_1$ are homotopic with fixed endpoints in $U$.
- (viii) Prove that the property of being homotopic with fixed endpoints in $U$ up to reparameterisation is an equivalence relation.
- (ix) Prove that the property of being homotopic as closed curves in $U$ up to reparameterisation is an equivalence relation.
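We can then phrase Cauchy’s theorem as an assertion that contour integration on holomorphic functions is a homotopy invariant. More precisely, let $U$ be an open subset of $\mathbf{C}$, and let $f: U \to \mathbf{C}$ be holomorphic.
- (i) If $\gamma_0$ and $\gamma_1$ are rectifiable curves that are homotopic in $U$ with fixed endpoints up to reparameterisation, then
$$\int_{\gamma_0} f(z)\ dz = \int_{\gamma_1} f(z)\ dz.$$
- (ii) If $\gamma_0$ and $\gamma_1$ are closed rectifiable curves that are homotopic in $U$ as closed curves up to reparameterisation, then
$$\int_{\gamma_0} f(z)\ dz = \int_{\gamma_1} f(z)\ dz.$$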
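As a numerical sanity check of this homotopy invariance, the following sketch compares the integral of an entire function along two different curves with the same endpoints. The choice of $e^{z^2}$ (which has no elementary antiderivative), the two contours, and the midpoint-rule helper `contour_integral` are all assumptions made for illustration.

```python
import cmath
import math


def contour_integral(f, gamma, dgamma, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f along the curve
    gamma: [a, b] -> C, where dgamma is the derivative of gamma."""
    h = (b - a) / n
    total = 0j
    for k in range(n):
        t = a + (k + 0.5) * h
        total += f(gamma(t)) * dgamma(t) * h
    return total


def f(z):
    # Entire function, so any two curves from -1 to 1 are homotopic
    # with fixed endpoints inside its (convex) domain C.
    return cmath.exp(z * z)


# Straight segment from -1 to 1.
segment = contour_integral(f, lambda t: complex(t), lambda t: 1 + 0j, -1.0, 1.0)
# Upper unit semicircle from -1 to 1 (traversed clockwise).
arc = contour_integral(
    f,
    lambda t: cmath.exp(1j * (math.pi - t)),
    lambda t: -1j * cmath.exp(1j * (math.pi - t)),
    0.0,
    math.pi,
)
difference = abs(segment - arc)
```

The two approximations should agree up to quadrature error, even though the integrand takes quite different values along the two curves.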
This version of Cauchy’s theorem is particularly useful for applications, as it explicitly brings into play the powerful technique of contour shifting, which allows one to compute a contour integral by replacing the contour with a homotopic contour on which the integral is easier to either compute or estimate. This formulation of Cauchy’s theorem also highlights the close relationship between contour integrals and the algebraic topology of the complex plane (and open subsets $U$ thereof). Setting $\gamma_1$ to be a point, we obtain an important special case of Cauchy’s theorem (which is in fact equivalent to the full theorem):
An important feature to note about Cauchy’s theorem is the global nature of its hypothesis on $f$. The conclusion of Cauchy’s theorem only involves the values of a function $f$ on the images of the two curves $\gamma_0, \gamma_1$. However, in order for the hypotheses of Cauchy’s theorem to apply, the function $f$ must be holomorphic not only on the images of $\gamma_0, \gamma_1$, but on an open set $U$ that is large enough (and sufficiently free of “holes”) to support a homotopy between the two curves. This point can be emphasised through the following fundamental near-counterexample to Cauchy’s theorem:
As a consequence of this and Cauchy’s theorem, we conclude that the contour $\gamma$ is not contractible to a point in $\mathbf{C} \backslash \{0\}$; note that this does not contradict Example 2 because $\mathbf{C} \backslash \{0\}$ is not convex. Thus we see that the lack of holomorphicity (or singularity) of $f$ at the origin can be “blamed” for the non-vanishing of the integral of $f$ on the closed contour $\gamma$, even though this contour does not come anywhere near the origin. Thus we see that the global behaviour of $f$, not just the behaviour in the local neighbourhood of $\gamma$, has an impact on the contour integral.
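One can of course rewrite this example to involve non-closed contours instead of closed ones. For instance, if we let $\gamma_0, \gamma_1: [0,\pi] \to \mathbf{C} \backslash \{0\}$ denote the half-circle contours $\gamma_0(t) := e^{it}$ and $\gamma_1(t) := e^{-it}$, then $\gamma_0, \gamma_1$ are both contours in $\mathbf{C} \backslash \{0\}$ from $+1$ to $-1$, but one has
$$\int_{\gamma_0} \frac{dz}{z} = +\pi i$$
whereas
$$\int_{\gamma_1} \frac{dz}{z} = -\pi i.$$
In order for this to be consistent with Cauchy’s theorem, we conclude that $\gamma_0$ and $\gamma_1$ are not homotopic in $\mathbf{C} \backslash \{0\}$ (even after reparameterisation).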
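The discrepancy between the two half-circles can be checked numerically. This is a minimal sketch which assumes the integrand is $\frac{1}{z}$ and that the half-circles are parameterised as $e^{it}$ and $e^{-it}$ for $0 \leq t \leq \pi$; the helper `contour_integral` is a hypothetical midpoint-rule routine, not something defined in the notes.

```python
import cmath
import math


def contour_integral(f, gamma, dgamma, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f along gamma."""
    h = (b - a) / n
    return sum(
        f(gamma(a + (k + 0.5) * h)) * dgamma(a + (k + 0.5) * h) * h
        for k in range(n)
    )


def reciprocal(z):
    return 1 / z


# Upper half-circle from +1 to -1 (anticlockwise).
upper = contour_integral(
    reciprocal,
    lambda t: cmath.exp(1j * t),
    lambda t: 1j * cmath.exp(1j * t),
    0.0,
    math.pi,
)
# Lower half-circle from +1 to -1 (clockwise).
lower = contour_integral(
    reciprocal,
    lambda t: cmath.exp(-1j * t),
    lambda t: -1j * cmath.exp(-1j * t),
    0.0,
    math.pi,
)
# The gap between the two integrals is the integral of 1/z once around
# the full anticlockwise unit circle.
gap = upper - lower
```

Here `upper` is approximately $\pi i$ and `lower` is approximately $-\pi i$, so the gap is approximately $2\pi i$, reflecting the fact that the two curves together wind once around the singularity at the origin.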
In the specific case of functions of the form $\frac{1}{z}$, or more generally $\frac{f(z)}{z-z_0}$ for some point $z_0$ and some $f$ that is holomorphic in some neighbourhood of $z_0$, we can quantify the precise failure of Cauchy’s theorem through the Cauchy integral formula, and through the concept of a winding number. These turn out to be extremely powerful tools for understanding both the nature of holomorphic functions and the topology of open subsets of the complex plane, as we shall see in this and later notes.
Having discussed differentiation of complex mappings in the preceding notes, we now turn to the integration of complex maps. We first briefly review the situation of integration of (suitably regular) real functions of one variable. Actually there are three closely related concepts of integration that arise in this setting:
- (i) The signed definite integral $\int_a^b f(x)\ dx$, which is usually interpreted as the Riemann integral (or equivalently, the Darboux integral), which can be defined as the limit (if it exists) of the Riemann sums
$$\sum_{j=1}^n f(x_j^*) (x_j - x_{j-1}) \qquad (1)$$
where $a = x_0 < x_1 < \dots < x_n = b$ is some partition of $[a,b]$, $x_j^*$ is an element of the interval $[x_{j-1}, x_j]$, and the limit is taken as the maximum mesh size $\max_{1 \leq j \leq n} |x_j - x_{j-1}|$ goes to zero. It is convenient to adopt the convention that $\int_b^a f(x)\ dx := -\int_a^b f(x)\ dx$ for $a < b$; alternatively one can interpret $\int_b^a f(x)\ dx$ as the limit of the Riemann sums (1), where now the (reversed) partition $b = x_0 > x_1 > \dots > x_n = a$ goes leftwards from $b$ to $a$, rather than rightwards from $a$ to $b$.
- (ii) The unsigned definite integral $\int_{[a,b]} f(x)\ dx$, usually interpreted as the Lebesgue integral. The precise definition of this integral is a little complicated (see e.g. this previous post), but roughly speaking the idea is to approximate $f$ by simple functions $\sum_{i=1}^n c_i 1_{E_i}$ for some coefficients $c_i$ and sets $E_i \subset [a,b]$, and then approximate the integral $\int_{[a,b]} f(x)\ dx$ by the quantities $\sum_{i=1}^n c_i m(E_i)$, where $m(E_i)$ is the Lebesgue measure of $E_i$. In contrast to the signed definite integral, no orientation is imposed or used on the underlying domain of integration, which is viewed as an “undirected” set $[a,b]$.
- (iii) The indefinite integral or antiderivative $\int f(x)\ dx$, defined as any function $F: [a,b] \to \mathbf{R}$ whose derivative $F'$ exists and is equal to $f$ on $[a,b]$. Famously, the antiderivative is only defined up to the addition of an arbitrary constant $C$, thus for instance $\int x\ dx = \frac{1}{2} x^2 + C$.
There are some other variants of the above integrals (e.g. the Henstock-Kurzweil integral, discussed for instance in this previous post), which can handle slightly different classes of functions and have slightly different properties than the standard integrals listed here, but we will not need to discuss such alternative integrals in this course (with the exception of some improper and principal value integrals, which we will encounter in later notes).
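The above three notions of integration are closely related to each other. For instance, if $f: [a,b] \to \mathbf{R}$ is a Riemann integrable function, then the signed definite integral and unsigned definite integral coincide (when the former is oriented correctly), thus
$$\int_a^b f(x)\ dx = \int_{[a,b]} f(x)\ dx \quad \text{and} \quad \int_b^a f(x)\ dx = -\int_{[a,b]} f(x)\ dx.$$
If $f: [a,b] \to \mathbf{R}$ is continuous, then by the fundamental theorem of calculus, it possesses an antiderivative $F = \int f(x)\ dx$, which is well defined up to an additive constant $C$, and
$$\int_c^d f(x)\ dx = F(d) - F(c)$$
for any $c, d \in [a,b]$, thus for instance $\int_a^b f(x)\ dx = F(b) - F(a)$ and $\int_b^a f(x)\ dx = F(a) - F(b)$.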
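The orientation convention and the fundamental theorem of calculus can both be seen in a small numerical experiment. The sketch below uses a uniform partition with midpoint tags, which is just one convenient choice of Riemann sum; the function name `riemann_sum` and the test function $f(x) = x$ with antiderivative $\frac{1}{2} x^2$ are assumptions for illustration.

```python
def riemann_sum(f, a, b, n=100000):
    """Riemann sum of f over a uniform partition running from a to b,
    tagged at midpoints. If b < a the partition runs leftwards and each
    increment x_j - x_{j-1} is negative."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h


def f(x):
    return x


def F(x):
    # An antiderivative of f (taking the additive constant to be 0).
    return 0.5 * x * x


forward = riemann_sum(f, 1.0, 3.0)    # approximates F(3) - F(1)
backward = riemann_sum(f, 3.0, 1.0)   # approximates F(1) - F(3)
```

Reversing the direction of the partition flips the sign of every increment, which is why `backward` is the negative of `forward`.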
All three of the above integration concepts have analogues in complex analysis. By far the most important notion will be the complex analogue of the signed definite integral, namely the contour integral $\int_\gamma f(z)\ dz$, in which the directed line segment from one real number $a$ to another $b$ is now replaced by a type of curve in the complex plane known as a contour. The contour integral can be viewed as the special case of the more general line integral $\int_\gamma f(z)\ dx + g(z)\ dy$ that is of particular relevance in complex analysis. There are also analogues of the Lebesgue integral, namely the arclength measure integrals $\int_\gamma f(z)\ |dz|$ and the area integrals $\int_\Omega f(x+iy)\ dx dy$, but these play only an auxiliary role in the subject. Finally, we still have the notion of an antiderivative $F(z)$ (also known as a primitive) of a complex function $f(z)$.
As it turns out, the fundamental theorem of calculus continues to hold in the complex plane: under suitable regularity assumptions on a complex function $f$ and a primitive $F$ of that function, one has
$$\int_\gamma f(z)\ dz = F(z_1) - F(z_0)$$
whenever $\gamma$ is a contour from $z_0$ to $z_1$ that lies in the domain of $f$. In particular, functions $f$ that possess a primitive must be conservative in the sense that $\int_\gamma f(z)\ dz = 0$ for any closed contour $\gamma$. This property of being conservative is not typical, in that “most” functions $f$ will not be conservative. However, there is a remarkable and far-reaching theorem, the Cauchy integral theorem (also known as the Cauchy-Goursat theorem), which asserts that any holomorphic function is conservative, so long as the domain is simply connected (or if one restricts attention to contractible closed contours). We will explore this theorem and several of its consequences in the next set of notes.
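A short numerical sketch can contrast a function that has a primitive with one that is not conservative. The choices below are assumptions for illustration: $f(z) = z^2$ with primitive $F(z) = z^3/3$, a parabolic contour from $0$ to $1+i$, and the conjugation map on the unit circle as the non-conservative example; `contour_integral` is a hypothetical midpoint-rule helper.

```python
import cmath
import math


def contour_integral(f, gamma, dgamma, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f along gamma."""
    h = (b - a) / n
    total = 0j
    for k in range(n):
        t = a + (k + 0.5) * h
        total += f(gamma(t)) * dgamma(t) * h
    return total


def F(z):
    # Primitive of z -> z^2.
    return z**3 / 3


# Parabolic contour gamma(t) = t + i t^2 from z0 = 0 to z1 = 1 + i.
lhs = contour_integral(
    lambda z: z * z,
    lambda t: complex(t, t * t),
    lambda t: complex(1.0, 2 * t),
    0.0,
    1.0,
)
rhs = F(1 + 1j) - F(0j)

# The conjugation map is not conservative: on the unit circle
# conj(z) * dz = i dt, so the closed-loop integral is 2 pi i, not 0.
loop = contour_integral(
    lambda z: z.conjugate(),
    lambda t: cmath.exp(1j * t),
    lambda t: 1j * cmath.exp(1j * t),
    0.0,
    2 * math.pi,
)
```

The first integral depends only on the endpoints of the contour, while the non-vanishing loop integral shows that the conjugation map cannot possess a primitive.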
At the core of almost any undergraduate real analysis course are the concepts of differentiation and integration, with these two basic operations being tied together by the fundamental theorem of calculus (and its higher dimensional generalisations, such as Stokes’ theorem). Similarly, the notion of the complex derivative and the complex line integral (that is to say, the contour integral) lie at the core of any introductory complex analysis course. Once again, they are tied to each other by the fundamental theorem of calculus; but in the complex case there is a further variant of the fundamental theorem, namely Cauchy’s theorem, which endows complex differentiable functions with many important and surprising properties that are often not shared by their real differentiable counterparts. We will give complex differentiable functions another name to emphasise this extra structure, by referring to such functions as holomorphic functions. (This term is also useful to distinguish these functions from the slightly less well-behaved meromorphic functions, which we will discuss in later notes.)
In this set of notes we will focus solely on the concept of complex differentiation, deferring the discussion of contour integration to the next set of notes. To begin with, the theory of complex differentiation will greatly resemble the theory of real differentiation; the definitions look almost identical, and well known laws of differential calculus such as the product rule, quotient rule, and chain rule carry over verbatim to the complex setting, and the theory of complex power series is similarly almost identical to the theory of real power series. However, when one compares the “one-dimensional” differentiation theory of the complex numbers with the “two-dimensional” differentiation theory of two real variables, we find that the dimensional discrepancy forces complex differentiable functions to obey a real-variable constraint, namely the Cauchy-Riemann equations. These equations make complex differentiable functions substantially more “rigid” than their real-variable counterparts; they imply for instance that the imaginary part of a complex differentiable function is essentially determined (up to constants) by the real part, and vice versa. Furthermore, even when considered separately, the real and imaginary components of complex differentiable functions are forced to obey the strong constraint of being harmonic. In later notes we will see these constraints manifest themselves in integral form, particularly through Cauchy’s theorem and the closely related Cauchy integral formula.
Despite all the constraints that holomorphic functions have to obey, a surprisingly large number of the functions of a complex variable that one actually encounters in applications turn out to be holomorphic. For instance, any polynomial $z \mapsto P(z)$ with complex coefficients will be holomorphic, as will the complex exponential $z \mapsto \exp(z)$. From this and the laws of differential calculus one can then generate many further holomorphic functions. Also, as we will show presently, complex power series will automatically be holomorphic inside their disk of convergence. On the other hand, there are certainly basic complex functions of interest that are not holomorphic, such as the complex conjugation function $z \mapsto \overline{z}$, the absolute value function $z \mapsto |z|$, or the real and imaginary part functions $z \mapsto \mathrm{Re}(z)$, $z \mapsto \mathrm{Im}(z)$. We will also encounter functions that are only holomorphic at some portions of the complex plane, but not on others; for instance, rational functions will be holomorphic except at those few points where the denominator vanishes, and are prime examples of the meromorphic functions mentioned previously. Later on we will also consider functions such as branches of the logarithm or square root, which will be holomorphic outside of a branch cut corresponding to the choice of branch. It is a basic but important skill in complex analysis to be able to quickly recognise which functions are holomorphic and which ones are not, as many of the useful theorems available to the former (such as Cauchy’s theorem) break down spectacularly for the latter. Indeed, in my experience, one of the most common “rookie errors” that beginning complex analysis students make is the error of attempting to apply a theorem about holomorphic functions to a function that is not at all holomorphic.
This stands in contrast to the situation in real analysis, in which one can often obtain correct conclusions by formally applying the laws of differential or integral calculus to functions that might not actually be differentiable or integrable in a classical sense. (This latter phenomenon, by the way, can be largely explained using the theory of distributions, as covered for instance in this previous post, but this is beyond the scope of the current course.)
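One rough way to build intuition for which functions are holomorphic is to test the Cauchy-Riemann equations numerically. The sketch below assumes the equations in the combined form $\frac{\partial f}{\partial x} + i \frac{\partial f}{\partial y} = 0$ and estimates the partial derivatives by central differences; the function name `cauchy_riemann_defect` and the step size are hypothetical choices, and a small defect at a few sample points is of course only evidence, not a proof.

```python
import cmath


def cauchy_riemann_defect(f, z, h=1e-6):
    """Estimate |df/dx + i df/dy| at z by central differences.
    The quantity vanishes exactly when the Cauchy-Riemann equations
    hold at z (for a real-differentiable f)."""
    fx = (f(z + h) - f(z - h)) / (2 * h)            # derivative in x
    fy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # derivative in y
    return abs(fx + 1j * fy)


z0 = 0.3 - 0.7j
defect_square = cauchy_riemann_defect(lambda z: z * z, z0)         # holomorphic
defect_exp = cauchy_riemann_defect(cmath.exp, z0)                  # holomorphic
defect_conj = cauchy_riemann_defect(lambda z: z.conjugate(), z0)   # not holomorphic
defect_abs = cauchy_riemann_defect(abs, z0)                        # not holomorphic
```

For the conjugation map the defect equals $2$ at every point, and for the absolute value it equals $1$ away from the origin, so neither function is holomorphic anywhere.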
Remark 1 In this set of notes it will be convenient to impose some unnecessarily generous regularity hypotheses (e.g. continuous second differentiability) on the holomorphic functions one is studying in order to make the proofs simpler. In later notes, we will discover that these hypotheses are in fact redundant, due to the phenomenon of elliptic regularity that ensures that holomorphic functions are automatically smooth.
Kronecker is famously reported to have said, “God created the natural numbers; all else is the work of man”. The truth of this statement (literal or otherwise) is debatable; but one can certainly view the other standard number systems as (iterated) completions of the natural numbers in various senses. For instance:
- The integers $\mathbf{Z}$ are the additive completion of the natural numbers $\mathbf{N}$ (the minimal additive group that contains a copy of $\mathbf{N}$).
- The rationals $\mathbf{Q}$ are the multiplicative completion of the integers $\mathbf{Z}$ (the minimal field that contains a copy of $\mathbf{Z}$).
- The reals $\mathbf{R}$ are the metric completion of the rationals $\mathbf{Q}$ (the minimal complete metric space that contains a copy of $\mathbf{Q}$).
- The complex numbers $\mathbf{C}$ are the algebraic completion of the reals $\mathbf{R}$ (the minimal algebraically closed field that contains a copy of $\mathbf{R}$).
These descriptions of the standard number systems are elegant and conceptual, but not entirely suitable for constructing the number systems in a non-circular manner from more primitive foundations. For instance, one cannot quite define the reals $\mathbf{R}$ from scratch as the metric completion of the rationals $\mathbf{Q}$, because the definition of a metric space itself requires the notion of the reals! (One can of course construct $\mathbf{R}$ by other means, for instance by using Dedekind cuts or by using uniform spaces in place of metric spaces.) The definition of the complex numbers $\mathbf{C}$ as the algebraic completion of the reals $\mathbf{R}$ does not suffer from such a circularity issue, but a certain amount of field theory is required to work with this definition initially. For the purposes of quickly constructing the complex numbers, it is thus more traditional to first define $\mathbf{C}$ as a quadratic extension of the reals $\mathbf{R}$, and more precisely as the extension $\mathbf{C} = \mathbf{R}(i)$ formed by adjoining a square root $i$ of $-1$ to the reals, that is to say a solution to the equation $i^2 + 1 = 0$. It is not immediately obvious that this extension is in fact algebraically closed; this is the content of the famous fundamental theorem of algebra, which we will prove later in this course.
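The quadratic extension construction can be made concrete by modelling a complex number as a pair of reals with a multiplication rule chosen so that the adjoined element squares to $-1$. The class below is a minimal sketch only: the name `Cx` and its fields are hypothetical, floating point numbers stand in for the reals, and only addition and multiplication are implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cx:
    """An element a + b i of the quadratic extension, stored as (a, b)."""
    re: float
    im: float

    def __add__(self, other):
        return Cx(self.re + other.re, self.im + other.im)

    def __mul__(self, other):
        # (a + b i)(c + d i) = (ac - bd) + (ad + bc) i, using i^2 = -1.
        return Cx(
            self.re * other.re - self.im * other.im,
            self.re * other.im + self.im * other.re,
        )


ONE = Cx(1.0, 0.0)
I = Cx(0.0, 1.0)

# The adjoined element solves the equation i^2 + 1 = 0.
should_be_zero = I * I + ONE
product = Cx(1.0, 2.0) * Cx(3.0, 4.0)
```

Here `should_be_zero` is `Cx(0.0, 0.0)` and `product` is `Cx(-5.0, 10.0)`, matching the usual rule for multiplying complex numbers.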
The two equivalent definitions of $\mathbf{C}$ – as the algebraic closure, and as a quadratic extension, of the reals respectively – each reveal important features of the complex numbers in applications. Because $\mathbf{C}$ is algebraically closed, all polynomials over the complex numbers split completely, which leads to a good spectral theory for both finite-dimensional matrices and infinite-dimensional operators; in particular, one expects to be able to diagonalise most matrices and operators. Applying this theory to constant coefficient ordinary differential equations leads to a unified theory of such solutions, in which real-variable ODE behaviour such as exponential growth or decay, polynomial growth, and sinusoidal oscillation all become aspects of a single object, the complex exponential $z \mapsto e^z$ (or more generally, the matrix exponential $A \mapsto \exp(A)$). Applying this theory more generally to diagonalise arbitrary translation-invariant operators over some locally compact abelian group, one arrives at Fourier analysis, which is thus most naturally phrased in terms of complex-valued functions rather than real-valued ones. If one drops the assumption that the underlying group is abelian, one instead discovers the representation theory of unitary representations, which is simpler to study than the real-valued counterpart of orthogonal representations. For closely related reasons, the theory of complex Lie groups is simpler than that of real Lie groups.
Meanwhile, the fact that the complex numbers are a quadratic extension of the reals lets one view the complex numbers geometrically as a two-dimensional plane over the reals (the Argand plane). Whereas a point singularity in the real line disconnects that line, a point singularity in the Argand plane leaves the rest of the plane connected (although, importantly, the punctured plane is no longer simply connected). As we shall see, this fact causes singularities in complex analytic functions to be better behaved than singularities of real analytic functions, ultimately leading to the powerful residue calculus for computing complex integrals. Remarkably, this calculus, when combined with the quintessentially complex-variable technique of contour shifting, can also be used to compute some (though certainly not all) definite integrals of real-valued functions that would be much more difficult to compute by purely real-variable methods; this is a prime example of Hadamard’s famous dictum that “the shortest path between two truths in the real domain passes through the complex domain”.
Another important geometric feature of the Argand plane is the angle between two tangent vectors to a point in the plane. As it turns out, the operation of multiplication by a complex scalar preserves the magnitude and orientation of such angles; the same fact is true for any non-degenerate complex analytic mapping, as can be seen by performing a Taylor expansion to first order. This fact ties the study of complex mappings closely to that of the conformal geometry of the plane (and more generally, of two-dimensional surfaces and domains). In particular, one can use complex analytic maps to conformally transform one two-dimensional domain to another, leading among other things to the famous Riemann mapping theorem, and to the classification of Riemann surfaces.
If one Taylor expands complex analytic maps to second order rather than first order, one discovers a further important property of these maps, namely that they are harmonic. This fact makes the class of complex analytic maps extremely rigid and well behaved analytically; indeed, the entire theory of elliptic PDE now comes into play, giving useful properties such as elliptic regularity and the maximum principle. In fact, due to the magic of residue calculus and contour shifting, we already obtain these properties for maps that are merely complex differentiable rather than complex analytic, which leads to the striking fact that complex differentiable functions are automatically analytic (in contrast to the real-variable case, in which real differentiable functions can be very far from being analytic).
The geometric structure of the complex numbers (and more generally of complex manifolds and complex varieties), when combined with the algebraic closure of the complex numbers, leads to the beautiful subject of complex algebraic geometry, which motivates the much more general theory developed in modern algebraic geometry. However, we will not develop the algebraic geometry aspects of complex analysis here.
Last, but not least, because of the good behaviour of Taylor series in the complex plane, complex analysis is an excellent setting in which to manipulate various generating functions, particularly Fourier series $\sum_{n=-\infty}^\infty a_n e^{2\pi i n \theta}$ (which can be viewed as boundary values of power (or Laurent) series $\sum_{n=-\infty}^\infty a_n z^n$), as well as Dirichlet series $\sum_{n=1}^\infty \frac{a_n}{n^s}$. The theory of contour integration provides a very useful dictionary between the asymptotic behaviour of the sequence $a_n$, and the complex analytic behaviour of the Dirichlet or Fourier series, particularly with regard to its poles and other singularities. This turns out to be a particularly handy dictionary in analytic number theory, for instance relating the distribution of the primes to the Riemann zeta function. Nowadays, many of the analytic number theory results first obtained through complex analysis (such as the prime number theorem) can also be obtained by more “real-variable” methods; however the complex-analytic viewpoint is still extremely valuable and illuminating.
We will frequently touch upon many of these connections to other fields of mathematics in these lecture notes. However, these are mostly side remarks intended to provide context, and it is certainly possible to skip most of these tangents and focus purely on the complex analysis material in these notes if desired.
Note: complex analysis is a very visual subject, and one should draw plenty of pictures while learning it. I am however not planning to put too many pictures in these notes, partly as it is somewhat inconvenient to do so on this blog from a technical perspective, but also because pictures that one draws on one’s own are likely to be far more useful to you than pictures that were supplied by someone else.
Let $\tau(n) := \sum_{d|n} 1$ be the divisor function. A classical application of the Dirichlet hyperbola method gives the asymptotic
$$\sum_{n \leq x} \tau(n) \sim x \log x$$
where $X \sim Y$ denotes the estimate $X = (1+o(1)) Y$ as $x \to \infty$. Much better error estimates are possible here, but we will not focus on the lower order terms in this discussion. For somewhat idiosyncratic reasons I will interpret this estimate (and the other analytic number theory estimates discussed here) through the probabilistic lens. Namely, if $\mathbf{n}$ is a random number selected uniformly between $1$ and $x$, then the above estimate can be written as
$$\mathbf{E} \tau(\mathbf{n}) \sim \log x, \qquad (1)$$
that is to say the random variable $\tau(\mathbf{n})$ has mean approximately $\log x$. (But, somewhat paradoxically, this is not the median or mode behaviour of this random variable, which instead concentrates near $\log^{\log 2} x$, basically thanks to the Hardy-Ramanujan theorem.)
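The mean value claim is easy to test empirically. The following sketch assumes the standard definition of the divisor function as the number of positive divisors, and compares the empirical mean over $1 \leq n \leq x$ with $\log x$; the sieve `divisor_counts` and the cutoff $x = 10^5$ are illustrative choices.

```python
import math


def divisor_counts(limit):
    """Return a list tau with tau[n] = number of positive divisors of n
    for 1 <= n <= limit, computed by a simple sieve over divisors."""
    tau = [0] * (limit + 1)
    for d in range(1, limit + 1):
        for multiple in range(d, limit + 1, d):
            tau[multiple] += 1
    return tau


x = 10**5
tau = divisor_counts(x)
mean = sum(tau[1:]) / x        # empirical mean of tau(n) for n <= x
ratio = mean / math.log(x)     # expected to be close to 1
```

The ratio sits slightly above $1$ at this range, which is consistent with the presence of the lower order terms that are being ignored in the discussion.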
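Now we turn to the pair correlations $\sum_{n \leq x} \tau(n) \tau(n+h)$ for a fixed positive integer $h$. There is a classical computation of Ingham that shows that
$$\sum_{n \leq x} \tau(n) \tau(n+h) \sim \frac{6}{\pi^2} \sigma_{-1}(h) x \log^2 x, \qquad (2)$$
where $\sigma_{-1}(h) := \sum_{d|h} \frac{1}{d}$ is the sum of the reciprocals of the divisors of $h$.
The error term in (2) has been refined by many subsequent authors, as has the uniformity of the estimates in the $h$ aspect, as these topics are related to other questions in analytic number theory, such as fourth moment estimates for the Riemann zeta function; but we will not consider these more subtle features of the estimate here. However, we will look at the next term in the asymptotic expansion for (2) below the fold.
Using our probabilistic lens, the estimate (2) can be written as
$$\mathbb{E} \tau(\mathbf{n}) \tau(\mathbf{n}+h) \sim \frac{6}{\pi^2} \sigma_{-1}(h) \log^2 x. \qquad (3)$$
From (1) (and the asymptotic negligibility of the shift by $h$) we see that the random variables $\tau(\mathbf{n})$ and $\tau(\mathbf{n}+h)$ both have a mean of $\sim \log x$, so the additional factor of $\frac{6}{\pi^2} \sigma_{-1}(h)$ represents some arithmetic coupling between the two random variables.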
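For readers who want to experiment with the pair correlation numerically, here is a minimal Python sketch (function names are illustrative, not from any library). It computes the left-hand side of Ingham's formula exactly and the main term on the right-hand side. Convergence is slow, because the suppressed lower order terms are only smaller by factors of $\log x$, so one should not expect close agreement at accessible values of $x$.

```python
import math
from fractions import Fraction


def divisor_counts(x):
    """Return a list tau with tau[n] = number of divisors of n, for 1 <= n <= x."""
    tau = [0] * (x + 1)
    for d in range(1, x + 1):
        for multiple in range(d, x + 1, d):
            tau[multiple] += 1
    return tau


def sigma_minus_one(h):
    """Sum of 1/d over the divisors d of h, as an exact fraction."""
    return sum(Fraction(1, d) for d in range(1, h + 1) if h % d == 0)


def pair_sum(x, h):
    """Exact value of sum_{n <= x} tau(n) * tau(n + h)."""
    tau = divisor_counts(x + h)
    return sum(tau[n] * tau[n + h] for n in range(1, x + 1))


def ingham_main_term(x, h):
    """Main term (6 / pi^2) * sigma_{-1}(h) * x * log^2 x of Ingham's asymptotic."""
    return 6 / math.pi**2 * float(sigma_minus_one(h)) * x * math.log(x) ** 2


if __name__ == "__main__":
    x = 10**5
    for h in (1, 2, 6):
        print(h, pair_sum(x, h), ingham_main_term(x, h))
```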
Ingham’s formula can be established in a number of ways. Firstly, one can expand out $\tau(n) = \sum_{a,b: ab=n} 1$ and use the hyperbola method (splitting into the cases $a \leq \sqrt{x}$ and $b \leq \sqrt{x}$ and removing the overlap). If one does so, one soon arrives at the task of having to estimate sums of the form
$$\sum_{n \leq x: q|n} \tau(n+h)$$
for various $q \leq \sqrt{x}$. For $q$ much less than $\sqrt{x}$ this can be achieved using a further application of the hyperbola method, but for $q$ comparable to $\sqrt{x}$ things get a bit more complicated, necessitating the use of non-trivial estimates on Kloosterman sums in order to obtain satisfactory control on error terms. A more modern approach proceeds using automorphic form methods, as discussed in this previous post. A third approach, which unfortunately is only heuristic at the current level of technology, is to apply the Hardy-Littlewood circle method (discussed in this previous post) to express (2) in terms of exponential sums $\sum_{n \leq x} \tau(n) e(\alpha n)$ for various frequencies $\alpha$. The contribution of the “major arc” frequencies $\alpha$ can be computed after a moderately lengthy calculation which yields the right-hand side of (2) (as well as the correct lower order terms that are currently being suppressed), but there does not appear to be an easy way to show directly that the “minor arc” contributions are of lower order, although the methods discussed previously do indirectly show that this is ultimately the case.
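Each of the methods outlined above requires a fair amount of calculation, and it is not obvious while performing them that the factor $\frac{6}{\pi^2} \sigma_{-1}(h)$ will emerge at the end. One can at least explain the $\frac{6}{\pi^2}$ as a normalisation constant needed to balance the $\sigma_{-1}(h)$ factor (at a heuristic level, at least). To see this through our probabilistic lens, introduce an independent copy $\mathbf{n}'$ of $\mathbf{n}$, then
$$\mathbb{E} \tau(\mathbf{n}) \tau(\mathbf{n}') = (\mathbb{E} \tau(\mathbf{n}))^2 \sim \log^2 x; \qquad (4)$$
using symmetry to order $\mathbf{n}' > \mathbf{n}$ (discarding the diagonal case $\mathbf{n} = \mathbf{n}'$) and making the change of variables $\mathbf{n}' = \mathbf{n}+h$, we see that (4) is heuristically consistent with (3) as long as the asymptotic mean of $\frac{6}{\pi^2} \sigma_{-1}(h)$ in $h$ is equal to $1$. (This argument is not rigorous because there was an implicit interchange of limits present, but still gives a good heuristic “sanity check” of Ingham’s formula.) Indeed, if $\mathbb{E}_h$ denotes the asymptotic mean in $h$, then we have (heuristically at least)
$$\mathbb{E}_h \sigma_{-1}(h) = \sum_d \frac{1}{d} \mathbb{E}_h 1_{d|h} = \sum_d \frac{1}{d^2} = \frac{\pi^2}{6}$$
and we obtain the desired consistency after multiplying by $\frac{6}{\pi^2}$.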
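The normalisation heuristic above is easy to test numerically. The sketch below (with an illustrative function name) averages the sum of reciprocal divisors over $h \leq H$ by swapping the order of summation, exactly as in the heuristic computation, and compares the result with $\pi^2/6$.

```python
import math


def mean_sigma_minus_one(H):
    """Average of sigma_{-1}(h) over 1 <= h <= H.

    Uses sum_{h <= H} sigma_{-1}(h) = sum_{d <= H} (1/d) * floor(H/d),
    since each d divides exactly floor(H/d) of the integers 1..H.
    """
    total = sum((H // d) / d for d in range(1, H + 1))
    return total / H


if __name__ == "__main__":
    for H in (10**2, 10**4, 10**6):
        mean = mean_sigma_minus_one(H)
        # the last column should approach 1 as H grows
        print(H, mean, math.pi**2 / 6, 6 / math.pi**2 * mean)
```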
This still however does not explain the presence of the $\sigma_{-1}(h)$ factor. Intuitively it is reasonable that if $h$ has many prime factors, and $n$ has a lot of factors, then $n+h$ will have slightly more factors than average, because any common factor to $h$ and $n$ will automatically be acquired by $n+h$. But how to quantify this effect?
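One heuristic way to proceed is through analysis of local factors. Observe from the fundamental theorem of arithmetic that we can factor
$$\tau(n) = \prod_p \tau_p(n)$$
where the product is over all primes $p$, and $\tau_p(n) := \sum_{j \geq 0: p^j | n} 1$ is the local version of $\tau(n)$ at $p$ (which in this case, is just one plus the $p$-valuation $v_p(n)$ of $n$: $\tau_p = 1 + v_p$). Note that all but finitely many of the terms in this product will equal $1$, so the infinite product is well-defined. In a similar fashion, we can factor
$$\sigma_{-1}(h) = \prod_p \sigma_{-1,p}(h)$$
where $\sigma_{-1,p}(h) := \sum_{j \geq 0: p^j | h} \frac{1}{p^j}$ (or in terms of valuations, $\sigma_{-1,p}(h) = (1 - p^{-v_p(h)-1})/(1-p^{-1})$). Heuristically, the Chinese remainder theorem suggests that the various factors $\tau_p(\mathbf{n})$ behave like independent random variables, and so the correlation between $\tau(\mathbf{n})$ and $\tau(\mathbf{n}+h)$ should approximately decouple into the product of correlations between the local factors $\tau_p(\mathbf{n})$ and $\tau_p(\mathbf{n}+h)$. And indeed we do have the following local version of Ingham’s asymptotics:
Proposition 1 (Local Ingham asymptotics) For fixed $p$ and integer $h > 0$, we have
$$\mathbb{E} \tau_p(\mathbf{n}) = \frac{p}{p-1} + o(1)$$
and
$$\mathbb{E} \tau_p(\mathbf{n}) \tau_p(\mathbf{n}+h) = \left(1 - \frac{1}{p^2}\right) \sigma_{-1,p}(h) \left(\frac{p}{p-1}\right)^2 + o(1).$$
From the Euler formula
$$\prod_p \left(1 - \frac{1}{p^2}\right) = \frac{1}{\zeta(2)} = \frac{6}{\pi^2}$$
we see that
$$\frac{6}{\pi^2} \sigma_{-1}(h) = \prod_p \left(1 - \frac{1}{p^2}\right) \sigma_{-1,p}(h)$$
and so one can “explain” the arithmetic factor $\frac{6}{\pi^2} \sigma_{-1}(h)$ in Ingham’s asymptotic as the product of the arithmetic factors $(1 - \frac{1}{p^2}) \sigma_{-1,p}(h)$ in the (much easier) local Ingham asymptotics. Unfortunately we have the usual “local-global” problem in that we do not know how to rigorously derive the global asymptotic from the local ones; this problem is essentially the same issue as the problem of controlling the minor arc contributions in the circle method, but phrased in “physical space” language rather than “frequency space”.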
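The two product identities used above can be checked numerically. The following Python sketch (all function names are illustrative) truncates the Euler product at a finite prime bound and compares it with $6/\pi^2$, and verifies with exact rational arithmetic that the sum of reciprocal divisors of $h$ factors into a product of local sums over prime powers dividing $h$.

```python
import math
from fractions import Fraction


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(limit**0.5) + 1):
        if is_prime[n]:
            for m in range(n * n, limit + 1, n):
                is_prime[m] = False
    return [n for n, flag in enumerate(is_prime) if flag]


def partial_euler_product(limit):
    """Product of (1 - 1/p^2) over primes p <= limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product *= 1 - 1 / p**2
    return product


def sigma_local(h, p):
    """Local factor: sum of 1/p^j over all j >= 0 with p^j dividing h."""
    total, q = Fraction(0), 1
    while h % q == 0:
        total += Fraction(1, q)
        q *= p
    return total


def sigma_minus_one(h):
    """Global factor: sum of 1/d over divisors d of h."""
    return sum(Fraction(1, d) for d in range(1, h + 1) if h % d == 0)


def sigma_minus_one_via_local(h):
    """Product of the local factors over primes p <= h (other primes contribute 1)."""
    result = Fraction(1)
    for p in primes_up_to(h):
        result *= sigma_local(h, p)
    return result


if __name__ == "__main__":
    print(partial_euler_product(10**4), 6 / math.pi**2)
    print(all(sigma_minus_one(h) == sigma_minus_one_via_local(h) for h in range(1, 200)))
```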
Remark 2 The relation between the local means and the global mean can also be seen heuristically through the application
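Let us now prove this proposition. One could brute-force the computations by observing that for any fixed $j$, the valuation $v_p(\mathbf{n})$ is equal to $j$ with probability $\sim \frac{p-1}{p^{j+1}}$, and with a little more effort one can also compute the joint distribution of $v_p(\mathbf{n})$ and $v_p(\mathbf{n}+h)$, at which point the proposition reduces to the calculation of various variants of the geometric series. I however find it cleaner to proceed in a more recursive fashion (similar to how one can prove the geometric series formula by induction); this will also make visible the vague intuition mentioned previously about how common factors of $n$ and $h$ force $n+h$ to have a factor also.
It is first convenient to get rid of error terms by observing that in the limit $x \to \infty$, the random variable $\mathbf{n}$ converges vaguely to a uniform random variable $\mathbf{n}_\infty$ on the profinite integers $\hat{\mathbb{Z}}$, or more precisely that the pair $(v_p(\mathbf{n}), v_p(\mathbf{n}+h))$ converges vaguely to $(v_p(\mathbf{n}_\infty), v_p(\mathbf{n}_\infty+h))$. Because of this (and because of the easily verified uniform integrability properties of $\tau_p(\mathbf{n})$, $\tau_p(\mathbf{n}+h)$ and their powers), it suffices to establish the exact formulae
$$\mathbb{E} \tau_p(\mathbf{n}_\infty) = \frac{p}{p-1} \qquad (5)$$
$$\mathbb{E} \tau_p(\mathbf{n}_\infty) \tau_p(\mathbf{n}_\infty+h) = \left(1 - \frac{1}{p^2}\right) \sigma_{-1,p}(h) \left(\frac{p}{p-1}\right)^2 = \frac{p+1}{p-1} \sigma_{-1,p}(h). \qquad (6)$$
We begin with (5). Observe that $\mathbf{n}_\infty$ is coprime to $p$ with probability $\frac{p-1}{p}$, in which case $\tau_p(\mathbf{n}_\infty)$ is equal to $1$. Conditioning to the complementary probability $\frac{1}{p}$ event that $\mathbf{n}_\infty$ is divisible by $p$, we can factor $\mathbf{n}_\infty = p \mathbf{n}'_\infty$ where $\mathbf{n}'_\infty$ is also uniformly distributed over the profinite integers, in which event we have $\tau_p(\mathbf{n}_\infty) = 1 + \tau_p(\mathbf{n}'_\infty)$. We arrive at the identity
$$\mathbb{E} \tau_p(\mathbf{n}_\infty) = \frac{p-1}{p} + \frac{1}{p} \left( 1 + \mathbb{E} \tau_p(\mathbf{n}'_\infty) \right).$$
As $\mathbf{n}_\infty$ and $\mathbf{n}'_\infty$ have the same distribution, the quantities $\mathbb{E} \tau_p(\mathbf{n}_\infty)$ and $\mathbb{E} \tau_p(\mathbf{n}'_\infty)$ are equal, and (5) follows by a brief amount of high-school algebra.
We use a similar method to treat (6). First treat the case when $h$ is coprime to $p$. Then we see that with probability $\frac{p-2}{p}$, $\mathbf{n}_\infty$ and $\mathbf{n}_\infty+h$ are simultaneously coprime to $p$, in which case $\tau_p(\mathbf{n}_\infty) = \tau_p(\mathbf{n}_\infty+h) = 1$. Furthermore, with probability $\frac{1}{p}$, $\mathbf{n}_\infty$ is divisible by $p$ and $\mathbf{n}_\infty+h$ is not; in which case we can write $\mathbf{n}_\infty = p \mathbf{n}'_\infty$ as before, with $\tau_p(\mathbf{n}_\infty) = 1 + \tau_p(\mathbf{n}'_\infty)$ and $\tau_p(\mathbf{n}_\infty+h) = 1$. Finally, in the remaining event with probability $\frac{1}{p}$, $\mathbf{n}_\infty+h$ is divisible by $p$ and $\mathbf{n}_\infty$ is not; we can then write $\mathbf{n}_\infty+h = p \mathbf{n}'_\infty$, so that $\tau_p(\mathbf{n}_\infty+h) = 1 + \tau_p(\mathbf{n}'_\infty)$ and $\tau_p(\mathbf{n}_\infty) = 1$. Putting all this together, we obtain
$$\mathbb{E} \tau_p(\mathbf{n}_\infty) \tau_p(\mathbf{n}_\infty+h) = \frac{p-2}{p} + \frac{2}{p} \left( 1 + \mathbb{E} \tau_p(\mathbf{n}'_\infty) \right),$$
which by (5) equals $\frac{p+1}{p-1}$, giving (6) in this case.
Now suppose that $h$ is divisible by $p$, thus $h = p h'$ for some integer $h'$. Then with probability $\frac{p-1}{p}$, $\mathbf{n}_\infty$ and $\mathbf{n}_\infty+h$ are simultaneously coprime to $p$, in which case $\tau_p(\mathbf{n}_\infty) = \tau_p(\mathbf{n}_\infty+h) = 1$. In the remaining event, we can write $\mathbf{n}_\infty = p \mathbf{n}'_\infty$, and then $\tau_p(\mathbf{n}_\infty) = 1 + \tau_p(\mathbf{n}'_\infty)$ and $\tau_p(\mathbf{n}_\infty+h) = 1 + \tau_p(\mathbf{n}'_\infty+h')$. Putting all this together we have
$$\mathbb{E} \tau_p(\mathbf{n}_\infty) \tau_p(\mathbf{n}_\infty+h) = \frac{p-1}{p} + \frac{1}{p} \mathbb{E} \left(1 + \tau_p(\mathbf{n}'_\infty)\right) \left(1 + \tau_p(\mathbf{n}'_\infty+h')\right)$$
which by (5) (and replacing $\mathbf{n}'_\infty$ by $\mathbf{n}_\infty$) leads to the recursive relation
$$\mathbb{E} \tau_p(\mathbf{n}_\infty) \tau_p(\mathbf{n}_\infty+h) = \frac{p+1}{p-1} + \frac{1}{p} \mathbb{E} \tau_p(\mathbf{n}_\infty) \tau_p(\mathbf{n}_\infty+h')$$
and (6) then follows by induction on the number of powers of $p$.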
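The recursive argument above can be mirrored in a few lines of Python. This sketch uses illustrative names, and it assumes the local correlation has the closed form $\frac{p+1}{p-1}$ times the sum of $p^{-j}$ over those $j \geq 0$ with $p^j$ dividing $h$, which is what the recursion produces. It compares that closed form with the recursion in exact rational arithmetic, and with a brute-force average over $1 \leq n \leq p^K$ as a finite stand-in for the profinite average.

```python
from fractions import Fraction


def tau_p(n, p):
    """Local divisor function at p: one plus the p-adic valuation of n >= 1."""
    count = 1
    while n % p == 0:
        n //= p
        count += 1
    return count


def sigma_local(h, p):
    """Sum of 1/p^j over all j >= 0 with p^j dividing h."""
    total, q = Fraction(0), 1
    while h % q == 0:
        total += Fraction(1, q)
        q *= p
    return total


def local_ingham_closed_form(h, p):
    """Closed form (p+1)/(p-1) * sigma_local(h, p) for the local correlation."""
    return Fraction(p + 1, p - 1) * sigma_local(h, p)


def local_ingham_recursive(h, p):
    """Same quantity via the recursion: strip one factor of p from h at a time."""
    base = Fraction(p + 1, p - 1)
    if h % p != 0:
        return base  # case h coprime to p
    return base + local_ingham_recursive(h // p, p) / p


def empirical_mean(h, p, N):
    """Average of tau_p(n) * tau_p(n + h) over 1 <= n <= N."""
    total = sum(tau_p(n, p) * tau_p(n + h, p) for n in range(1, N + 1))
    return Fraction(total, N)


if __name__ == "__main__":
    p, h = 3, 6
    print(local_ingham_closed_form(h, p), local_ingham_recursive(h, p))
    print(float(empirical_mean(h, p, p**8)))
```

Taking $N$ to be a power of $p$ keeps the residues of $n$ modulo that power exactly equidistributed, so the brute-force average is close to the limiting value even for modest $N$.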
for certain complicated but explicit coefficients . For instance, is given by the formula
where is the Euler-Mascheroni constant,
The formula for is similar but even more complicated. The error term was improved by Heath-Brown to ; it is conjectured (for instance by Conrey and Gonek) that one in fact has square root cancellation here, but this is well out of reach of current methods.
These lower order terms are traditionally computed either from a Dirichlet series approach (using Perron’s formula) or a circle method approach. It turns out that a refinement of the above heuristics can also predict these lower order terms, thus keeping the calculation purely in physical space as opposed to the “multiplicative frequency space” of the Dirichlet series approach, or the “additive frequency space” of the circle method, although the computations are arguably as messy as the latter computations for the purposes of working out the lower order terms. We illustrate this just for the term below the fold.
Fifteen years ago, I wrote a paper entitled Global regularity of wave maps. II. Small energy in two dimensions, in which I established global regularity of wave maps from two spatial dimensions to the unit sphere, assuming that the initial data had small energy. Recently, Hao Jia (personal communication) discovered a small gap in the argument that requires a slightly non-trivial fix. The issue does not really affect the subsequent literature, because the main result has since been reproven and extended by methods that avoid the gap (see in particular this subsequent paper of Tataru), but I have decided to describe the gap and its fix on this blog.
I will assume familiarity with the notation of my paper. In Section 10, some complicated spaces are constructed for each frequency scale , and then a further space is constructed for a given frequency envelope by the formula
where is the Littlewood-Paley projection of to frequency magnitudes . Then, given a spacetime slab , we define the restrictions
where the infimum is taken over all extensions of to the Minkowski spacetime ; similarly one defines
The gap in the paper is as follows: it was implicitly assumed that one could restrict (1) to the slab to obtain the equality
(This equality is implicitly used to establish the bound (36) in the paper.) Unfortunately, (1) only gives the lower bound, not the upper bound, and it is the upper bound which is needed here. The problem is that the extensions of that are optimal for computing are not necessarily the Littlewood-Paley projections of the extensions of that are optimal for computing .
To remedy the problem, one has to prove an upper bound of the form
the extension will then obey (5) (here we use Lemma 9 from my paper), but unfortunately is not guaranteed to obey (4) (the norm does control the norm, but a key point about frequency envelopes for the small energy regularity problem is that the coefficients , while bounded, are not necessarily summable).
This can be fixed as follows. For each we introduce a time cutoff supported on that equals on and obeys the usual derivative estimates in between (the time derivative of size for each ). Later we will prove the truncation estimate
Assuming this estimate, then if we set , then using Lemma 9 in my paper and (6), (7) (and the local stability of frequency envelopes) we have the required property (5). (There is a technical issue arising from the fact that is not necessarily Schwartz due to slow decay at temporal infinity, but by considering partial sums in the summation and taking limits we can check that is the strong limit of Schwartz functions, which suffices here; we omit the details for sake of exposition.) So the only issue is to establish (4), that is to say that
for all .
For this is immediate from (2). Now suppose that for some integer (the case when is treated similarly). Then we can split
The contribution of the term is acceptable by (6) and estimate (82) from my paper. The term sums to which is acceptable by (2). So it remains to control the norm of . By the triangle inequality and the fundamental theorem of calculus, we can bound
By hypothesis, . Using the first term in (79) of my paper and Bernstein’s inequality followed by (6) we have
and then we are done by summing the geometric series in .
It remains to prove the truncation estimate (7). This estimate is similar in spirit to the algebra estimates already in my paper, but unfortunately does not seem to follow immediately from these estimates as written, and so one has to repeat the somewhat lengthy decompositions and case checkings used to prove these estimates. We do this below the fold.
[This blog post was written jointly by Terry Tao and Will Sawin.]
In the previous blog post, one of us (Terry) implicitly introduced a notion of rank for tensors which is a little different from the usual notion of tensor rank, and which (following BCCGNSU) we will call “slice rank”. This notion of rank could then be used to encode the Croot-Lev-Pach-Ellenberg-Gijswijt argument that uses the polynomial method to control capsets.
Afterwards, several papers have applied the slice rank method to further problems – to control tri-colored sum-free sets in abelian groups (BCCGNSU, KSS) and from there to the triangle removal lemma in vector spaces over finite fields (FL), to control sunflowers (NS), and to bound progression-free sets in $p$-groups (P).
In this post we investigate the notion of slice rank more systematically. In particular, we show how to give lower bounds for the slice rank. In many cases, we can show that the upper bounds on slice rank given in the aforementioned papers are sharp to within a subexponential factor. This still leaves open the possibility of getting a better bound for the original combinatorial problem using the slice rank of some other tensor, but for very long arithmetic progressions (at least eight terms), we show that the slice rank method cannot improve over the trivial bound using any tensor.
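It will be convenient to work in a “basis independent” formalism, namely working in the category of abstract finite-dimensional vector spaces over a fixed field $\mathbf{F}$. (In the applications to the capset problem one takes $\mathbf{F} = \mathbf{F}_3$ to be the finite field of three elements, but most of the discussion here applies to arbitrary fields.) Given $k$ such vector spaces $V_1,\dots,V_k$, we can form the tensor product $\bigotimes_{i=1}^k V_i$, generated by the tensor products $v_1 \otimes \dots \otimes v_k$ with $v_i \in V_i$ for $i=1,\dots,k$, subject to the constraint that the tensor product operation $(v_1,\dots,v_k) \mapsto v_1 \otimes \dots \otimes v_k$ is multilinear. For each $1 \leq j \leq k$, we have the smaller tensor products $\bigotimes_{1 \leq i \leq k: i \neq j} V_i$, as well as the $j^{th}$ tensor product
$$\otimes_j: V_j \times \bigotimes_{1 \leq i \leq k: i \neq j} V_i \to \bigotimes_{i=1}^k V_i$$
defined in the obvious fashion. Elements of $\bigotimes_{i=1}^k V_i$ of the form $v_j \otimes_j v_{\hat j}$ for some $v_j \in V_j$ and $v_{\hat j} \in \bigotimes_{1 \leq i \leq k: i \neq j} V_i$ will be called rank one functions, and the slice rank (or rank for short) $\hbox{rank}(v)$ of an element $v$ of $\bigotimes_{i=1}^k V_i$ is defined to be the least nonnegative integer $r$ such that $v$ is a linear combination of $r$ rank one functions. If $V_1,\dots,V_k$ are finite-dimensional, then the rank is always well defined as a non-negative integer (in fact it cannot exceed $\min(\dim V_1, \dots, \dim V_k)$). It is also clearly subadditive:
$$\hbox{rank}(v+w) \leq \hbox{rank}(v) + \hbox{rank}(w). \qquad (1)$$
For $k=1$, $\hbox{rank}(v)$ is $0$ when $v$ is zero, and $1$ otherwise. For $k=2$, $\hbox{rank}(v)$ is the usual rank of the $2$-tensor $v \in V_1 \otimes V_2$ (which can for instance be identified with a linear map from $V_1$ to the dual space $V_2^*$). The usual notion of tensor rank for higher order tensors uses complete tensor products $v_1 \otimes \dots \otimes v_k$, $v_i \in V_i$ as the rank one objects, rather than $v_j \otimes_j v_{\hat j}$, giving a rank that is greater than or equal to the slice rank studied here.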
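From basic linear algebra we have the following equivalences:
Lemma 1 Let $V_1,\dots,V_k$ be finite-dimensional vector spaces over a field $\mathbf{F}$, let $v$ be an element of $V_1 \otimes \dots \otimes V_k$, and let $r \geq 0$ be an integer. Then the following are equivalent:
- (i) One has $\hbox{rank}(v) \leq r$.
- (ii) One has a representation of the form
$$v = \sum_{j=1}^k \sum_{s \in S_j} v_{j,s} \otimes_j v_{\hat j,s}$$
where $S_1,\dots,S_k$ are finite sets of total cardinality $|S_1|+\dots+|S_k|$ at most $r$, and for each $1 \leq j \leq k$ and $s \in S_j$, $v_{j,s} \in V_j$ and $v_{\hat j,s} \in \bigotimes_{1 \leq i \leq k: i \neq j} V_i$.
- (iii) One has
$$v \in \sum_{j=1}^k U_j \otimes_j \bigotimes_{1 \leq i \leq k: i \neq j} V_i$$
where for each $j=1,\dots,k$, $U_j$ is a subspace of $V_j$ of total dimension $\dim U_1 + \dots + \dim U_k$ at most $r$, and we view $U_j \otimes_j \bigotimes_{1 \leq i \leq k: i \neq j} V_i$ as a subspace of $\bigotimes_{i=1}^k V_i$ in the obvious fashion.
- (iv) (Dual formulation) There exist subspaces $W_j$ of the dual space $V_j^*$ for $j=1,\dots,k$, of total dimension at least $\dim V_1 + \dots + \dim V_k - r$, such that $v$ is orthogonal to $\bigotimes_{j=1}^k W_j$, in the sense that one has the vanishing
$$\langle v, w_1 \otimes \dots \otimes w_k \rangle = 0$$
for all $w_j \in W_j$, where $\langle \cdot, \cdot \rangle: \bigotimes_{j=1}^k V_j \times \bigotimes_{j=1}^k V_j^* \to \mathbf{F}$ is the obvious pairing.
Proof: The equivalence of (i) and (ii) is clear from definition. To get from (ii) to (iii) one simply takes $U_j$ to be the span of the $v_{j,s}$, and conversely to get from (iii) to (ii) one takes the $v_{j,s}$ to be a basis of the $U_j$ and computes $v_{\hat j,s}$ by using a basis for the tensor product $\bigotimes_{j=1}^k U_j \otimes_j \bigotimes_{1 \leq i \leq k: i \neq j} V_i$ consisting entirely of functions of the form $v_{j,s} \otimes_j e$ for various $e$. To pass from (iii) to (iv) one takes $W_j$ to be the annihilator of $U_j$, and conversely to pass from (iv) to (iii) one takes $U_j$ to be the annihilator of $W_j$.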
One corollary of the formulation (iv) is that the set of tensors of slice rank at most $r$ is Zariski closed (if the field is algebraically closed), and so the slice rank itself is a lower semi-continuous function. This is in contrast to the usual tensor rank, which is not necessarily semicontinuous.
Corollary 2 Let be finite-dimensional vector spaces over an algebraically closed field . Let be a nonnegative integer. The set of elements of of slice rank at most is closed in the Zariski topology.
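Proof: In view of Lemma 1(i and iv), this set is the union over tuples of integers $d_1,\dots,d_k$ with $d_1 + \dots + d_k \geq \dim V_1 + \dots + \dim V_k - r$ of the projection from $\hbox{Gr}(d_1, V_1^*) \times \dots \times \hbox{Gr}(d_k, V_k^*) \times (V_1 \otimes \dots \otimes V_k)$ of the set of tuples $(W_1,\dots,W_k,v)$ with $v$ orthogonal to $W_1 \otimes \dots \otimes W_k$, where $\hbox{Gr}(d,V)$ is the Grassmannian parameterizing $d$-dimensional subspaces of $V$.
One can check directly that the set of tuples $(W_1,\dots,W_k,v)$ with $v$ orthogonal to $W_1 \otimes \dots \otimes W_k$ is Zariski closed in $\hbox{Gr}(d_1, V_1^*) \times \dots \times \hbox{Gr}(d_k, V_k^*) \times (V_1 \otimes \dots \otimes V_k)$ using a set of equations of the form $\langle v, w_1 \otimes \dots \otimes w_k \rangle = 0$ locally on $\hbox{Gr}(d_1, V_1^*) \times \dots \times \hbox{Gr}(d_k, V_k^*)$. Hence because the Grassmannian is a complete variety, the projection of this set to $V_1 \otimes \dots \otimes V_k$ is also Zariski closed. So the finite union over tuples $d_1,\dots,d_k$ of these projections is also Zariski closed.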
We also have good behaviour with respect to linear transformations:
Furthermore, if the are all injective, then one has equality in (2).
Thus, for instance, the rank of a tensor is intrinsic in the sense that it is unaffected by any enlargements of the spaces .
Computing the rank of a tensor is difficult in general; however, the problem becomes a combinatorial one if one has a suitably sparse representation of that tensor in some basis, where we will measure sparsity by the property of being an antichain.
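Now suppose that the coefficients $c_{s_1,\dots,s_k}$ are all non-zero, that each of the $S_j$ are equipped with a total ordering $\leq_j$, and $\Gamma'$ is the set of maximal elements of $\Gamma$, thus there do not exist distinct $(s_1,\dots,s_k) \in \Gamma'$, $(t_1,\dots,t_k) \in \Gamma$ such that $s_j \leq_j t_j$ for all $1 \leq j \leq k$. Then one has
$$\hbox{rank}(v) \geq \min_{\Gamma' = \Gamma_1 \cup \dots \cup \Gamma_k} |\pi_1(\Gamma_1)| + \dots + |\pi_k(\Gamma_k)|.$$
In particular, if $\Gamma$ is an antichain (i.e. every element is maximal), then equality holds in (4).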
Proof: By Lemma 3 (or by enlarging the bases ), we may assume without loss of generality that each of the is spanned by the . By relabeling, we can also assume that each is of the form
with the usual ordering, and by Lemma 3 we may take each to be , with the standard basis.
Let denote the rank of . To show (4), it suffices to show the inequality
can (after collecting terms) be written as
holds for some covering . By Lemma 1(iv), there exist subspaces of whose dimension sums to
Let . Using Gaussian elimination, one can find a basis of whose representation in the standard dual basis of is in row-echelon form. That is to say, there exist natural numbers
such that for all , is a linear combination of the dual vectors , with the coefficient equal to one.
We now claim that is disjoint from . Suppose for contradiction that this were not the case, thus there exists for each such that
As is the set of maximal elements of , this implies that
for any tuple other than . On the other hand, we know that is a linear combination of , with the coefficient one. We conclude that the tensor product is equal to
plus a linear combination of other tensor products with not in . Taking inner products with (3), we conclude that , contradicting the fact that is orthogonal to . Thus we have disjoint from .
As an instance of this proposition, we recover the computation of diagonal rank from the previous blog post:
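Example 5 Let $V_1,\dots,V_k$ be finite-dimensional vector spaces over a field $\mathbf{F}$ for some $k \geq 2$. Let $d$ be a natural number, and for $1 \leq j \leq k$, let $e_{j,1},\dots,e_{j,d}$ be a linearly independent set in $V_j$. Let $c_1,\dots,c_d$ be non-zero coefficients in $\mathbf{F}$. Then
$$\sum_{t=1}^d c_t e_{1,t} \otimes \dots \otimes e_{k,t}$$
has rank $d$. Indeed, one applies the proposition with $S_1,\dots,S_k$ all equal to $\{1,\dots,d\}$, with $\Gamma$ the diagonal in $S_1 \times \dots \times S_k$; this is an antichain if we give one of the $S_i$ the standard ordering, and another of the $S_i$ the opposite ordering (and ordering the remaining $S_i$ arbitrarily). In this case, the $\pi_j$ are all bijective, and so it is clear that the minimum in (4) is simply $d$.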
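For small index sets, the combinatorial minimum appearing in the proposition can be found by exhaustive search. The sketch below uses illustrative names and assumes that minimum is taken over coverings of $\Gamma$ by $k$ sets, measuring the total size of their coordinate projections. It suffices to search over partitions, since removing overlaps from a covering can only shrink the projections.

```python
from itertools import product


def covering_min(gamma, k):
    """Minimum over partitions gamma = G_1 u ... u G_k of sum_j |pi_j(G_j)|,
    where pi_j projects a k-tuple to its j-th coordinate.

    Brute force over all k^|gamma| assignments, so only usable for tiny gamma.
    """
    gamma = list(gamma)
    best = None
    for assignment in product(range(k), repeat=len(gamma)):
        projections = [set() for _ in range(k)]
        for point, j in zip(gamma, assignment):
            # the point is placed in G_j, so its j-th coordinate joins pi_j(G_j)
            projections[j].add(point[j])
        cost = sum(len(s) for s in projections)
        if best is None or cost < best:
            best = cost
    return best


if __name__ == "__main__":
    d, k = 4, 3
    diagonal = [(t,) * k for t in range(d)]
    print(covering_min(diagonal, k))  # the diagonal case of Example 5
```

On the diagonal every projection is injective, so each point costs exactly one unit wherever it is placed, and the search returns $d$ in agreement with the example.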
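The combinatorial minimisation problem in the above proposition can be solved asymptotically when working with tensor powers, using the notion of the Shannon entropy $h(X) := \sum_x \mathbf{P}(X=x) \log \frac{1}{\mathbf{P}(X=x)}$ of a discrete random variable $X$.
Let $v$ be a tensor of the form (3) for some coefficients $c_{s_1,\dots,s_k}$. For each natural number $n$, let $v^{\otimes n}$ be the tensor power of $n$ copies of $v$, viewed as an element of $\bigotimes_{j=1}^k V_j^{\otimes n}$. Then
$$\hbox{rank}(v^{\otimes n}) \leq \exp((H + o(1)) n) \qquad (9)$$
as $n \to \infty$, where $H$ is the quantity
$$H := \sup_{(X_1,\dots,X_k)} \min(h(X_1),\dots,h(X_k)) \qquad (10)$$
and $(X_1,\dots,X_k)$ range over random variables taking values in $\Gamma$.
Now suppose that the coefficients $c_{s_1,\dots,s_k}$ are all non-zero and that each of the $S_j$ are equipped with a total ordering $\leq_j$. Let $\Gamma'$ be the set of maximal elements of $\Gamma$ in the product ordering, and let $H' := \sup_{(X_1,\dots,X_k)} \min(h(X_1),\dots,h(X_k))$ where $(X_1,\dots,X_k)$ range over random variables taking values in $\Gamma'$. Then
$$\hbox{rank}(v^{\otimes n}) \geq \exp((H' + o(1)) n) \qquad (11)$$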
as , where is the projection map. Then the same thing will apply to and . Then applying Proposition 4, using the lexicographical ordering on and noting that, if are the maximal elements of , then are the maximal elements of , we obtain both (9) and (11).
Let be a small positive quantity that goes to zero sufficiently slowly with . Let denote the set of all tuples in that are within of being distributed according to the law of , in the sense that for all , one has
By the asymptotic equipartition property, the cardinality of can be computed to be
which by (13) implies that
(noting that the factor can be absorbed into the error). This gives the lower bound in (12).
Now we prove the upper bound. We can cover by sets of the form for various choices of random variables taking values in . For each such random variable , we can find such that ; we then place all of in . It is then clear that the cover and that
for all , giving the required upper bound.
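To experiment with the entropy quantity that governs these asymptotics, one can evaluate the minimum marginal entropy of any candidate joint distribution; since the quantity in (10) is a supremum over such distributions, each candidate gives a lower bound for it. The Python sketch below uses illustrative names, and the three-point distribution in the example is a toy choice made here, not one taken from the text above.

```python
import math
from collections import defaultdict


def shannon_entropy(probabilities):
    """Shannon entropy (natural logarithm) of a finite probability vector."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def marginal_entropies(joint):
    """Entropies of the coordinate marginals of a joint distribution.

    `joint` maps k-tuples to probabilities summing to 1.
    """
    k = len(next(iter(joint)))
    marginals = [defaultdict(float) for _ in range(k)]
    for point, prob in joint.items():
        for j, coordinate in enumerate(point):
            marginals[j][coordinate] += prob
    return [shannon_entropy(m.values()) for m in marginals]


def min_marginal_entropy(joint):
    """The quantity min_j h(X_j) for the given joint law of (X_1, ..., X_k)."""
    return min(marginal_entropies(joint))


if __name__ == "__main__":
    # Toy example: uniform distribution on three tuples in {0,1}^3.
    joint = {(1, 0, 0): 1 / 3, (0, 1, 0): 1 / 3, (0, 0, 1): 1 / 3}
    h = min_marginal_entropy(joint)
    print(h, math.exp(h))
```

In this toy example each coordinate equals $1$ with probability $1/3$, so every marginal has entropy $\log 3 - \frac{2}{3} \log 2$.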
It is of interest to compute the quantity in (10). We have the following criterion for when a maximiser occurs:
Proposition 7 Let be finite sets, and be non-empty. Let be the quantity in (10). Let be a random variable taking values in , and let denote the essential range of , that is to say the set of tuples such that is non-zero. Then the following are equivalent:
- (i) attains the maximum in (10).
- (ii) There exist weights and a finite quantity , such that whenever , and such that
for all , with equality if . (In particular, must vanish if there exists a with .)
Proof: We first show that (i) implies (ii). The function is concave on . As a consequence, if we define to be the set of tuples such that there exists a random variable taking values in with , then is convex. On the other hand, by (10), is disjoint from the orthant . Thus, by the hyperplane separation theorem, we conclude that there exists a half-space
where are reals that are not all zero, and is another real, which contains on its boundary and in its interior, such that avoids the interior of the half-space. Since is also on the boundary of , we see that the are non-negative, and that whenever .
By construction, the quantity
is maximised when . At this point we could use the method of Lagrange multipliers to obtain the required constraints, but because we have some boundary conditions on the (namely, that the probability that they attain a given element of has to be non-negative) we will work things out by hand. Let be an element of , and an element of . For small enough, we can form a random variable taking values in , whose probability distribution is the same as that for except that the probability of attaining is increased by , and the probability of attaining is decreased by . If there is any for which and , then one can check that
for sufficiently small , contradicting the maximality of ; thus we have whenever . Taylor expansion then gives
for small , where
and similarly for . We conclude that for all and , thus there exists a quantity such that for all , and for all . By construction must be nonnegative. Sampling using the distribution of , one has
almost surely; taking expectations we conclude that
The inner sum is , which equals when is non-zero, giving (17).
for any (note the right-hand side may be infinite when and ). Let be any random variable taking values in , then on applying the above inequality with and , multiplying by , and summing over and gives
By construction, one has
so to prove that (which would give (i)), it suffices to show that
or equivalently that the quantity
is maximised when . Since
it suffices to show this claim for the quantity
One can view this quantity as
By (ii), this quantity is bounded by , with equality if is equal to (and is in particular ranging in ), giving the claim.
The second half of the proof of Proposition 7 only uses the marginal distributions and the equation (16), not the actual distribution of $(X_1,\dots,X_k)$, so it can also be used to prove an upper bound on $H$ when the exact maximizing distribution is not known, given suitable probability distributions in each variable. The logarithm of the probability distribution here plays the role that the weight functions do in BCCGNSU.
Remark 8 Suppose one is in the situation of (i) and (ii) above; assume the nondegeneracy condition that is positive (or equivalently that is positive). We can assign a “degree” to each element by the formula
then every tuple in has total degree at most , and those tuples in have degree exactly . In particular, every tuple in has degree at most , and hence by (17), each such tuple has a -component of degree less than or equal to for some with . On the other hand, we can compute from (19) and the fact that for that . Thus, by asymptotic equipartition, and assuming , the number of “monomials” in of total degree at most is at most ; one can in fact use (19) and (18) to show that this is in fact an equality. This gives a direct way to cover by sets with , which is in the spirit of the Croot-Lev-Pach-Ellenberg-Gijswijt arguments from the previous post.
We can now show that the rank computation for the capset problem is sharp:
Proof: In , we have
Thus, if we let be the space of functions from to (with domain variable denoted respectively), and define the basis functions
of indexed by (with the usual ordering), respectively, and set to be the set
then is a linear combination of the with , and all coefficients non-zero. Then we have . We will show that the quantity of (10) agrees with the quantity of (20), and that the optimizing distribution is supported on , so that by Proposition 6 the rank of is .
To compute the quantity at (10), we use the criterion in Proposition 7. We take to be the random variable taking values in that attains each of the values with a probability of , and each of with a probability of ; then each of the attains the values of with probabilities respectively, so in particular is equal to the quantity in (20). If we now set and
This statement already follows from the result of Kleinberg-Sawin-Speyer, which gives a “tri-colored sum-free set” in of size , as the slice rank of this tensor is an upper bound for the size of a tri-colored sum-free set. If one were to go over the proofs more carefully to evaluate the subexponential factors, this argument would give a stronger lower bound than KSS, as it does not deal with the substantial loss that comes from Behrend’s construction. However, because it actually constructs a set, the KSS result rules out more possible approaches to give an exponential improvement of the upper bound for capsets. The lower bound on slice rank shows that the bound cannot be improved using only the slice rank of this particular tensor, whereas KSS shows that the bound cannot be improved using any method that does not take advantage of the “single-colored” nature of the problem.
We can also show that the slice rank upper bound in a result of Naslund-Sawin is similarly sharp:
Proposition 10 Let denote the space of functions from to . Then the function from , viewed as an element of , has slice rank
Proof: Let and be a basis for the space of functions on , itself indexed by . Choose similar bases for and , with and .
Set . Then is a linear combination of the with , and all coefficients non-zero. Order the usual way so that is an antichain. We will show that the quantity of (10) is , so that applying the last statement of Proposition 6, we conclude that the rank of is .
Let be the random variable taking values in that attains each of the values with a probability of . Then each of the attains the value with probability and with probability , so
We used a slightly different method in each of the last two results. In the first one, we use the most natural bases for all three vector spaces, and distinguish from its set of maximal elements . In the second one we modify one basis element slightly, with instead of the more obvious choice , which allows us to work with instead of . Because is an antichain, we do not need to distinguish and . Both methods in fact work with either problem, and they are both about equally difficult, but we include both as either might turn out to be substantially more convenient in future work.
Proposition 11 Let be a natural number and let be a finite abelian group. Let be any field. Let denote the space of functions from to .
Let be any -valued function on that is nonzero only when the elements of form a -term arithmetic progression, and is nonzero on every -term constant progression.
Then the slice rank of is .
Proof: We apply Proposition 4, using the standard bases of . Let be the support of . Suppose that we have orderings on such that the constant progressions are maximal elements of and thus all constant progressions lie in . Then for any partition of , can contain at most constant progressions, and as all constant progressions must lie in one of the , we must have . By Proposition 4, this implies that the slice rank of is at least . Since is a tensor, the slice rank is at most , hence exactly .
So it is sufficient to find orderings on $G$ such that the constant progressions are maximal elements of $\Gamma$. We make several simplifying reductions. We may as well assume that $\Gamma$ consists of all the $k$-term arithmetic progressions, because if the constant progressions are maximal among the set of all progressions then they are maximal among its subset $\Gamma$. So we are looking for an ordering in which the constant progressions are maximal among all $k$-term arithmetic progressions. We may as well assume that $G$ is cyclic, because if for each cyclic group we have an ordering where constant progressions are maximal, then on an arbitrary finite abelian group the lexicographic product of these orderings is an ordering for which the constant progressions are maximal. We may assume $k = 8$, as if we have an $8$-tuple of orderings where constant progressions are maximal, we may add arbitrary orderings and the constant progressions will remain maximal.
So it is sufficient to find orderings on the cyclic group such that the constant progressions are maximal elements of the set of -term progressions in in the -fold product ordering. To do that, let the first, second, third, and fifth orderings be the usual order on and let the fourth, sixth, seventh, and eighth orderings be the reverse of the usual order on .
Then let be a constant progression and for contradiction assume that is a progression greater than in this ordering. We may assume that , because otherwise we may reverse the order of the progression, which has the effect of reversing all eight orderings, and then apply the transformation , which again reverses the eight orderings, bringing us back to the original problem but with .
Take a representative of the residue class in the interval . We will abuse notation and call this . Observe that , and are all contained in the interval modulo . Take a representative of the residue class in the interval . Then is in the interval for some . The distance between any distinct pair of intervals of this type is greater than , but the distance between and is at most , so is in the interval . By the same reasoning, is in the interval . Therefore . But then the distance between and is at most , so by the same reasoning is in the interval . Because is between and , it also lies in the interval . Because is in the interval , and by assumption it is congruent mod to a number in the set greater than or equal to , it must be exactly . Then, remembering that and lie in , we have and , so , hence , thus , which contradicts the assumption that .
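For any fixed modulus the claim just proved is a finite statement, so it can be confirmed by brute force for small cyclic groups. The sketch below uses illustrative names and assumes the cyclic group of order $N$ is identified with $\{0,\dots,N-1\}$ in its usual order. It uses the orderings described above: usual order in the first, second, third and fifth coordinates, reversed order in the fourth, sixth, seventh and eighth.

```python
# Zero-based coordinate positions carrying the usual order (1st, 2nd, 3rd, 5th)
USUAL = (0, 1, 2, 4)
# Zero-based coordinate positions carrying the reversed order (4th, 6th, 7th, 8th)
REVERSED = (3, 5, 6, 7)


def dominates(progression, c):
    """True if the 8-term progression is >= (c, ..., c) in the product ordering."""
    return (all(progression[i] >= c for i in USUAL)
            and all(progression[i] <= c for i in REVERSED))


def constant_progressions_are_maximal(N):
    """Check that no non-constant 8-term progression mod N dominates a constant one."""
    for c in range(N):
        for a in range(N):
            for r in range(1, N):  # r != 0 mod N, so the progression is non-constant
                progression = [(a + i * r) % N for i in range(8)]
                if dominates(progression, c):
                    return False
    return True


if __name__ == "__main__":
    print(all(constant_progressions_are_maximal(N) for N in range(1, 40)))
```

Changing the sign pattern in `USUAL` and `REVERSED` is a quick way to see that not every choice of orderings works.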
In fact, given a $k$-term progression mod $N$ and a constant, we can form a $k$-term binary sequence with a $1$ for each step of the progression that is greater than the constant and a $0$ for each step that is less. Because a rotation map, viewed as a dynamical system, has zero topological entropy, the number of $k$-term binary sequences that appear grows subexponentially in $k$. Hence there must be, for large enough $k$, at least one sequence that does not appear. In this proof we exploit a sequence that does not appear for $k = 8$.