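Let ${\bf F}_q$ be a finite field of order $q$, and let $C$ be an absolutely irreducible smooth projective curve defined over ${\bf F}_q$ (and hence over the algebraic closure $\overline{{\bf F}_q}$ of that field). For instance, $C$ could be the projective elliptic curve

$$C = \{ [x,y,z]: y^2 z = x^3 + a x z^2 + b z^3 \}$$

in the projective plane ${\bf P}^2 = \{ [x,y,z]: (x,y,z) \neq (0,0,0) \}$, where $a, b \in {\bf F}_q$ are coefficients whose discriminant $-16(4a^3+27b^2)$ is non-vanishing, which is the projective version of the affine elliptic curve

$$\{ (x,y): y^2 = x^3 + ax + b \}.$$

To each such curve $C$ one can associate a genus $g$, which we will define later; for instance, elliptic curves have genus $1$. We can also count the cardinality $|C({\bf F}_q)|$ of the set $C({\bf F}_q)$ of ${\bf F}_q$-points of $C$. The Hasse-Weil bound relates the two:

Theorem 1 (Hasse-Weil bound) $\left| |C({\bf F}_q)| - q - 1 \right| \leq 2g\sqrt{q}$.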
The usual proofs of this bound proceed by first establishing a trace formula of the form

$$|C({\bf F}_{q^n})| = q^n - \sum_{i=1}^{2g} \alpha_i^n + 1 \ \ \ \ \ (1)$$

for some complex numbers $\alpha_1,\dots,\alpha_{2g}$ independent of $n$; this is in fact a special case of the Lefschetz-Grothendieck trace formula, and can be interpreted as an assertion that the zeta function associated to the curve $C$ is rational. The task is then to establish a bound $|\alpha_i| \leq \sqrt{q}$ for all $i=1,\dots,2g$; this (or more precisely, the slightly stronger assertion $|\alpha_i| = \sqrt{q}$) is the Riemann hypothesis for such curves. This can be done either by passing to the Jacobian variety of $C$ and using a certain duality available on the cohomology of such varieties, known as Rosati involution; alternatively, one can pass to the product surface $C \times C$ and apply the Riemann-Roch theorem for that surface.
In 1969, Stepanov introduced an elementary method (a version of what is now known as the polynomial method) to count (or at least to upper bound) the quantity . The method was initially restricted to hyperelliptic curves, but was soon extended to general curves. In particular, Bombieri used this method to give a short proof of the following weaker version of the Hasse-Weil bound:
Theorem 2 (Weak Hasse-Weil bound) If $q$ is a perfect square, and $q \geq (g+1)^4$, then $|C({\bf F}_q)| \leq q + (2g+1) \sqrt{q} + 1$.
In fact, the bound on can be sharpened a little bit further, as we will soon see.
Theorem 2 is only an upper bound on , but there is a Galois-theoretic trick to convert (a slight generalisation of) this upper bound to a matching lower bound, and if one then uses the trace formula (1) (and the “tensor power trick” of sending to infinity to control the weights ) one can then recover the full Hasse-Weil bound. We discuss these steps below the fold.
I’ve discussed Bombieri’s proof of Theorem 2 in this previous post (in the special case of hyperelliptic curves), but now wish to present the full proof, with some minor simplifications from Bombieri’s original presentation; it is mostly elementary, with the deepest fact from algebraic geometry needed being Riemann’s inequality (a weak form of the Riemann-Roch theorem).
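As a quick numerical sanity check of the Hasse-Weil bound in the genus one case, here is a minimal Python sketch that counts points on elliptic curves over small prime fields by brute force. The function names `count_points` and `satisfies_hasse` are hypothetical helpers introduced only for this illustration, and the restriction to prime fields of characteristic greater than three (so that the short Weierstrass form applies) is an assumption of the sketch, not of the theorem.

```python
def count_points(a, b, p):
    """Count projective points of y^2 z = x^3 + a x z^2 + b z^3 over F_p.

    Assumes p is a prime greater than 3 (an assumption of this sketch).
    """
    # sqrt_count[r] = number of y in F_p with y^2 = r
    sqrt_count = [0] * p
    for y in range(p):
        sqrt_count[y * y % p] += 1
    affine = sum(sqrt_count[(x * x * x + a * x + b) % p] for x in range(p))
    return affine + 1  # add the single point at infinity [0, 1, 0]


def satisfies_hasse(a, b, p):
    """Check | #C(F_p) - p - 1 | <= 2 sqrt(p) for a non-singular curve."""
    if (4 * a**3 + 27 * b**2) % p == 0:
        raise ValueError("singular curve: discriminant vanishes mod p")
    n = count_points(a, b, p)
    # Compare squares to stay in exact integer arithmetic.
    return (n - p - 1) ** 2 <= 4 * p
```

For example, `count_points(1, 0, 5)` counts the points of $y^2 = x^3 + x$ over the field with five elements (three affine points plus the point at infinity), and `satisfies_hasse` can be looped over all non-singular pairs `(a, b)` for a given small prime.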
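The first step is to reinterpret $|C({\bf F}_q)|$ as the number of points of intersection between two curves $C_1, C_2$ in the surface $C \times C$. Indeed, if we define the Frobenius endomorphism $\hbox{Frob}_q$ on any projective space by

$$\hbox{Frob}_q( [x_0,\dots,x_n] ) := [x_0^q, \dots, x_n^q]$$

then this map preserves the curve $C$, and the fixed points of this map are precisely the ${\bf F}_q$-points of $C$:

$$C({\bf F}_q) = \{ z \in C: \hbox{Frob}_q(z) = z \}.$$

Thus one can interpret $|C({\bf F}_q)|$ as the number of points of intersection between the diagonal curve

$$\{ (z,z): z \in C \}$$

and the Frobenius graph

$$\{ (z, \hbox{Frob}_q(z)): z \in C \}$$

which are copies of $C$ inside $C \times C$. But we can use the additional hypothesis that $q$ is a perfect square to write this more symmetrically, by taking advantage of the fact that the Frobenius map has a square root

$$\hbox{Frob}_q = \hbox{Frob}_{\sqrt{q}}^2$$

with $\hbox{Frob}_{\sqrt{q}}$ also preserving $C$. One can then also interpret $|C({\bf F}_q)|$ as the number of points of intersection between the curve

$$C_1 := \{ (z, \hbox{Frob}_{\sqrt{q}}(z)): z \in C \}$$

and its transpose

$$C_2 := \{ (\hbox{Frob}_{\sqrt{q}}(w), w): w \in C \}.$$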
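Let $k(C \times C)$ be the field of rational functions on $C \times C$ (with coefficients in $\overline{{\bf F}_q}$), and define $k(C_1)$, $k(C_2)$, and $k(C_1 \cap C_2)$ analogously (although $C_1 \cap C_2$ is likely to be disconnected, so $k(C_1 \cap C_2)$ will just be a ring rather than a field). We then (morally) have the commuting square

$$\begin{array}{ccc} k(C \times C) & \to & k(C_1) \\ \downarrow & & \downarrow \\ k(C_2) & \to & k(C_1 \cap C_2) \end{array}$$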
if we ignore the issue that a rational function on, say, , might blow up on all of and thus not have a well-defined restriction to . We use and to denote the restriction maps. Furthermore, we have obvious isomorphisms , coming from composing with the graphing maps and .
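The role of the Frobenius map and its square root in the discussion above can be illustrated in the simplest possible setting, namely the affine line over a field with $q = p^2$ elements. The following Python sketch is only a toy model under stated assumptions: it takes $p = 7$, represents the field with 49 elements as ${\bf F}_7[i]/(i^2+1)$ (which requires $p \equiv 3 \pmod 4$), and uses hypothetical helper names `mul`, `power`, `frob`. It checks that the elements fixed by the $p$-th power map are exactly the elements of the prime field, and that applying this map twice gives the $q$-th power map.

```python
P = 7  # a prime with P % 4 == 3, so that x^2 + 1 is irreducible over F_P


def mul(u, v):
    """Multiply u = (a, b), standing for a + b*i, by v in F_P[i]/(i^2 + 1)."""
    a, b = u
    c, d = v
    return ((a * c - b * d) % P, (a * d + b * c) % P)


def power(u, n):
    """Compute u**n by repeated squaring."""
    result = (1, 0)
    while n:
        if n & 1:
            result = mul(result, u)
        u = mul(u, u)
        n >>= 1
    return result


def frob(u):
    """The P-th power Frobenius map; here it plays the role of Frob_{sqrt(q)}."""
    return power(u, P)


field = [(a, b) for a in range(P) for b in range(P)]
# Fixed points of the P-th power map: these should be the P elements of F_P.
fixed_by_frob = [u for u in field if frob(u) == u]
```

Composing `frob` with itself gives the map $x \mapsto x^{49}$, which is the identity on this field, in line with the fact that the fixed points of $\hbox{Frob}_q$ are the points defined over the field with $q$ elements.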
The idea now is to find a rational function on the surface of controlled degree which vanishes when restricted to , but is non-vanishing (and not blowing up) when restricted to . On , we thus get a non-zero rational function of controlled degree which vanishes on – which then lets us bound the cardinality of in terms of the degree of . (In Bombieri’s original argument, one required vanishing to high order on the side, but in our presentation, we have factored out a term which removes this high order vanishing condition.)
To find this , we will use linear algebra. Namely, we will locate a finite-dimensional subspace of (consisting of certain “controlled degree” rational functions) which projects injectively to , but whose projection to has strictly smaller dimension than itself. The rank-nullity theorem then forces the existence of a non-zero element of whose projection to vanishes, but whose projection to is non-zero.
Now we build . Pick a point of , which we will think of as being a point at infinity. (For the purposes of proving Theorem 2, we may clearly assume that is non-empty.) Thus is fixed by . To simplify the exposition, we will also assume that is fixed by the square root of ; in the opposite case when has order two when acting on , the argument is essentially the same, but all references to in the second factor of need to be replaced by (we leave the details to the interested reader).
For any natural number , define to be the set of rational functions which are allowed to have a pole of order up to at , but have no other poles on ; note that as we are assuming to be smooth, it is unambiguous what a pole is (and what order it will have). (In the fancier language of divisors and Cech cohomology, we have .) The space is clearly a vector space over ; one can view intuitively as the space of “polynomials” on of “degree” at most . When , consists just of the constant functions. Indeed, if , then the image of avoids and so lies in the affine line ; but as is projective, the image needs to be compact (hence closed) in , and must therefore be a point, giving the claim.
For higher , we have the easy relations
The former inequality just comes from the trivial inclusion . For the latter, observe that if two functions lie in , so that they each have a pole of order at most at , then some linear combination of these functions must have a pole of order at most at ; thus has codimension at most one in , giving the claim.
From (3) and induction we see that each of the are finite dimensional, with the trivial upper bound
Riemann’s inequality complements this with the lower bound
thus one has for all but at most exceptions (in fact, exactly exceptions as it turns out). This is a consequence of the Riemann-Roch theorem; it can be proven from abstract nonsense (the snake lemma) if one defines the genus in a non-standard fashion (as the dimension of the first Cech cohomology of the structure sheaf of ), but to obtain this inequality with a standard definition of (e.g. as the dimension of the zeroth Cech cohomology of the line bundle of differentials) requires the more non-trivial tool of Serre duality.
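The dimension count just described can be summarised in symbols. In the sketch below, $V_n$ denotes the space of rational functions with a pole of order at most $n$ at the chosen point and no other poles, and $g$ is the genus; the names $V_n$ and $E$ are notation assumed here for illustration.

```latex
% Easy relations, trivial upper bound, and Riemann's inequality:
\dim V_{n-1} \;\le\; \dim V_n \;\le\; \dim V_{n-1} + 1,
\qquad
n + 1 - g \;\le\; \dim V_n \;\le\; n + 1 .

% Let E be the set of n >= 1 at which the dimension fails to jump:
E := \{ n \ge 1 : \dim V_n = \dim V_{n-1} \}.

% Since \dim V_0 = 1 (constants only), telescoping gives
\dim V_n \;=\; 1 + n - |E \cap \{1,\dots,n\}| \;\ge\; n + 1 - g
\quad\Longrightarrow\quad
|E \cap \{1,\dots,n\}| \;\le\; g \quad \text{for all } n,
% and hence |E| <= g: at most g exceptional values of n.
```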
At any rate, now that we have these vector spaces , we will define to be a tensor product space
for some natural numbers which we will optimise in later. That is to say, is spanned by functions of the form with and . This is clearly a linear subspace of of dimension , and hence by Riemann’s inequality we have
Observe that maps a tensor product to a function . If and , then we see that the function has a pole of order at most at . We conclude that
and in particular by (4)
We will choose to be a bit bigger than , to make the image of smaller than that of . From (6), (10) we see that if we have the inequality
(together with (7)) then cannot be injective.
On the other hand, we have the following basic fact:
Proof: From (3), we can find a linear basis of such that each of the has a distinct order of pole at (somewhere between and inclusive). Similarly, we may find a linear basis of such that each of the has a distinct order of pole at (somewhere between and inclusive). The functions then span , and the order of pole at is . But since , these orders are all distinct, and so these functions must be linearly independent. The claim follows.
This gives us the following bound:
Proposition 4 Let be natural numbers such that (7), (11), (12) hold. Then .
Proof: As is not injective, we can find with vanishing. By the above lemma, the function is then non-zero, but it must also vanish on , which has cardinality . On the other hand, by (8), has a pole of order at most at and no other poles. Since the zeroes and poles of a non-zero rational function on a projective curve, counted with multiplicity and with poles counted negatively, must add up to zero, the number of zeroes is at most the order of the pole, and the claim follows.
If , we may make the explicit choice
and a brief calculation then gives Theorem 2. In some cases one can optimise things a bit further. For instance, in the genus zero case (e.g. if is just the projective line ) one may take and conclude the absolutely sharp bound in this case; in the case of the projective line , the function is in fact the very concrete function .
Remark 1 When is not a perfect square, one can try to run the above argument using the factorisation instead of . This gives a weaker version of the above bound, of the shape . In the hyperelliptic case at least, one can erase this loss by working with a variant of the argument in which one requires to vanish to high order at , rather than just to first order; see this survey article of mine for details.
Roth’s theorem on arithmetic progressions asserts that every subset of the integers of positive upper density contains infinitely many arithmetic progressions of length three. There are many versions and variants of this theorem. Here is one of them:
Theorem 1 (Roth’s theorem) Let $G = (G,+)$ be a compact abelian group, with Haar probability measure $\mu$, which is $2$-divisible (i.e. the map $x \mapsto 2x$ is surjective) and let $A$ be a measurable subset of $G$ with $\mu(A) \geq \alpha$ for some $0 < \alpha \leq 1$. Then we have

$$\int_G \int_G 1_A(x) 1_A(x+r) 1_A(x+2r)\ d\mu(x) d\mu(r) \gg_\alpha 1,$$

where $X \gg_\alpha Y$ denotes the bound $X \geq c_\alpha Y$ for some $c_\alpha > 0$ depending only on $\alpha$.

This theorem is usually formulated in the case that $G$ is a finite abelian group of odd order (in which case the result is essentially due to Meshulam) or more specifically a cyclic group of odd order (in which case it is essentially due to Varnavides), but is also valid for the more general setting of $2$-divisible compact abelian groups, as we shall shortly see. One can be more precise about the dependence of the implied constant $c_\alpha$ on $\alpha$, but to keep the exposition simple we will work at the qualitative level here, without trying at all to get good quantitative bounds. The theorem is also true without the $2$-divisibility hypothesis, but the proof we will discuss runs into some technical issues due to the degeneracy of the shift $x \mapsto x+2r$ in that case.
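In the cyclic group case, the double integral in Roth's theorem is just a normalised count of three-term progressions (including the degenerate ones with zero common difference). The following Python sketch computes it by brute force; the function name `three_ap_density` is a hypothetical helper, and the sketch makes no attempt at the quantitative content of the theorem.

```python
from fractions import Fraction


def three_ap_density(A, N):
    """Return #{(x, r) in (Z/N)^2 : x, x+r, x+2r all lie in A} / N^2.

    This is the double integral of 1_A(x) 1_A(x+r) 1_A(x+2r) with respect
    to normalised counting measure on the cyclic group Z/N.
    """
    members = [False] * N
    for a in A:
        members[a % N] = True
    count = sum(
        1
        for x in range(N)
        if members[x]
        for r in range(N)
        if members[(x + r) % N] and members[(x + 2 * r) % N]
    )
    return Fraction(count, N * N)
```

The degenerate progressions with `r == 0` always contribute `len(A) / N**2`, which is why the theorem is only interesting as a lower bound that does not decay with `N`.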
We can deduce Theorem 1 from the following more general Khintchine-type statement. Let denote the Pontryagin dual of a compact abelian group , that is to say the set of all continuous homomorphisms from to the (additive) unit circle . Thus is a discrete abelian group, and functions have a Fourier transform defined by
If is -divisible, then is -torsion-free in the sense that the map is injective. For any finite set and any radius , define the Bohr set
where denotes the distance of to the nearest integer. We refer to the cardinality of as the rank of the Bohr set. We record a simple volume bound on Bohr sets:
Lemma 2 (Volume packing bound) Let be a compact abelian group with Haar probability measure . For any Bohr set , we have
Proof: We can cover the torus by translates of the cube . Then the sets form a cover of . But all of these sets lie in a translate of , and the claim then follows from the translation invariance of .
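The covering argument can be tested numerically in a cyclic group. In the Python sketch below, the group is ${\bf Z}/N{\bf Z}$, a frequency `xi` acts by $x \mapsto \xi x / N \hbox{ mod } 1$, and the Bohr set is taken with a strict inequality; the explicit lower bound $\lceil 1/\rho \rceil^{-|S|}$ is my own reading of the packing argument (cover the torus by that many half-open cubes of side at most $\rho$ and pigeonhole), and is not necessarily the exact constant in the lemma. The helper names are hypothetical.

```python
from fractions import Fraction
from math import ceil


def bohr_set(N, S, rho):
    """Elements x of Z/N with ||xi * x / N|| < rho for every frequency xi in S."""
    def dist_to_nearest_integer(r):
        # r is a residue mod N, standing for the point r/N of the circle R/Z
        return Fraction(min(r, N - r), N)

    return [
        x for x in range(N)
        if all(dist_to_nearest_integer(xi * x % N) < rho for xi in S)
    ]


def packing_lower_bound(S, rho):
    """The density bound ceil(1/rho)^(-|S|) suggested by the covering argument."""
    return Fraction(1, ceil(1 / rho) ** len(S))
```

Passing `rho` as a `Fraction` keeps all comparisons exact, so the inequality `len(bohr_set(N, S, rho)) / N >= packing_lower_bound(S, rho)` can be checked without floating point error.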
Given any Bohr set , we define a normalised “Lipschitz” cutoff function by the formula
where is the constant such that
thus
The function should be viewed as an -normalised “tent function” cutoff to . Note from Lemma 2 that
We then have the following sharper version of Theorem 1:
Theorem 3 (Roth-Khintchine theorem) Let be a -divisible compact abelian group, with Haar probability measure , and let . Then for any measurable function , there exists a Bohr set with and such that
where denotes the convolution operation
A variant of this result (expressed in the language of ergodic theory) appears in this paper of Bergelson, Host, and Kra; a combinatorial version of the Bergelson-Host-Kra result that is closer to Theorem 3 subsequently appeared in this paper of Ben Green and myself, but this theorem arguably appears implicitly in a much older paper of Bourgain. To see why Theorem 3 implies Theorem 1, we apply the theorem with and equal to a small multiple of to conclude that there is a Bohr set with and such that
But from (2) we have the pointwise bound , and Theorem 1 follows.
Below the fold, we give a short proof of Theorem 3, using an “energy pigeonholing” argument that essentially dates back to the 1986 paper of Bourgain mentioned previously (not to be confused with a later 1999 paper of Bourgain on Roth’s theorem that was highly influential, for instance in emphasising the importance of Bohr sets). The idea is to use the pigeonhole principle to choose the Bohr set to capture all the “large Fourier coefficients” of , but such that a certain “dilate” of does not capture much more Fourier energy of than itself. The bound (3) may then be obtained through elementary Fourier analysis, without much need to explicitly compute things like the Fourier transform of an indicator function of a Bohr set. (However, the bound obtained by this argument is going to be quite poor – of tower-exponential type.) To do this we perform a structural decomposition of into “structured”, “small”, and “highly pseudorandom” components, as is common in the subject (e.g. in this previous blog post), but even though we crucially need to retain non-negativity of one of the components in this decomposition, we can avoid recourse to conditional expectation with respect to a partition (or “factor”) of the space, using instead convolution with one of the considered above to achieve a similar effect.
Let be an irreducible polynomial in three variables. As is not algebraically closed, the zero set can split into various components of dimension between and . For instance, if , the zero set is a line; more interestingly, if , then is the union of a line and a surface (or the product of an acnodal cubic curve with a line). We will assume that the -dimensional component is non-empty, thus defining a real surface in . In particular, this hypothesis implies that is not just irreducible over , but is in fact absolutely irreducible (i.e. irreducible over ), since otherwise one could use the complex factorisation of to contain inside the intersection of the complex zero locus of complex polynomial and its complex conjugate, with having no common factor, forcing to be at most one-dimensional. (For instance, in the case , one can take .) Among other things, this makes a Zariski-dense subset of , thus any polynomial identity which holds true at every point of , also holds true on all of . This allows us to easily use tools from algebraic geometry in this real setting, even though the reals are not quite algebraically closed.
The surface is said to be ruled if, for a Zariski open dense set of points , there exists a line through for some non-zero which is completely contained in , thus
for all . Also, a point is said to be a flecnode if there exists a line through for some non-zero which is tangent to to third order, in the sense that
for . Clearly, if is a ruled surface, then a Zariski open dense set of points on are flecnodes. We then have the remarkable theorem of Cayley and Salmon asserting the converse:
Theorem 1 (Cayley-Salmon theorem) Let be an irreducible polynomial with non-empty. Suppose that a Zariski dense set of points in are flecnodes. Then is a ruled surface.
Among other things, this theorem was used in the celebrated result of Guth and Katz that almost solved the Erdős distance problem in two dimensions, as discussed in this previous blog post. Vanishing to third order is necessary: observe that in a surface of negative curvature, such as the saddle , every point on the surface is tangent to second order to a line (the line in the direction for which the second fundamental form vanishes).
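The definition of a ruled surface can be checked by hand on a concrete quadric. The Python sketch below uses the saddle $z = xy$ as an illustrative choice (an assumption of this example), together with hypothetical helpers `Q`, `line_lies_in_surface` and `rulings_through`. Since the restriction of a quadratic polynomial to a line is a polynomial of degree at most two in the line parameter, vanishing at three or more parameter values forces the whole line to lie in the surface.

```python
from fractions import Fraction


def Q(x, y, z):
    """Defining polynomial of the saddle z = xy (an illustrative choice)."""
    return z - x * y


def line_lies_in_surface(point, direction, samples=4):
    """Check Q(point + t * direction) == 0 at t = 0, 1, ..., samples - 1.

    t -> Q(point + t * direction) has degree at most 2, so at least three
    zeros force it to vanish identically.
    """
    return all(
        Q(*(p + t * v for p, v in zip(point, direction))) == 0
        for t in map(Fraction, range(samples))
    )


def rulings_through(a, b):
    """Directions of the two lines through (a, b, ab) contained in the saddle."""
    return [(1, 0, b), (0, 1, a)]
```

A generic direction such as `(1, 1, a + b)` is tangent to the saddle at `(a, b, a * b)` but the corresponding line leaves the surface, so `line_lies_in_surface` returns `False` for it.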
The original proof of the Cayley-Salmon theorem, dating back to at least 1915, is not easily accessible and not written in modern language. A modern proof of this theorem (together with substantial generalisations, for instance to higher dimensions) is given by Landsberg; the proof uses the machinery of modern algebraic geometry. The purpose of this post is to record an alternate proof of the Cayley-Salmon theorem based on classical differential geometry (in particular, the notion of torsion of a curve) and basic ODE methods (in particular, Gronwall’s inequality and the Picard existence theorem). The idea is to “integrate” the lines indicated by the flecnode to produce smooth curves on the surface ; one then uses the vanishing (1) and some basic calculus to conclude that these curves have zero torsion and are thus planar curves. Some further manipulation using (1) (now just to second order instead of third) then shows that these curves are in fact straight lines, giving the ruling on the surface.
Update: Janos Kollar has informed me that the above theorem was essentially known to Monge in 1809; see his recent arXiv note for more details.
I thank Larry Guth and Micha Sharir for conversations leading to this post.
A core foundation of the subject now known as arithmetic combinatorics (and particularly the subfield of additive combinatorics) is the collection of elementary sum set estimates (sometimes known as “Ruzsa calculus”) that relate the cardinality of various sum sets

$$A + B := \{ a+b: a \in A, b \in B \}$$

and difference sets

$$A - B := \{ a-b: a \in A, b \in B \},$$

as well as iterated sumsets such as $3A = A+A+A$, $2A-2A = A+A-A-A$, and so forth. Here, $A, B$ are finite non-empty subsets of some additive group $G = (G,+)$ (classically one took $G = {\bf Z}$ or $G = {\bf R}$, but nowadays one usually considers more general additive groups). Some basic estimates in this vein are the following:

Lemma 1 (Ruzsa covering lemma) Let $A, B$ be finite non-empty subsets of $G$. Then $B$ may be covered by at most $\frac{|A+B|}{|A|}$ translates of $A-A$.

Proof: Consider a maximal set of disjoint translates $x+A$ of $A$ by elements $x \in B$. These translates have cardinality $|A|$, are disjoint, and lie in $A+B$, so there are at most $\frac{|A+B|}{|A|}$ of them. By maximality, for any $x' \in B$, $x'+A$ must intersect at least one of the selected $x+A$, thus $x' \in x+A-A$, and the claim follows.

Lemma 2 (Ruzsa triangle inequality) Let $A, B, C$ be finite non-empty subsets of $G$. Then $|A-C| \leq \frac{|A-B| |B-C|}{|B|}$.

Proof: Consider the addition map $(x,y) \mapsto x+y$ from $(A-B) \times (B-C)$ to $G$. Every element $a-c$ of $A-C$ has a preimage of this map of cardinality at least $|B|$, thanks to the obvious identity $a-c = (a-b) + (b-c)$ for each $b \in B$. Since $(A-B) \times (B-C)$ has cardinality $|A-B| |B-C|$, the claim follows.
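Both of the Ruzsa lemmas above are easy to test on small sets of integers. The following Python sketch follows the greedy construction in the proof of the covering lemma (a maximal family of disjoint translates of the first set by elements of the second); the function names are hypothetical and the sets used in any check are arbitrary examples.

```python
def sumset(A, B):
    """The sum set A + B."""
    return {a + b for a in A for b in B}


def diffset(A, B):
    """The difference set A - B."""
    return {a - b for a in A for b in B}


def ruzsa_cover(A, B):
    """Greedily choose X inside B so that the translates x + A are disjoint.

    A single pass yields a maximal family: a rejected b met the union of the
    chosen translates when it was examined, and that union only grows.
    Maximality then gives that B is contained in X + (A - A).
    """
    X, used = [], set()
    for b in sorted(B):
        translate = {b + a for a in A}
        if used.isdisjoint(translate):
            X.append(b)
            used |= translate
    return X
```

With `X = ruzsa_cover(A, B)`, the covering lemma corresponds to the two facts `len(X) * len(A) <= len(sumset(A, B))` and `B <= {x + d for x in X for d in diffset(A, A)}`, while the triangle inequality reads `len(diffset(A, C)) * len(B) <= len(diffset(A, B)) * len(diffset(B, C))`.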
Such estimates (which are covered, incidentally, in Chapter 2 of my book with Van Vu) are particularly useful for controlling finite sets $A$ of small doubling, in the sense that $|A+A| \leq K|A|$ for some bounded $K$. (There are deeper theorems, most notably Freiman’s theorem, which give more control than elementary Ruzsa calculus does; however, the known bounds in Freiman’s theorem are worse than polynomial in $K$ (although it is conjectured otherwise), whereas the elementary estimates are almost all polynomial in $K$.)
However, there are some settings in which the standard sum set estimates are not quite applicable. One such setting is the continuous setting, where one is dealing with bounded open sets in an additive Lie group (e.g. or a torus ) rather than a finite setting. Here, one can largely replicate the discrete sum set estimates by working with a Haar measure in place of cardinality; this is the approach taken for instance in this paper of mine. However, there is another setting, which one might dub the “discretised” setting (as opposed to the “discrete” setting or “continuous” setting), in which the sets remain finite (or at least discretisable to be finite), but for which there is a certain amount of “roundoff error” coming from the discretisation. As a typical example (working now in a non-commutative multiplicative setting rather than an additive one), consider the orthogonal group of orthogonal matrices, and let be the matrices obtained by starting with all of the orthogonal matrices in and rounding each coefficient of each matrix in this set to the nearest multiple of , for some small . This forms a finite set (whose cardinality grows as like a certain negative power of ). In the limit , the set is not a set of small doubling in the discrete sense. However, is still close to in a metric sense, being contained in the -neighbourhood of . Another key example comes from graphs of maps from a subset of one additive group to another . If is “approximately additive” in the sense that for all , is close to in some metric, then might not have small doubling in the discrete sense (because could take a large number of values), but could be considered a set of small doubling in a discretised sense.
One would like to have a sum set (or product set) theory that can handle these cases, particularly in “high-dimensional” settings in which the standard methods of passing back and forth between continuous, discrete, or discretised settings behave poorly from a quantitative point of view due to the exponentially large doubling constant of balls. One way to do this is to impose a translation invariant metric on the underlying group (reverting back to additive notation), and replace the notion of cardinality by that of metric entropy. There are a number of almost equivalent ways to define this concept:
Definition 3 Let be a metric space, let be a subset of , and let be a radius.
- The packing number is the largest number of points one can pack inside such that the balls are disjoint.
- The internal covering number is the fewest number of points such that the balls cover .
- The external covering number is the fewest number of points such that the balls cover .
- The metric entropy is the largest number of points one can find in that are -separated, thus for all .
It is an easy exercise to verify the inequalities
for any , and that is non-increasing in and non-decreasing in for the three choices (but monotonicity in can fail for !). It turns out that the external covering number is slightly more convenient than the other notions of metric entropy, so we will abbreviate . The cardinality can be viewed as the limit of the entropies as .
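To get a feel for these notions, one can compute two of them exactly for finite subsets of the real line, where greedy left-to-right sweeps are optimal. The Python sketch below is only an illustration under stated assumptions: balls are taken to be closed intervals of radius `r`, separation means pairwise distance at least `r`, and the function names are hypothetical. Under these conventions a separated set of maximum size is in particular maximal, so the balls of radius `r` around its points cover the set, and hence the external covering number is at most the separated number.

```python
def separated_number(points, r):
    """Largest number of points that are pairwise at distance >= r.

    On the real line the greedy left-to-right choice is optimal.
    """
    count, last = 0, None
    for x in sorted(set(points)):
        if last is None or x - last >= r:
            count += 1
            last = x
    return count


def external_covering_number(points, r):
    """Fewest closed intervals [c - r, c + r], centres c anywhere in R, covering the set."""
    count, reach = 0, None
    for x in sorted(set(points)):
        if reach is None or x > reach:
            count += 1
            reach = x + 2 * r  # interval [x, x + 2r], centred at x + r
    return count
```

For the set `[0, 1, 2, 3, 10, 11, 50]` and radius `2`, the sweeps select the separated points `0, 2, 10, 50` and the covering intervals starting at `0`, `10` and `50`.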
If we have the bounded doubling property that is covered by translates of for each , and one has a Haar measure on which assigns a positive finite mass to each ball, then any of the above entropies is comparable to , as can be seen by simple volume packing arguments. Thus in the bounded doubling setting one can usually use the measure-theoretic sum set theory to derive entropy-theoretic sumset bounds (see e.g. this paper of mine for an example of this). However, it turns out that even in the absence of bounded doubling, one still has an entropy analogue of most of the elementary sum set theory, except that one has to accept some degradation in the radius parameter by some absolute constant. Such losses can be acceptable in applications in which the underlying sets are largely “transverse” to the balls , so that the -entropy of is largely independent of ; this is a situation which arises in particular in the case of graphs discussed above, if one works with “vertical” metrics whose balls extend primarily in the vertical direction. (I hope to present a specific application of this type here in the near future.)
Henceforth we work in an additive group equipped with a translation-invariant metric . (One can also generalise things slightly by allowing the metric to attain the values or , without changing much of the analysis below.) By the Heine-Borel theorem, any precompact set will have finite entropy for any . We now have analogues of the two basic Ruzsa lemmas above:
Lemma 4 (Ruzsa covering lemma) Let be precompact non-empty subsets of , and let . Then may be covered by at most translates of .
Proof: Let be a maximal set of points such that the sets are all disjoint. Then the sets are disjoint in and have entropy , and furthermore any ball of radius can intersect at most one of the . We conclude that , so . If , then must intersect one of the , so , and the claim follows.
Lemma 5 (Ruzsa triangle inequality) Let be precompact non-empty subsets of , and let . Then .
Proof: Consider the addition map from to . The domain may be covered by product balls . Every element of has a preimage of this map which projects to a translate of , and thus must meet at least of these product balls. However, if two elements of are separated by a distance of at least , then no product ball can intersect both preimages. We thus see that , and the claim follows.
Below the fold we will record some further metric entropy analogues of sum set estimates (basically redoing much of Chapter 2 of my book with Van Vu). Unfortunately there does not seem to be a direct way to abstractly deduce metric entropy results from their sum set analogues (basically due to the failure of a certain strong version of Freiman’s theorem, as discussed in this previous post); nevertheless, the proofs of the discrete arguments are elementary enough that they can be modified with a small amount of effort to handle the entropy case. (In fact, there should be a very general model-theoretic framework in which both the discrete and entropy arguments can be processed in a unified manner; see this paper of Hrushovski for one such framework.)
It is also likely that many of the arguments here extend to the non-commutative setting, but for simplicity we will not pursue such generalisations here.
As in the previous post, all computations here are at the formal level only.
In the previous blog post, the Euler equations for inviscid incompressible fluid flow were interpreted in a Lagrangian fashion, and then Noether’s theorem invoked to derive the known conservation laws for these equations. In a bit more detail: starting with Lagrangian space and Eulerian space , we let be the space of volume-preserving, orientation-preserving maps from Lagrangian space to Eulerian space. Given a curve , we can define the Lagrangian velocity field as the time derivative of , and the Eulerian velocity field . The volume-preserving nature of ensures that is a divergence-free vector field:
If we formally define the functional
then one can show that the critical points of this functional (with appropriate boundary conditions) obey the Euler equations
for some pressure field . As discussed in the previous post, the time translation symmetry of this functional yields conservation of the Hamiltonian
the rigid motion symmetries of Eulerian space give conservation of the total momentum
and total angular momentum
and the diffeomorphism symmetries of Lagrangian space give conservation of circulation
for any closed loop in , or equivalently pointwise conservation of the Lagrangian vorticity , where is the -form associated with the vector field using the Euclidean metric on , with denoting pullback by .
It turns out that one can generalise the above calculations. Given any self-adjoint operator on divergence-free vector fields , we can define the functional
as we shall see below the fold, critical points of this functional (with appropriate boundary conditions) obey the generalised Euler equations
for some pressure field , where in coordinates is with the usual summation conventions. (When , , and this term can be absorbed into the pressure , and we recover the usual Euler equations.) Time translation symmetry then gives conservation of the Hamiltonian
If the operator commutes with rigid motions on , then we have conservation of total momentum
and total angular momentum
and the diffeomorphism symmetries of Lagrangian space give conservation of circulation
or pointwise conservation of the Lagrangian vorticity . These applications of Noether’s theorem proceed exactly as in the previous post; we leave the details to the interested reader.
One particular special case of interest arises in two dimensions , when is the inverse derivative . The vorticity is a -form, which in the two-dimensional setting may be identified with a scalar. In coordinates, if we write , then
Since is also divergence-free, we may therefore write
where the stream function is given by the formula
If we take the curl of the generalised Euler equation (2), we obtain (after some computation) the surface quasi-geostrophic equation
This equation has strong analogies with the three-dimensional incompressible Euler equations, and can be viewed as a simplified model for that system; see this paper of Constantin, Majda, and Tabak for details.
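The step in which a divergence-free velocity field in two dimensions is written in terms of a stream function can be checked numerically. The Python sketch below works on a periodic grid with central differences and uses the sign convention $u = (-\partial_y \psi, \partial_x \psi)$, which is an assumption of this example (the opposite convention works equally well); all function names are hypothetical. Because the discrete difference operators in the two directions commute, the discrete divergence vanishes up to rounding error.

```python
import math


def ddx(f, h):
    """Periodic central difference in the first index."""
    n, m = len(f), len(f[0])
    return [[(f[(i + 1) % n][j] - f[(i - 1) % n][j]) / (2 * h) for j in range(m)]
            for i in range(n)]


def ddy(f, h):
    """Periodic central difference in the second index."""
    n, m = len(f), len(f[0])
    return [[(f[i][(j + 1) % m] - f[i][(j - 1) % m]) / (2 * h) for j in range(m)]
            for i in range(n)]


def velocity_from_stream(psi, h):
    """Return (u1, u2) = (-d psi/dy, d psi/dx); the sign convention is an assumption."""
    u1 = [[-v for v in row] for row in ddy(psi, h)]
    u2 = ddx(psi, h)
    return u1, u2


def divergence(u1, u2, h):
    """Discrete divergence d u1/dx + d u2/dy."""
    a, b = ddx(u1, h), ddy(u2, h)
    return [[a[i][j] + b[i][j] for j in range(len(a[0]))] for i in range(len(a))]
```

Any periodic grid function can serve as `psi`; for instance one can sample $\sin(x)\cos(2y)$ on a $16 \times 16$ grid of spacing $2\pi/16$ and confirm that every entry of the divergence is at the level of rounding error.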
Now we can specialise the general conservation laws derived previously to this setting. The conserved Hamiltonian is
(a law previously observed for this equation in the abovementioned paper of Constantin, Majda, and Tabak). As commutes with rigid motions, we also have (formally, at least) conservation of momentum
(which up to trivial transformations is also expressible in impulse form as , after integration by parts), and conservation of angular momentum
(which up to trivial transformations is ). Finally, diffeomorphism invariance gives pointwise conservation of Lagrangian vorticity , thus is transported by the flow (which is also evident from (3)). In particular, all integrals of the form for a fixed function are conserved by the flow.
Throughout this post, we will work only at the formal level of analysis, ignoring issues of convergence of integrals, justifying differentiation under the integral sign, and so forth. (Rigorous justification of the conservation laws and other identities arising from the formal manipulations below can usually be established in an a posteriori fashion once the identities are in hand, without the need to rigorously justify the manipulations used to come up with these identities.)
It is a remarkable fact in the theory of differential equations that many of the ordinary and partial differential equations that are of interest (particularly in geometric PDE, or PDE arising from mathematical physics) admit a variational formulation; thus, a collection of one or more fields on a domain taking values in a space will solve the differential equation of interest if and only if is a critical point of the functional
involving the fields and their first derivatives , where the Lagrangian is a function on the vector bundle over consisting of triples with , , and a linear transformation; we also usually keep the boundary data of fixed in case has a non-trivial boundary, although we will ignore these issues here. (We also ignore the possibility of having additional constraints imposed on and , which require the machinery of Lagrange multipliers to deal with, but which will only serve as a distraction for the current discussion.) It is common to use local coordinates to parameterise as and as , in which case can be viewed locally as a function on .
Example 1 (Geodesic flow) Take and to be a Riemannian manifold, which we will write locally in coordinates as with metric for . A geodesic is then a critical point (keeping fixed) of the energy functional
or in coordinates (ignoring coordinate patch issues, and using the usual summation conventions)
As discussed in this previous post, both the Euler equations for rigid body motion, and the Euler equations for incompressible inviscid flow, can be interpreted as geodesic flow (though in the latter case, one has to work really formally, as the manifold is now infinite dimensional).
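In the simplest case of Example 1, where the target is flat Euclidean space, the geodesics are straight lines traversed at constant speed, and the criticality of the energy can be checked on a discretisation. The Python sketch below uses a polygonal path and the discrete energy $\frac{1}{2} \sum_k |\gamma_{k+1} - \gamma_k|^2 / \Delta t$; the discretisation and the helper names `energy`, `perturb`, `first_variation` are assumptions made for illustration only.

```python
def energy(path, dt):
    """Discrete energy (1/2) * sum |gamma_{k+1} - gamma_k|^2 / dt of a polygonal path."""
    return 0.5 * sum(
        sum((q - p) ** 2 for p, q in zip(a, b)) / dt
        for a, b in zip(path, path[1:])
    )


def perturb(path, bump, eps):
    """Return path + eps * bump; bump should vanish at both endpoints."""
    return [tuple(p + eps * h for p, h in zip(a, b)) for a, b in zip(path, bump)]


def first_variation(path, bump, dt, eps=1e-3):
    """Symmetric difference quotient of the energy in the direction bump."""
    plus = energy(perturb(path, bump, eps), dt)
    minus = energy(perturb(path, bump, -eps), dt)
    return (plus - minus) / (2 * eps)
```

For a uniformly sampled straight segment and any deformation vanishing at the endpoints, `first_variation` returns zero up to rounding error, while the energy of a genuinely perturbed path is strictly larger.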
More generally, if is itself a Riemannian manifold, which we write locally in coordinates as with metric for , then a harmonic map is a critical point of the energy functional
or in coordinates (again ignoring coordinate patch issues)
If we replace the Riemannian manifold by a Lorentzian manifold, such as Minkowski space , then the notion of a harmonic map is replaced by that of a wave map, which generalises the scalar wave equation (which corresponds to the case ).
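Example 2 ($N$-particle interactions) Take $\Omega = \mathbb{R}$ and $M = \mathbb{R}^3 \otimes \mathbb{R}^N$; then a function $\Phi: \Omega \to M$ can be interpreted as a collection of $N$ trajectories $q_1, \dots, q_N: \mathbb{R} \to \mathbb{R}^3$ in space, which we give a physical interpretation as the trajectories of $N$ particles. If we assign each particle a positive mass $m_1, \dots, m_N > 0$, and also introduce a potential energy function $V: M \to \mathbb{R}$, then it turns out that Newton’s laws of motion $F = ma$ in this context (with the force $F_i$ on the $i^{\mathrm{th}}$ particle being given by the conservative force $-\nabla_{q_i} V$) are equivalent to the trajectories $q_1, \dots, q_N$ being a critical point of the action functional
$$J[\Phi] := \int_{\mathbb{R}} \Big( \sum_{i=1}^N \frac{1}{2} m_i |\dot q_i(t)|^2 - V(q_1(t), \dots, q_N(t)) \Big)\ dt.$$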
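Formally, if $\Phi = \Phi_0$ is a critical point of a functional $J[\Phi]$, this means that
$$\frac{d}{ds} J[\Phi[s]]\Big|_{s=0} = 0$$
whenever $s \mapsto \Phi[s]$ is a (smooth) deformation with $\Phi[0] = \Phi_0$ (and with $\Phi[s]$ respecting whatever boundary conditions are appropriate). Interchanging the derivative and integral, we (formally, at least) arrive at
$$\int_\Omega \frac{d}{ds} L(x, \Phi[s](x), D\Phi[s](x))\Big|_{s=0}\ dx = 0. \qquad (2)$$
Write $\delta\Phi := \frac{d}{ds} \Phi[s]|_{s=0}$ for the infinitesimal deformation of $\Phi_0$. By the chain rule, $\frac{d}{ds} L(x, \Phi[s](x), D\Phi[s](x))|_{s=0}$ can be expressed in terms of $x, \Phi_0(x), \delta\Phi(x), D\Phi_0(x), D\delta\Phi(x)$. In coordinates, we have
$$\frac{d}{ds} L(x, \Phi[s](x), D\Phi[s](x))\Big|_{s=0} = \delta\Phi^i(x)\, L_{q^i}(x, \Phi_0(x), D\Phi_0(x)) + \partial_{x^a} \delta\Phi^i(x)\, L_{\partial_{x^a} q^i}(x, \Phi_0(x), D\Phi_0(x)), \qquad (3)$$
where we parameterise $\Sigma$ by $x$, $(q^i)_{i=1,\dots,n}$, $(\partial_{x^a} q^i)_{a=1,\dots,d;\, i=1,\dots,n}$, and we use subscripts on $L$ to denote partial derivatives in the various coefficients. (One can of course work in a coordinate-free manner here if one really wants to, but the notation becomes a little cumbersome due to the need to carefully split up the tangent space of $\Sigma$, and we will not do so here.) Thus we can view (2) as an integral identity that asserts the vanishing of a certain integral, whose integrand involves $x, \Phi_0(x), \delta\Phi(x), D\Phi_0(x), D\delta\Phi(x)$, where $\delta\Phi$ vanishes at the boundary but is otherwise unconstrained.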
A general rule of thumb in PDE and calculus of variations is that whenever one has an integral identity of the form $\int_\Omega F(x, W(x), DW(x))\ dx = 0$ for some class of functions $W$ that vanish on the boundary, then there must be an associated differential identity $F = \mathrm{div}\, X$ that justifies this integral identity through Stokes’ theorem. This rule of thumb helps explain why integration by parts is used so frequently in PDE to justify integral identities. The rule of thumb can fail when one is dealing with “global” or “cohomologically non-trivial” integral identities of a topological nature, such as the Gauss-Bonnet or Kazhdan-Warner identities, but is quite reliable for “local” or “cohomologically trivial” identities, such as those arising from calculus of variations.
In any case, if we apply this rule to (2), we expect that the integrand should be expressible as a spatial divergence. This is indeed the case:
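Proposition 1 (Formal) Let $\Phi = \Phi_0$ be a critical point of the functional $J[\Phi]$ defined in (1). Then for any deformation $s \mapsto \Phi[s]$ with $\Phi[0] = \Phi_0$, we have
$$\frac{d}{ds} L(x, \Phi[s](x), D\Phi[s](x))\Big|_{s=0} = \mathrm{div}\, X \qquad (4)$$
where $X$ is the vector field that is expressible in coordinates as
$$X^a := \delta\Phi^i(x)\, L_{\partial_{x^a} q^i}(x, \Phi_0(x), D\Phi_0(x)). \qquad (5)$$
Proof: Comparing (4) with (3), we see that the claim is equivalent to the Euler-Lagrange equation
$$L_{q^i}(x, \Phi_0(x), D\Phi_0(x)) - \partial_{x^a} L_{\partial_{x^a} q^i}(x, \Phi_0(x), D\Phi_0(x)) = 0. \qquad (6)$$
The same computation, together with an integration by parts, shows that (2) may be rewritten as
$$\int_\Omega \Big( L_{q^i}(x, \Phi_0(x), D\Phi_0(x)) - \partial_{x^a} L_{\partial_{x^a} q^i}(x, \Phi_0(x), D\Phi_0(x)) \Big)\, \delta\Phi^i(x)\ dx = 0.$$
Since $\delta\Phi^i(x)$ is unconstrained on the interior of $\Omega$, the claim (6) follows (at a formal level, at least).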
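As a numerical sanity check on the notion of a critical point, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than something taken from the discussion above: the Lagrangian is that of a one-dimensional harmonic oscillator, $L(q, \dot q) = \frac{1}{2} \dot q^2 - \frac{1}{2} q^2$, the time interval is $[0,1]$, and the helper names `action` and `first_variation` are hypothetical. The sketch discretises the action and estimates the derivative of $J$ along a deformation that vanishes at the endpoints. This derivative should be close to zero for a solution of the Euler-Lagrange equation $\ddot q = -q$, and visibly non-zero for a path with the same endpoints that does not solve it.

```python
import math

def action(path, dt):
    """Discretised action for L(q, qdot) = qdot^2/2 - q^2/2 (an assumed example)."""
    total = 0.0
    for k in range(len(path) - 1):
        qdot = (path[k + 1] - path[k]) / dt      # finite-difference velocity
        qmid = 0.5 * (path[k + 1] + path[k])     # midpoint position
        total += (0.5 * qdot ** 2 - 0.5 * qmid ** 2) * dt
    return total

def first_variation(phi0, delta, dt, s=1e-5):
    """Central-difference estimate of d/ds J[phi0 + s * delta] at s = 0."""
    plus = [q + s * d for q, d in zip(phi0, delta)]
    minus = [q - s * d for q, d in zip(phi0, delta)]
    return (action(plus, dt) - action(minus, dt)) / (2 * s)

n = 2000
dt = 1.0 / n
times = [k * dt for k in range(n + 1)]

# Infinitesimal deformation vanishing at both endpoints t = 0 and t = 1.
delta = [math.sin(math.pi * t) for t in times]

# q(t) = sin t solves the Euler-Lagrange equation q'' = -q.
critical = [math.sin(t) for t in times]
# A straight line with the same endpoint values; it does not solve q'' = -q.
non_critical = [t * math.sin(1.0) for t in times]

var_critical = first_variation(critical, delta, dt)
var_non_critical = first_variation(non_critical, delta, dt)
print(var_critical, var_non_critical)
```

The first printed number is zero up to discretisation error, while the second is of size comparable to $\sin(1)/\pi$, which is what integrating by parts predicts for the straight-line path.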
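Many variational problems also enjoy one-parameter continuous symmetries: given any field $\Phi_0$ (not necessarily a critical point), one can place that field in a one-parameter family $s \mapsto \Phi[s]$ with $\Phi[0] = \Phi_0$, such that
$$J[\Phi[s]] = J[\Phi[0]]$$
for all $s$; in particular,
$$\frac{d}{ds} J[\Phi[s]]\Big|_{s=0} = 0,$$
which can be written as (2) as before. Applying the previous rule of thumb, we thus expect another divergence identity
$$\frac{d}{ds} L(x, \Phi[s](x), D\Phi[s](x))\Big|_{s=0} = \mathrm{div}\, Y \qquad (7)$$
whenever $s \mapsto \Phi[s]$ arises from a continuous one-parameter symmetry. This expectation is indeed the case in many examples. For instance, if the spatial domain $\Omega$ is the Euclidean space $\mathbb{R}^d$, and the Lagrangian (when expressed in coordinates) has no direct dependence on the spatial variable $x$, thus
$$L(x, \Phi(x), D\Phi(x)) = L(\Phi(x), D\Phi(x)), \qquad (8)$$
then we obtain $d$ translation symmetries
$$\Phi[s](x) := \Phi(x - s e_a)$$
for $a = 1,\dots,d$, where $e_1,\dots,e_d$ is the standard basis for $\mathbb{R}^d$. For a fixed $a$, the left-hand side of (7) then becomes
$$\frac{d}{ds} L(\Phi(x - s e_a), D\Phi(x - s e_a))\Big|_{s=0} = -\partial_{x^a} \big[ L(\Phi(x), D\Phi(x)) \big] = \mathrm{div}\, Y$$
where $Y(x) := -L(\Phi(x), D\Phi(x))\, e_a$. Another common type of symmetry is a pointwise symmetry, in which
$$L(x, \Phi[s](x), D\Phi[s](x)) = L(x, \Phi[0](x), D\Phi[0](x)) \qquad (9)$$
for all $x$, in which case (7) clearly holds with $Y = 0$.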
If we subtract (4) from (7), we obtain the celebrated theorem of Noether linking symmetries with conservation laws:
Theorem 2 (Noether’s theorem) Suppose that $\Phi_0$ is a critical point of the functional (1), and let $\Phi[s]$ be a one-parameter continuous symmetry with $\Phi[0] = \Phi_0$. Let $X$ be the vector field in (5), and let $Y$ be the vector field in (7). Then we have the pointwise conservation law
$$\mathrm{div}(X - Y) = 0.$$
In particular, for one-dimensional variational problems, in which $\Omega \subset \mathbb{R}$, we have the conservation law $(X-Y)(t) = (X-Y)(0)$ for all $t \in \Omega$ (assuming of course that $\Omega$ is connected and contains $0$).
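Noether’s theorem gives a systematic way to locate conservation laws for solutions to variational problems. For instance, if $\Omega \subset \mathbb{R}$ and the Lagrangian has no explicit time dependence, thus
$$L(t, \Phi(t), \dot\Phi(t)) = L(\Phi(t), \dot\Phi(t)),$$
then by using the time translation symmetry $\Phi[s](t) := \Phi(t - s)$, we have
$$Y(t) = -L(\Phi(t), \dot\Phi(t))$$
as discussed previously, whereas we have $\delta\Phi(t) = -\dot\Phi(t)$, and hence by (5)
$$X(t) = -\dot\Phi^i(t)\, L_{\dot q^i}(\Phi(t), \dot\Phi(t)),$$
and so Noether’s theorem gives conservation of the Hamiltonian
$$H(t) := \dot\Phi^i(t)\, L_{\dot q^i}(\Phi(t), \dot\Phi(t)) - L(\Phi(t), \dot\Phi(t)). \qquad (10)$$
For instance, for geodesic flow, the Hamiltonian works out to be
$$H(t) = \frac{1}{2} g_{ij}(\gamma(t))\, \dot\gamma^i(t)\, \dot\gamma^j(t),$$
so we see that the speed of the geodesic is conserved over time.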
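To see the conservation of speed numerically, one can integrate the geodesic equation on a concrete manifold. The sketch below is an illustration under stated assumptions and not part of the argument above: it takes the hyperbolic upper half-plane with metric $(dx^2 + dy^2)/y^2$ as the example manifold, uses a classical fourth-order Runge-Kutta integrator, and the function names are hypothetical. For this metric the geodesic equations read $\ddot x = 2 \dot x \dot y / y$ and $\ddot y = (\dot y^2 - \dot x^2)/y$, and the unit-speed geodesic starting at $(0,1)$ with velocity $(1,0)$ is $(\tanh t, \mathrm{sech}\, t)$.

```python
import math

def geodesic_rhs(state):
    """Geodesic equations for the metric (dx^2 + dy^2) / y^2 (assumed example)."""
    x, y, vx, vy = state
    return (vx, vy, 2.0 * vx * vy / y, (vy * vy - vx * vx) / y)

def rk4_step(state, dt):
    """One classical Runge-Kutta step for the first-order system above."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(state, k1, dt / 2))
    k3 = geodesic_rhs(shifted(state, k2, dt / 2))
    k4 = geodesic_rhs(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def hamiltonian(state):
    """H = (1/2) g_ij(gamma) gamma'^i gamma'^j with g_ij = delta_ij / y^2."""
    x, y, vx, vy = state
    return 0.5 * (vx * vx + vy * vy) / (y * y)

state = (0.0, 1.0, 1.0, 0.0)   # start at (0, 1) with unit hyperbolic speed
h_start = hamiltonian(state)
dt = 1e-3
for _ in range(3000):          # integrate up to t = 3
    state = rk4_step(state, dt)
h_end = hamiltonian(state)
print(h_start, h_end, state[0], math.tanh(3.0))
```

The Euclidean velocity components change substantially along the curve, but the quantity $H$ stays fixed up to integration error, as Noether’s theorem predicts for a Lagrangian with no explicit time dependence.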
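For pointwise symmetries (9), $Y$ vanishes, and so Noether’s theorem simplifies to $\mathrm{div}\, X = 0$; in the one-dimensional case $\Omega \subset \mathbb{R}$, we thus see from (5) that the quantity
$$\delta\Phi^i(t)\, L_{\dot q^i}(t, \Phi_0(t), \dot\Phi_0(t)) \qquad (11)$$
is conserved in time. For instance, for the $N$-particle system in Example 2, if we have the translation invariance
$$V(q_1 + h, \dots, q_N + h) = V(q_1, \dots, q_N)$$
for all $q_1, \dots, q_N, h \in \mathbb{R}^3$, then we have the pointwise translation symmetry
$$q_i[s](t) := q_i(t) + s e_j$$
for all $i = 1,\dots,N$, $s \in \mathbb{R}$ and some $j = 1,2,3$, in which case $\dot q_i[s](t) = \dot q_i(t)$ and $\delta q_i = e_j$, and the conserved quantity (11) becomes
$$\sum_{i=1}^N m_i\, \dot q_i^j(t);$$
as $j = 1,2,3$ was arbitrary, this establishes conservation of the total momentum
$$\sum_{i=1}^N m_i\, \dot q_i(t).$$
Similarly, if we have the rotation invariance
$$V(R q_1, \dots, R q_N) = V(q_1, \dots, q_N)$$
for any $q_1, \dots, q_N \in \mathbb{R}^3$ and $R \in SO(3)$, then we have the pointwise rotation symmetry
$$q_i[s](t) := \exp(sA)\, q_i(t)$$
for any skew-symmetric real $3 \times 3$ matrix $A$, in which case $\dot q_i[s](t) = \exp(sA)\, \dot q_i(t)$ and $\delta q_i = A q_i$, and the conserved quantity (11) becomes
$$\sum_{i=1}^N m_i\, \langle A q_i(t), \dot q_i(t) \rangle;$$
since $A$ is an arbitrary skew-symmetric matrix, this establishes conservation of the total angular momentum
$$\sum_{i=1}^N m_i\, q_i(t) \wedge \dot q_i(t).$$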
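These two conservation laws are easy to observe in a small simulation. The following is a minimal sketch under assumptions that are not in the text above: the potential is taken to be the gravitational-type pair potential $V = -\sum_{i<j} m_i m_j / |q_i - q_j|$ (which is both translation and rotation invariant), the masses and initial data are arbitrary illustrative values, and the integrator is velocity Verlet, chosen here because it preserves total momentum and angular momentum up to rounding error for pairwise central forces. All function names are hypothetical.

```python
def forces(pos, mass):
    """Forces -grad_{q_i} V for the assumed potential V = -sum_{i<j} m_i m_j / |q_i - q_j|."""
    n = len(pos)
    f = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r3 = sum(c * c for c in d) ** 1.5
            for k in range(3):
                pull = mass[i] * mass[j] * d[k] / r3
                f[i][k] += pull      # particle i is pulled towards j
                f[j][k] -= pull      # equal and opposite force on j
    return f

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def total_momentum(mass, vel):
    return [sum(m * v[k] for m, v in zip(mass, vel)) for k in range(3)]

def total_angular_momentum(mass, pos, vel):
    total = [0.0, 0.0, 0.0]
    for m, q, v in zip(mass, pos, vel):
        total = [t + m * c for t, c in zip(total, cross(q, v))]
    return total

def velocity_verlet(mass, pos, vel, dt, steps):
    """Integrate Newton's laws m_i q_i'' = F_i with the velocity Verlet scheme."""
    pos = [list(q) for q in pos]
    vel = [list(v) for v in vel]
    f = forces(pos, mass)
    for _ in range(steps):
        for i, m in enumerate(mass):
            for k in range(3):
                vel[i][k] += 0.5 * dt * f[i][k] / m   # half kick
                pos[i][k] += dt * vel[i][k]           # drift
        f = forces(pos, mass)
        for i, m in enumerate(mass):
            for k in range(3):
                vel[i][k] += 0.5 * dt * f[i][k] / m   # second half kick
    return pos, vel

mass = [1.0, 2.0, 3.0]
pos0 = [[1.0, 0.0, 0.0], [-1.0, 0.5, 0.0], [0.0, -1.0, 1.0]]
vel0 = [[0.0, 0.5, 0.1], [0.2, -0.3, 0.0], [-0.1, 0.0, 0.2]]

p0 = total_momentum(mass, vel0)
l0 = total_angular_momentum(mass, pos0, vel0)
pos1, vel1 = velocity_verlet(mass, pos0, vel0, dt=1e-3, steps=1000)
p1 = total_momentum(mass, vel1)
l1 = total_angular_momentum(mass, pos1, vel1)
print(p0, p1)
print(l0, l1)
```

The individual velocities change over the run, but the two printed pairs of vectors agree to rounding error, reflecting the translation and rotation invariance of the assumed potential.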
Below the fold, I will describe how Noether’s theorem can be used to locate all of the conserved quantities for the Euler equations of inviscid fluid flow, discussed in this previous post, by interpreting that flow as geodesic flow in an infinite dimensional manifold.
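The Euler equations for incompressible inviscid fluids may be written as
$$\partial_t u + (u \cdot \nabla) u = -\nabla p, \qquad \nabla \cdot u = 0,$$
where $u: [0,T] \times \mathbb{R}^n \to \mathbb{R}^n$ is the velocity field, and $p: [0,T] \times \mathbb{R}^n \to \mathbb{R}$ is the pressure field. To avoid technicalities we will assume that both fields are smooth, and that $u$ is bounded. We will take the dimension $n$ to be at least two, with the three-dimensional case $n = 3$ being of course especially interesting.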
The Euler equations are the inviscid limit of the Navier-Stokes equations; as discussed in my previous post, one potential route to establishing finite time blowup for the latter equations when $n = 3$ is to be able to construct “computers” solving the Euler equations, which generate smaller replicas of themselves in a noise-tolerant manner (as the viscosity term in the Navier-Stokes equation is to be viewed as perturbative noise).
Perhaps the most prominent obstacles to this route are the conservation laws for the Euler equations, which limit the types of final states that a putative computer could reach from a given initial state. Most famously, we have the conservation of energy
$$\frac{1}{2} \int_{\mathbb{R}^n} |u|^2\ dx$$
(assuming sufficient decay of the velocity field at infinity); thus for instance it would not be possible for a computer to generate a replica of itself which had greater total energy than the initial computer. This by itself is not a fatal obstruction (in this paper of mine, I constructed such a “computer” for an averaged Euler equation that still obeyed energy conservation). However, there are other conservation laws as well; for instance, in three dimensions one also has conservation of helicity
$$\int_{\mathbb{R}^3} u \cdot (\nabla \times u)\ dx$$
and (formally, at least) one has conservation of momentum
$$\int_{\mathbb{R}^3} u\ dx$$
and angular momentum
$$\int_{\mathbb{R}^3} x \times u\ dx$$
(although, as we shall discuss below, due to the slow decay of $u$ at infinity, these integrals have to either be interpreted in a principal value sense, or else replaced with their vorticity-based formulations, namely impulse and moment of impulse). Total vorticity
$$\int_{\mathbb{R}^3} \nabla \times u\ dx$$
is also conserved, although it turns out in three dimensions that this quantity vanishes when one assumes sufficient decay at infinity. Then there are the pointwise conservation laws: the vorticity and the volume form are both transported by the fluid flow, while the velocity field (when viewed as a covector) is transported up to a gradient; among other things, this gives the transport of vortex lines as well as Kelvin’s circulation theorem, and can also be used to deduce the helicity conservation law mentioned above. In my opinion, none of these laws actually prohibits a self-replicating computer from existing within the laws of ideal fluid flow, but they do significantly complicate the task of actually designing such a computer, or of the basic “gates” that such a computer would consist of.
Below the fold I would like to record and derive all the conservation laws mentioned above, which to my knowledge essentially form the complete set of known conserved quantities for the Euler equations. The material here (although not the notation) is drawn from this text of Majda and Bertozzi.
Mertens’ theorems are a set of classical estimates concerning the asymptotic distribution of the prime numbers:
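Theorem 1 (Mertens’ theorems) In the asymptotic limit $x \to \infty$, we have
$$\sum_{p \leq x} \frac{\log p}{p} = \log x + O(1), \qquad (1)$$
$$\sum_{p \leq x} \frac{1}{p} = \log\log x + O(1), \qquad (2)$$
and
$$\sum_{p \leq x} \log\Big(1 - \frac{1}{p}\Big) = -\log\log x - \gamma + o(1), \qquad (3)$$
where $\gamma$ is the Euler-Mascheroni constant, defined by requiring that
$$1 + \frac{1}{2} + \dots + \frac{1}{n} = \log n + \gamma + o(1) \qquad (4)$$
in the limit $n \to \infty$.
The third theorem (3) is usually stated in exponentiated form
$$\prod_{p \leq x} \Big(1 - \frac{1}{p}\Big) = \frac{e^{-\gamma} + o(1)}{\log x},$$
but in the logarithmic form (3) we see that it is strictly stronger than (2), in view of the asymptotic $\log(1 - \frac{1}{p}) = -\frac{1}{p} + O(\frac{1}{p^2})$.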
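The three estimates can be checked numerically for moderate $x$. Here is a minimal sketch (the cutoff $x = 10^6$, the sieve helper `primes_up_to`, and the variable names are choices made for this illustration) that computes the three prime sums and subtracts the predicted main terms, so that the first two outputs should be bounded and the third should be small.

```python
import math

def primes_up_to(x):
    """Sieve of Eratosthenes returning all primes p <= x."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            # Cross out the multiples p*p, p*p + p, ... up to x.
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return [p for p in range(2, x + 1) if sieve[p]]

EULER_GAMMA = 0.5772156649015329

x = 10 ** 6
primes = primes_up_to(x)

# First theorem: sum of log(p)/p minus log x should stay bounded.
first = sum(math.log(p) / p for p in primes) - math.log(x)
# Second theorem: sum of 1/p minus log log x should stay bounded.
second = sum(1.0 / p for p in primes) - math.log(math.log(x))
# Third theorem: sum of log(1 - 1/p) plus log log x plus gamma should tend to zero.
third = (sum(math.log1p(-1.0 / p) for p in primes)
         + math.log(math.log(x)) + EULER_GAMMA)

print(len(primes), first, second, third)
```

At this cutoff the second quantity is already close to $0.2615$, the value usually called the Meissel-Mertens constant, and the third is well below $10^{-3}$ in absolute value.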
Remarkably, these theorems can be proven without the assistance of the prime number theorem
$$\sum_{p \leq x} 1 = \frac{x}{\log x} + o\Big( \frac{x}{\log x} \Big),$$
which was proven about two decades after Mertens’ work. (But one can certainly use versions of the prime number theorem with good error term, together with summation by parts, to obtain good estimates on the various errors in Mertens’ theorems.) Roughly speaking, the reason for this is that Mertens’ theorems only require control on the Riemann zeta function $\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s}$ in the neighbourhood of the pole at $s = 1$, whereas (as discussed in this previous post) the prime number theorem requires control on the zeta function on (a neighbourhood of) the line $\{ 1 + it: t \in \mathbb{R} \}$. Specifically, Mertens’ theorem is ultimately deduced from the Euler product formula
$$\zeta(s) = \prod_p \Big(1 - \frac{1}{p^s}\Big)^{-1}, \qquad (5)$$
valid in the region $\mathrm{Re}(s) > 1$ (which is ultimately a Fourier-Dirichlet transform of the fundamental theorem of arithmetic), and the following crude asymptotics:
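Proposition 2 (Simple pole) For $s$ sufficiently close to $1$ with $\mathrm{Re}(s) > 1$, we have
$$\zeta(s) = \frac{1}{s-1} + O(1) \qquad (6)$$
and
$$\zeta'(s) = \frac{-1}{(s-1)^2} + O(1).$$
Proof: For $s$ as in the proposition, we have $\frac{1}{n^s} = \frac{1}{t^s} + O(\frac{1}{n^2})$ for any natural number $n$ and $n \leq t \leq n+1$, and hence
$$\frac{1}{n^s} = \int_n^{n+1} \frac{1}{t^s}\ dt + O\Big( \frac{1}{n^2} \Big).$$
Summing in $n$ and using the identity $\int_1^\infty \frac{1}{t^s}\ dt = \frac{1}{s-1}$, we obtain the first claim. Similarly, we have
$$\frac{-\log n}{n^s} = \int_n^{n+1} \frac{-\log t}{t^s}\ dt + O\Big( \frac{\log n}{n^2} \Big),$$
and by summing in $n$ and using the identity $\int_1^\infty \frac{-\log t}{t^s}\ dt = \frac{-1}{(s-1)^2}$ (the derivative of the previous identity) we obtain the second claim.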
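For a quick numerical look at the simple pole, one can evaluate the zeta function at real $s$ slightly larger than $1$ and subtract $\frac{1}{s-1}$. The sketch below is only an illustration: since the defining series converges very slowly near $s = 1$, it swaps in a standard Euler-Maclaurin tail correction instead of the crude integral comparison used in the proof, and the helper name `zeta` and the cutoff of $1000$ terms are assumptions of this example.

```python
import math

def zeta(s, cutoff=1000):
    """Approximate zeta(s) for real s > 1: partial sum plus an Euler-Maclaurin tail."""
    head = sum(n ** -s for n in range(1, cutoff))
    # Tail sum_{n >= cutoff} n^{-s} ~ integral + f(N)/2 - f'(N)/12 for f(t) = t^{-s}.
    tail = (cutoff ** (1 - s) / (s - 1)
            + 0.5 * cutoff ** -s
            + s * cutoff ** (-s - 1) / 12)
    return head + tail

for s in (1.1, 1.01, 1.001):
    remainder = zeta(s) - 1 / (s - 1)
    print(s, remainder)   # stays bounded as s approaches 1

check_zeta_2 = zeta(2.0)               # should be close to pi^2 / 6
remainder_near_pole = zeta(1.001) - 1 / (1.001 - 1)
```

The printed remainders stay bounded as $s \to 1^+$, in line with the proposition, and in fact approach a value near $0.577$.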
The first two of Mertens’ theorems (1), (2) are relatively easy to prove, and imply the third theorem (3) except with $\gamma$ replaced by an unspecified absolute constant. To get the specific constant $\gamma$ requires a little bit of additional effort. From (4), one might expect that the appearance of $\gamma$ arises from the refinement
$$\zeta(s) = \frac{1}{s-1} + \gamma + O(|s-1|) \qquad (7)$$
that one can obtain to (6). However, it turns out that the connection is not so much with the zeta function, but with the Gamma function, and specifically with the identity $\Gamma'(1) = -\gamma$ (which is of course related to (7) through the functional equation for zeta, but can be proven without any reference to zeta functions). More specifically, we have the following asymptotic for the exponential integral:
Proposition 3 (Exponential integral asymptotics) For sufficiently small $\epsilon > 0$, one has
$$\int_\epsilon^\infty \frac{e^{-t}}{t}\ dt = \log \frac{1}{\epsilon} - \gamma + O(\epsilon).$$
A routine integration by parts shows that this asymptotic is equivalent to the identity
$$\int_0^\infty e^{-t} \log t\ dt = -\gamma,$$
which is the identity $\Gamma'(1) = -\gamma$ mentioned previously.
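Proof: We start by using the identity $\frac{1}{i} = \int_0^1 x^{i-1}\ dx$ to express the harmonic series $H_n := 1 + \frac{1}{2} + \dots + \frac{1}{n}$ as
$$H_n = \int_0^1 \big( 1 + x + \dots + x^{n-1} \big)\ dx$$
or on summing the geometric series
$$H_n = \int_0^1 \frac{1 - x^n}{1 - x}\ dx.$$
Since $\int_0^{1-1/n} \frac{dx}{1-x} = \log n$, we thus have
$$H_n - \log n = \int_0^1 \frac{1_{[1-1/n,1]}(x) - x^n}{1-x}\ dx;$$
making the change of variables $x = 1 - \frac{t}{n}$, this becomes
$$H_n - \log n = \int_0^n \frac{1_{[0,1]}(t) - (1 - \frac{t}{n})^n}{t}\ dt.$$
As $n \to \infty$, $\frac{1_{[0,1]}(t) - (1 - \frac{t}{n})^n}{t}$ converges pointwise to $\frac{1_{[0,1]}(t) - e^{-t}}{t}$ and is pointwise dominated by $O(e^{-t})$. Taking limits as $n \to \infty$ using dominated convergence, we conclude that
$$\gamma = \int_0^\infty \frac{1_{[0,1]}(t) - e^{-t}}{t}\ dt$$
or equivalently
$$\int_0^\infty \frac{e^{-t} - 1_{[0,\epsilon]}(t)}{t}\ dt = \log \frac{1}{\epsilon} - \gamma.$$
The claim then follows by bounding the $\int_0^\epsilon$ portion of the integral on the left-hand side.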
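The exponential integral asymptotic is also easy to test numerically. The following is a minimal sketch under illustrative assumptions: the integral is truncated at an upper limit of $50$ (beyond which the integrand is negligible), the substitution $t = e^u$ is used to remove the $1/t$ factor, and the resulting smooth integrand is handled by Simpson’s rule. The function name `exponential_integral` and the sample values of $\epsilon$ are choices made for this example.

```python
import math

EULER_GAMMA = 0.5772156649015329

def exponential_integral(eps, upper=50.0, panels=20000):
    """Simpson's rule for the integral of e^{-t}/t over [eps, upper], using t = e^u."""
    a, b = math.log(eps), math.log(upper)
    h = (b - a) / panels                       # panels must be even for Simpson's rule
    f = lambda u: math.exp(-math.exp(u))       # e^{-t}/t dt becomes e^{-e^u} du
    total = f(a) + f(b)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

errors = {}
for eps in (0.1, 0.01, 0.001):
    main_term = math.log(1 / eps) - EULER_GAMMA
    errors[eps] = exponential_integral(eps) - main_term
    print(eps, errors[eps])   # the error term, expected to be O(eps)
```

In this range the error is not merely $O(\epsilon)$ but very nearly equal to $\epsilon$ itself, which is consistent with bounding the $\int_0^\epsilon$ portion as in the proof.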
Below the fold I would like to record how Proposition 2 and Proposition 3 imply Theorem 1; the computations are utterly standard, and can be found in most analytic number theory texts, but I wanted to write them down for my own benefit (I always keep forgetting, in particular, how the third of Mertens’ theorems is proven).
(This is an extended blog post version of my talk “Ultraproducts as a Bridge Between Discrete and Continuous Analysis” that I gave at the Simons Institute for the Theory of Computing at the workshop “Neo-Classical methods in discrete analysis”. Some of the material here is drawn from previous blog posts, notably “Ultraproducts as a bridge between hard analysis and soft analysis” and “Ultralimit analysis and quantitative algebraic geometry”. The text here has substantially more details than the talk; one may wish to skip all of the proofs given here to obtain a closer approximation to the original talk.)
Discrete analysis, of course, is primarily interested in the study of discrete (or “finitary”) mathematical objects: integers, rational numbers (which can be viewed as ratios of integers), finite sets, finite graphs, finite or discrete metric spaces, and so forth. However, many powerful tools in mathematics (e.g. ergodic theory, measure theory, topological group theory, algebraic geometry, spectral theory, etc.) work best when applied to continuous (or “infinitary”) mathematical objects: real or complex numbers, manifolds, algebraic varieties, continuous topological or metric spaces, etc. In order to apply results and ideas from continuous mathematics to discrete settings, there are basically two approaches. One is to directly discretise the arguments used in continuous mathematics, which often requires one to keep careful track of all the bounds on various quantities of interest, particularly with regard to various error terms arising from discretisation which would otherwise have been negligible in the continuous setting. The other is to construct continuous objects as limits of sequences of discrete objects of interest, so that results from continuous mathematics may be applied (often as a “black box”) to the continuous limit, which then can be used to deduce consequences for the original discrete objects which are quantitative (though often ineffectively so). The latter approach is the focus of this current talk.
The following table gives some examples of a discrete theory and its continuous counterpart, together with a limiting procedure that might be used to pass from the former to the latter:
(Discrete) | (Continuous) | (Limit method) |
Ramsey theory | Topological dynamics | Compactness |
Density Ramsey theory | Ergodic theory | Furstenberg correspondence principle |
Graph/hypergraph regularity | Measure theory | Graph limits |
Polynomial regularity | Linear algebra | Ultralimits |
Structural decompositions | Hilbert space geometry | Ultralimits |
Fourier analysis | Spectral theory | Direct and inverse limits |
Quantitative algebraic geometry | Algebraic geometry | Schemes |
Discrete metric spaces | Continuous metric spaces | Gromov-Hausdorff limits |
Approximate group theory | Topological group theory | Model theory |
As the above table illustrates, there are a variety of different ways to form a limiting continuous object. Roughly speaking, one can divide limits into three categories:
- Topological and metric limits. These notions of limits are commonly used by analysts. Here, one starts with a sequence (or perhaps a net) of objects $x_n$ in a common space $X$, which one then endows with the structure of a topological space or a metric space, by defining a notion of distance between two points of the space, or a notion of open neighbourhoods or open sets in the space. Provided that the sequence or net is convergent, this produces a limit object $\lim_{n \to \infty} x_n$, which remains in the same space, and is “close” to many of the original objects $x_n$ with respect to the given metric or topology.
- Categorical limits. These notions of limits are commonly used by algebraists. Here, one starts with a sequence (or more generally, a diagram) of objects $x_n$ in a category $X$, which are connected to each other by various morphisms. If the ambient category is well-behaved, one can then form the direct limit $\varinjlim x_n$ or the inverse limit $\varprojlim x_n$ of these objects, which is another object in the same category $X$, and is connected to the original objects $x_n$ by various morphisms.
- Logical limits. These notions of limits are commonly used by model theorists. Here, one starts with a sequence of objects $x_n$ or of spaces $X_n$, each of which is (a component of) a model for a given (first-order) mathematical language (e.g. if one is working in the language of groups, $X_n$ might be groups and $x_n$ might be elements of these groups). By using devices such as the ultraproduct construction, or the compactness theorem in logic, one can then create a new object $\lim_{n \to \alpha} x_n$ or a new space $\prod_{n \to \alpha} X_n$, which is still a model of the same language (e.g. if the spaces $X_n$ were all groups, then the limiting space $\prod_{n \to \alpha} X_n$ will also be a group), and is “close” to the original objects or spaces in the sense that any assertion (in the given language) that is true for the limiting object or space, will also be true for many of the original objects or spaces, and conversely. (For instance, if $\prod_{n \to \alpha} X_n$ is an abelian group, then the $X_n$ will also be abelian groups for many $n$.)
The purpose of this talk is to highlight the third type of limit, and specifically the ultraproduct construction, as being a “universal” limiting procedure that can be used to replace most of the limits previously mentioned. Unlike the topological or metric limits, one does not need the original objects $x_n$ to all lie in a common space $X$ in order to form an ultralimit $\lim_{n \to \alpha} x_n$; they are permitted to lie in different spaces $X_n$; this is more natural in many discrete contexts, e.g. when considering graphs on $n$ vertices in the limit when $n$ goes to infinity. Also, no convergence properties on the $x_n$ are required in order for the ultralimit to exist. Similarly, ultraproduct limits differ from categorical limits in that no morphisms between the various spaces $X_n$ involved are required in order to construct the ultraproduct.
With so few requirements on the objects $x_n$ or spaces $X_n$, the ultraproduct construction is necessarily a very “soft” one. Nevertheless, the construction has two very useful properties which make it particularly useful for the purpose of extracting good continuous limit objects out of a sequence of discrete objects. First of all, there is Łoś’s theorem, which roughly speaking asserts that any first-order sentence which is asymptotically obeyed by the $x_n$, will be exactly obeyed by the limit object $\lim_{n \to \alpha} x_n$; in particular, one can often take a discrete sequence of “partial counterexamples” to some assertion, and produce a continuous “complete counterexample” to that same assertion via an ultraproduct construction; taking the contrapositives, one can often then establish a rigorous equivalence between a quantitative discrete statement and its qualitative continuous counterpart. Secondly, there is the countable saturation property that ultraproducts automatically enjoy, which is a property closely analogous to that of compactness in topological spaces, and can often be used to ensure that the continuous objects produced by ultraproduct methods are “complete” or “compact” in various senses, which is particularly useful in being able to upgrade qualitative (or “pointwise”) bounds to quantitative (or “uniform”) bounds, more or less “for free”, thus reducing significantly the burden of “epsilon management” (although the price one pays for this is that one needs to pay attention to which mathematical objects of study are “standard” and which are “nonstandard”). To achieve this compactness or completeness, one sometimes has to restrict to the “bounded” portion of the ultraproduct, and it is often also convenient to quotient out the “infinitesimal” portion in order to complement these compactness properties with a matching “Hausdorff” property, thus creating familiar examples of continuous spaces, such as locally compact Hausdorff spaces.
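For readers who prefer a formula, one common way to make the first of these two properties precise is as follows. This is a sketch in notation that is assumed here for illustration rather than fixed by the discussion above: $\alpha$ is a non-principal ultrafilter on the natural numbers, the $X_n$ are structures for a common first-order language, $\phi$ is a formula of that language, and “asymptotically obeyed” is read as “obeyed for an $\alpha$-large set of $n$”.

```latex
% Los's theorem, in one standard formulation (notation assumed for this sketch):
% a first-order statement holds in the ultraproduct exactly when it holds
% in the factors X_n for an alpha-large set of indices n.
\[
  \prod_{n \to \alpha} X_n \models \phi\Big( \lim_{n \to \alpha} x_n \Big)
  \quad \Longleftrightarrow \quad
  \big\{ n \in \mathbb{N} : X_n \models \phi(x_n) \big\} \in \alpha .
\]
```

For instance, taking $\phi$ to be the sentence asserting commutativity recovers the earlier remark that an ultraproduct of groups is abelian precisely when the groups $X_n$ are abelian for an $\alpha$-large set of $n$.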
Ultraproducts are not the only logical limit in the model theorist’s toolbox, but they are one of the simplest to set up and use, and already suffice for many of the applications of logical limits outside of model theory. In this post, I will set out the basic theory of these ultraproducts, and illustrate how they can be used to pass between discrete and continuous theories in each of the examples listed in the above table.
Apart from the initial “one-time cost” of setting up the ultraproduct machinery, the main loss one incurs when using ultraproduct methods is that it becomes very difficult to extract explicit quantitative bounds from results that are proven by transferring qualitative continuous results to the discrete setting via ultraproducts. However, in many cases (particularly those involving regularity-type lemmas) the bounds are already of tower-exponential type or worse, and there is arguably not much to be lost by abandoning the explicit quantitative bounds altogether.
The classical foundations of probability theory (discussed for instance in this previous blog post) are built on the notion of a probability space: a space $\Omega$ (the sample space) equipped with a $\sigma$-algebra $\mathcal{F}$ (the event space), together with a countably additive probability measure $\mathbf{P}: \mathcal{F} \to [0,1]$ that assigns a real number in the interval $[0,1]$ to each event.
One can generalise the concept of a probability space to a finitely additive probability space $(\Omega, \mathcal{F}, \mathbf{P})$, in which the event space $\mathcal{F}$ is now only a Boolean algebra rather than a $\sigma$-algebra, and the measure $\mathbf{P}$ is now only finitely additive instead of countably additive, thus $\mathbf{P}(E \vee F) = \mathbf{P}(E) + \mathbf{P}(F)$ when $E, F$ are disjoint events. By giving up countable additivity, one loses a fair amount of measure and integration theory, and in particular the notion of the expectation of a random variable becomes problematic (unless the random variable takes only finitely many values). Nevertheless, one can still perform a fair amount of probability theory in this weaker setting.
In this post I would like to describe a further weakening of probability theory, which I will call qualitative probability theory, in which one does not assign a precise numerical probability value $\mathbf{P}(E)$ to each event, but instead merely records whether this probability is zero, one, or something in between. Thus $\mathbf{P}$ is now a function from $\mathcal{F}$ to the set $\{0, I, 1\}$, where $I$ is a new symbol that replaces all the elements of the open interval $(0,1)$. In this setting, one can no longer compute quantitative expressions, such as the mean or variance of a random variable; but one can still talk about whether an event holds almost surely, with positive probability, or with zero probability, and there are still usable notions of independence. (I will refer to classical probability theory as quantitative probability theory, to distinguish it from its qualitative counterpart.)
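To get a feel for how much information is discarded in passing from quantitative to qualitative probability, here is a small sketch in Python. It is only an illustration under assumed choices: the sample space is a fair six-sided die, the symbol $I$ is represented by the string `"I"`, and the function names are hypothetical. The sketch collapses numerical probabilities to the three symbols $0, I, 1$ and shows that finite additivity degrades to a weaker rule, since the qualitative probability of a disjoint union of two events of type $I$ may be either $I$ or $1$.

```python
from fractions import Fraction

def qualitative(p):
    """Collapse a numerical probability to one of the symbols 0, 'I', 1."""
    if p == 0:
        return 0
    if p == 1:
        return 1
    return "I"

# Hypothetical finite sample space: a fair six-sided die.
weights = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """Quantitative probability of a set of faces."""
    return sum(weights[face] for face in event)

def qprob(event):
    """Qualitative probability of a set of faces."""
    return qualitative(prob(event))

evens = frozenset({2, 4, 6})
odds = frozenset({1, 3, 5})
one = frozenset({1})
nothing = frozenset()
everything = frozenset(weights)

print(qprob(nothing), qprob(everything))   # the impossible and the almost sure event
print(qprob(evens), qprob(odds), qprob(one))
# Two disjoint unions of events of type 'I' with different qualitative outcomes:
print(qprob(evens | odds), qprob(evens | one))
```

Notions such as “almost surely” and “with positive probability” survive the collapse, whereas the mean of the die roll cannot be recovered from the qualitative data alone.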
The main reason I want to introduce this weak notion of probability theory is that it is well suited to talking about random variables living inside algebraic varieties, even if these varieties are defined over fields other than $\mathbb{R}$ or $\mathbb{C}$. In algebraic geometry one often talks about a “generic” element of a variety $V$ defined over a field $k$, which does not lie in any specified variety of lower dimension defined over $k$. Once $V$ has positive dimension, such generic elements do not exist as classical, deterministic $k$-points $x$ in $V$, since of course any such point lies in the $0$-dimensional subvariety $\{x\}$ of $V$. There are of course several established ways to deal with this problem. One way (which one might call the “Weil” approach to generic points) is to extend the field $k$ to a sufficiently transcendental extension $\tilde k$, in order to locate a sufficient number of generic points in $V(\tilde k)$. Another approach (which one might dub the “Zariski” approach to generic points) is to work scheme-theoretically, and interpret a generic point in $V$ as being associated to the zero ideal in the function ring of $V$. However I want to discuss a third perspective, in which one interprets a generic point not as a deterministic object, but rather as a random variable $\mathbf{x}$ taking values in $V$, but which lies in any given lower-dimensional subvariety of $V$ with probability zero. This interpretation is intuitive, but difficult to implement in classical probability theory (except perhaps when considering varieties over $\mathbb{R}$ or $\mathbb{C}$) due to the lack of a natural probability measure to place on algebraic varieties; however it works just fine in qualitative probability theory. In particular, the algebraic geometry notion of being “generically true” can now be interpreted probabilistically as an assertion that something is “almost surely true”.
It turns out that just as qualitative random variables may be used to interpret the concept of a generic point, they can also be used to interpret the concept of a type in model theory; the type of a random variable $\mathbf{x}$ is the set of all predicates $\phi(x)$ that are almost surely obeyed by $\mathbf{x}$. In contrast, model theorists often adopt a Weil-type approach to types, in which one works with deterministic representatives of a type, which often do not occur in the original structure of interest, but only in a sufficiently saturated extension of that structure (this is the analogue of working in a sufficiently transcendental extension of the base field). However, it seems that (in some cases at least) one can equivalently view types in terms of (qualitative) random variables on the original structure, avoiding the need to extend that structure. (Instead, one reserves the right to extend the sample space of one’s probability theory whenever necessary, as part of the “probabilistic way of thinking” discussed in this previous blog post.) We illustrate this below the fold with two related theorems that I will interpret through the probabilistic lens: the “group chunk theorem” of Weil (and later developed by Hrushovski), and the “group configuration theorem” of Zilber (and again later developed by Hrushovski). For the sake of concreteness we will only consider these theorems in the theory of algebraically closed fields, although the results are quite general and can be applied to many other theories studied in model theory.