You are currently browsing the monthly archive for November 2012.
Lars Hörmander, who made fundamental contributions to all areas of partial differential equations, but particularly in developing the analysis of variable-coefficient linear PDE, died last Sunday, aged 81.
I unfortunately never met Hörmander personally, but of course I encountered his work all the time while working in PDE. One of his major contributions to the subject was to systematically develop the calculus of Fourier integral operators (FIOs), which are a substantial generalisation of pseudodifferential operators and which can be used to (approximately) solve linear partial differential equations, or to transform such equations into a more convenient form. Roughly speaking, Fourier integral operators are to linear PDE as canonical transformations are to Hamiltonian mechanics (and one can in fact view FIOs as a quantisation of a canonical transformation). They are a large class of transformations, for instance the Fourier transform, pseudodifferential operators, and smooth changes of the spatial variable are all examples of FIOs, and (as long as certain singular situations are avoided) the composition of two FIOs is again an FIO.
The full theory of FIOs is quite extensive, occupying the entire final volume of Hörmander’s famous four-volume series “The Analysis of Linear Partial Differential Operators”. I am certainly not going to attempt to summarise it here, but I thought I would try to motivate how these operators arise when trying to transform functions. For simplicity we will work with functions on a Euclidean domain (although FIOs can certainly be defined on more general smooth manifolds, and there is an extension of the theory that also works on manifolds with boundary). As this will be a heuristic discussion, we will ignore all the (technical, but important) issues of smoothness or convergence with regards to the functions, integrals and limits that appear below, and be rather vague with terms such as “decaying” or “concentrated”.
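A function $f: \mathbf{R}^n \to \mathbf{C}$ can be viewed from many different perspectives (reflecting the variety of bases, or approximate bases, that the Hilbert space $L^2(\mathbf{R}^n)$ offers). Most directly, we have the physical space perspective, viewing $f$ as a function $x \mapsto f(x)$ of the physical variable $x \in \mathbf{R}^n$. In many cases, this function will be concentrated in some subregion of physical space. For instance, a gaussian wave packet
$$ f(x) = A e^{-|x-x_0|^2/(2\hbar)} e^{i \xi_0 \cdot x/\hbar}, \qquad (1)$$
where $\hbar > 0$, $A \in \mathbf{C}$ and $x_0, \xi_0 \in \mathbf{R}^n$ are parameters, would be physically concentrated in the ball $B(x_0, \sqrt{\hbar})$. Then we have the frequency space (or momentum space) perspective, viewing $f$ now as a function $\xi \mapsto \hat f(\xi)$ of the frequency variable $\xi \in \mathbf{R}^n$. For this discussion, it will be convenient to normalise the Fourier transform using a small constant $\hbar > 0$ (which has the physical interpretation of Planck’s constant if one is doing quantum mechanics), thus
$$ \hat f(\xi) := \frac{1}{(2\pi\hbar)^{n/2}} \int_{\mathbf{R}^n} e^{-i \xi \cdot x/\hbar} f(x)\, dx.$$
For instance, for the gaussian wave packet (1), one has
$$ \hat f(\xi) = A e^{i \xi_0 \cdot x_0/\hbar} e^{-|\xi-\xi_0|^2/(2\hbar)} e^{-i \xi \cdot x_0/\hbar},$$
and so we see that $\hat f$ is concentrated in frequency space in the ball $B(\xi_0, \sqrt{\hbar})$.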
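As a sanity check on the wave packet computation, here is a minimal numerical sketch in one dimension. It assumes the packet has the form $A e^{-(x-x_0)^2/(2\hbar)} e^{i \xi_0 x/\hbar}$ and that the Fourier transform is normalised as $(2\pi\hbar)^{-1/2} \int e^{-i\xi x/\hbar} f(x)\, dx$; the function names and the choice of quadrature are illustrative only.

```python
import cmath
import math


def wave_packet(x, x0, xi0, hbar, amplitude=1.0):
    """Assumed 1D gaussian wave packet centred at x0 with frequency xi0."""
    envelope = math.exp(-(x - x0) ** 2 / (2 * hbar))
    return amplitude * envelope * cmath.exp(1j * xi0 * x / hbar)


def fourier_transform(f, xi, hbar, lo, hi, steps=20000):
    """Midpoint-rule approximation of the hbar-normalised Fourier transform.

    The integral is truncated to [lo, hi], which is harmless when f is
    negligible outside that interval.
    """
    dx = (hi - lo) / steps
    total = 0j
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += f(x) * cmath.exp(-1j * xi * x / hbar)
    return total * dx / math.sqrt(2 * math.pi * hbar)


def predicted_transform(xi, x0, xi0, hbar, amplitude=1.0):
    """Closed form obtained by completing the square in the gaussian integral."""
    return (amplitude
            * cmath.exp(1j * xi0 * x0 / hbar)
            * math.exp(-(xi - xi0) ** 2 / (2 * hbar))
            * cmath.exp(-1j * xi * x0 / hbar))


if __name__ == "__main__":
    hbar, x0, xi0 = 0.1, 1.0, 2.0
    f = lambda x: wave_packet(x, x0, xi0, hbar)
    for xi in (1.5, 2.0, 2.3):
        numeric = fourier_transform(f, xi, hbar, x0 - 5.0, x0 + 5.0)
        print(xi, abs(numeric - predicted_transform(xi, x0, xi0, hbar)))
```

The modulus of the transform is a gaussian of width about $\sqrt{\hbar}$ around $\xi_0$, which is the frequency concentration described above.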
However, there is a third (but less rigorous) way to view a function $f$ in $L^2(\mathbf{R}^n)$, which is the phase space perspective in which one tries to view $f$ as distributed simultaneously in physical space and in frequency space, thus being something like a measure on the phase space $\{(x,\xi): x, \xi \in \mathbf{R}^n\}$. Thus, for instance, the function (1) should heuristically be concentrated on the region $B(x_0,\sqrt{\hbar}) \times B(\xi_0,\sqrt{\hbar})$ in phase space. Unfortunately, due to the uncertainty principle, there is no completely satisfactory way to canonically and rigorously define what the “phase space portrait” of a function should be. (For instance, the Wigner transform of $f$ can be viewed as an attempt to describe the distribution of the energy of $f$ in phase space, except that this transform can take negative or even complex values; see Folland’s book for further discussion.) Still, it is a very useful heuristic to think of functions as having a phase space portrait, which is something like a non-negative measure on phase space that captures the distribution of functions in both space and frequency, albeit with some “quantum fuzziness” that shows up whenever one tries to inspect this measure at scales of physical space and frequency space that together violate the uncertainty principle. (The score of a piece of music is a good everyday example of a phase space portrait of a function, in this case a sound wave; here, the physical space is the time axis (the horizontal dimension of the score) and the frequency space is the vertical dimension. Here, the time and frequency scales involved are well above the uncertainty principle limit (a typical note lasts many hundreds of cycles, whereas the uncertainty principle kicks in at $O(1)$ cycles) and so there is no obstruction here to musical notation being unambiguous.)
Furthermore, if one takes certain asymptotic limits, one can recover a precise notion of a phase space portrait; for instance if one takes the semiclassical limit then, under certain circumstances, the phase space portrait converges to a well-defined classical probability measure on phase space; closely related to this is the high frequency limit of a fixed function, which among other things defines the wave front set of that function, which can be viewed as another asymptotic realisation of the phase space portrait concept.
If functions in can be viewed as a sort of distribution in phase space, then linear operators should be viewed as various transformations on such distributions on phase space. For instance, a pseudodifferential operator should correspond (as a zeroth approximation) to multiplying a phase space distribution by the symbol of that operator, as discussed in this previous blog post. Note that such operators only change the amplitude of the phase space distribution, but not the support of that distribution.
Now we turn to operators that alter the support of a phase space distribution, rather than the amplitude; we will focus on unitary operators to emphasise the amplitude preservation aspect. These will eventually be key examples of Fourier integral operators. A physical translation $f(x) \mapsto f(x - x_0)$ should correspond to pushing forward the distribution by the transformation $(x,\xi) \mapsto (x + x_0, \xi)$, as can be seen by comparing the physical and frequency space supports of $f(x - x_0)$ with that of $f(x)$. Similarly, a frequency modulation $f(x) \mapsto e^{i \xi_0 \cdot x/\hbar} f(x)$ should correspond to the transformation $(x,\xi) \mapsto (x, \xi + \xi_0)$; a linear change of variables $f(x) \mapsto |\det L|^{-1/2} f(L^{-1} x)$, where $L$ is an invertible linear transformation, should correspond to $(x,\xi) \mapsto (Lx, (L^*)^{-1} \xi)$; and finally, the Fourier transform $f \mapsto \hat f$ should correspond to the transformation $(x,\xi) \mapsto (\xi, -x)$.
Based on these examples, one may hope that given any diffeomorphism of phase space, one could associate some sort of unitary (or approximately unitary) operator , which (heuristically, at least) pushes the phase space portrait of a function forward by . However, there is an obstruction to doing so, which can be explained as follows. If pushes phase space portraits by , and pseudodifferential operators multiply phase space portraits by , then this suggests the intertwining relationship
The formalisation of this fact in the theory of Fourier integral operators is known as Egorov’s theorem, due to Yu Egorov (and not to be confused with the more widely known theorem of Dmitri Egorov in measure theory).
Applying commutators, we conclude the approximate conjugacy relationship
Now, the pseudodifferential calculus (as discussed in this previous post) tells us (heuristically, at least) that
where is the Poisson bracket. Comparing this with (2), we are then led to the compatibility condition
thus needs to preserve (approximately, at least) the Poisson bracket, or equivalently needs to be a symplectomorphism (again, approximately at least).
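For linear maps of phase space the symplectic condition can be tested directly: a matrix $M$ acting on $(x,\xi)$ preserves the Poisson bracket exactly when $M^T J M = J$, where $J$ is the standard block matrix with blocks $0, I, -I, 0$. The sketch below checks this for a few one-dimensional examples; the specific matrices and helper names are chosen here for illustration and are not taken from the discussion above.

```python
def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def standard_form(n):
    """Block matrix J = [[0, I], [-I, 0]] acting on (x, xi) in R^n x R^n."""
    size = 2 * n
    J = [[0.0] * size for _ in range(size)]
    for i in range(n):
        J[i][n + i] = 1.0
        J[n + i][i] = -1.0
    return J


def is_symplectic(m, tol=1e-12):
    """Return True when M^T J M = J up to the given tolerance."""
    n = len(m) // 2
    J = standard_form(n)
    lhs = matmul(matmul(transpose(m), J), m)
    return all(abs(lhs[i][j] - J[i][j]) <= tol
               for i in range(2 * n) for j in range(2 * n))


# Illustrative linear maps of the (x, xi) plane.
quarter_turn = [[0.0, 1.0], [-1.0, 0.0]]   # (x, xi) -> (xi, -x)
dilation = [[2.0, 0.0], [0.0, 0.5]]        # (x, xi) -> (2x, xi/2)
shear = [[1.0, 3.0], [0.0, 1.0]]           # (x, xi) -> (x + 3 xi, xi)
stretch = [[2.0, 0.0], [0.0, 1.0]]         # (x, xi) -> (2x, xi), doubles area

if __name__ == "__main__":
    for name, m in [("quarter_turn", quarter_turn), ("dilation", dilation),
                    ("shear", shear), ("stretch", stretch)]:
        print(name, is_symplectic(m))
```

The first three maps pass the test and the last one fails, in line with the heuristic that only bracket-preserving maps of phase space can be realised by (approximately) unitary operators.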
Now suppose that is a symplectomorphism. This is morally equivalent to the graph being a Lagrangian submanifold of (where we give the second copy of phase space the negative of the usual symplectic form , thus yielding as the full symplectic form on ; this is another instantiation of the closed graph theorem, as mentioned in this previous post). This graph is known as the canonical relation for the (putative) FIO that is associated to . To understand what it means for this graph to be Lagrangian, we coordinatise as , and suppose temporarily that this graph were (locally, at least) a smooth graph in the and variables, thus
for some smooth functions . A brief computation shows that the Lagrangian property of is then equivalent to the compatibility conditions
for , where denote the components of . Some Fourier analysis (or Hodge theory) lets us solve these equations as
so that maps to .
for some smooth amplitude function (note that the Fourier transform is the special case when and , which helps explain the genesis of the term “Fourier integral operator”). Indeed, if one computes an inner product for gaussian wave packets of the form (1) and localised in phase space near respectively, then a Taylor expansion of around , followed by a stationary phase computation, shows (again heuristically, and assuming is suitably non-degenerate) that has (3) as its canonical relation. (Furthermore, a refinement of this stationary phase calculation suggests that if is normalised to be the half-density , then should be approximately unitary.) As such, we view (4) as an example of a Fourier integral operator (assuming various smoothness and non-degeneracy hypotheses on the phase and amplitude which we do not detail here).
Of course, it may be the case that is not a graph in the coordinates (for instance, the key examples of translation, modulation, and dilation are not of this form), but then it is often a graph in some other pair of coordinates, such as . In that case one can compose the oscillatory integral construction given above with a Fourier transform, giving another class of FIOs of the form
This class of FIOs covers many important cases; for instance, the translation, modulation, and dilation operators considered earlier can be written in this form after some Fourier analysis. Another typical example is the half-wave propagator for some time , which can be written in the form
This corresponds to the phase space transformation , which can be viewed as the classical propagator associated to the “quantum” propagator . More generally, propagators for linear Hamiltonian partial differential equations can often be expressed (at least approximately) by Fourier integral operators corresponding to the propagator of the classical Hamiltonian flow associated to the symbol of the Hamiltonian operator ; this leads to an important mathematical formalisation of the correspondence principle between quantum mechanics and classical mechanics, which is one of the foundations of microlocal analysis and which was extensively developed in Hörmander’s work. (More recently, numerically stable versions of this theory have been developed to allow for rapid and accurate numerical solutions to various linear PDE, for instance through Emmanuel Candès’ theory of curvelets, so the theory that Hörmander built now has some quite significant practical applications in areas such as geology.)
In some cases, the canonical relation may have some singularities (such as fold singularities) which prevent it from being written as graphs in the previous senses, but the theory for defining FIOs even in these cases, and in developing their calculus, is now well established, in large part due to the foundational work of Hörmander.
I’ve just uploaded to the arXiv my joint paper with Vitaly Bergelson, “Multiple recurrence in quasirandom groups“, which is submitted to Geom. Func. Anal. This paper builds upon a paper of Gowers in which he introduced the concept of a quasirandom group, and established some mixing (or recurrence) properties of such groups. A -quasirandom group is a finite group with no non-trivial unitary representations of dimension at most . We will informally refer to a “quasirandom group” as a -quasirandom group with the quasirandomness parameter large (more formally, one can work with a sequence of -quasirandom groups with going to infinity). A typical example of a quasirandom group is where is a large prime. Quasirandom groups are discussed in depth in this blog post. One of the key properties of quasirandom groups established in Gowers’ paper is the following “weak mixing” property: if are subsets of , then for “almost all” , one has
where denotes the density of in . Here, we use to informally represent an estimate of the form (where is a quantity that goes to zero when the quasirandomness parameter goes to infinity), and “almost all ” denotes “for all in a subset of of density “. As a corollary, if have positive density in (by which we mean that is bounded away from zero, uniformly in the quasirandomness parameter , and similarly for ), then (if the quasirandomness parameter is sufficiently large) we can find elements such that , , . In fact we can find approximately such pairs . To put it another way: if we choose uniformly and independently at random from , then the events , , are approximately independent (thus the random variable resembles a uniformly distributed random variable on in some weak sense). One can also express this mixing property in integral form as
for any bounded functions . (Of course, with being finite, one could replace the integrals here by finite averages if desired.) Or in probabilistic language, we have
where are drawn uniformly and independently at random from .
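To get a feel for this kind of mixing statement, one can count product triples in a small matrix group by brute force. The sketch below assumes the model group is $SL_2(\mathbf{F}_p)$ for a small prime $p$ and compares the number of pairs $(x,y) \in A \times B$ with $xy \in C$ against the "independent" prediction $|A||B||C|/|G|$. The subsets are random and seeded; the helper names are illustrative, and for primes this small the agreement is only rough.

```python
import random
from itertools import product


def sl2(p):
    """All 2x2 matrices (a, b, c, d) over F_p with determinant 1."""
    return [(a, b, c, d)
            for a, b, c, d in product(range(p), repeat=4)
            if (a * d - b * c) % p == 1]


def mul(g, h, p):
    """Matrix product in SL_2(F_p), with matrices stored row by row."""
    a, b, c, d = g
    e, f, r, s = h
    return ((a * e + b * r) % p, (a * f + b * s) % p,
            (c * e + d * r) % p, (c * f + d * s) % p)


def count_triples(A, B, C, p):
    """Number of pairs (x, y) in A x B whose product xy lies in C."""
    target = set(C)
    return sum(1 for x in A for y in B if mul(x, y, p) in target)


if __name__ == "__main__":
    p = 7
    G = sl2(p)
    rng = random.Random(0)
    A, B, C = (rng.sample(G, len(G) // 2) for _ in range(3))
    observed = count_triples(A, B, C, p)
    predicted = len(A) * len(B) * len(C) / len(G)
    print(len(G), observed, predicted)
```

Random subsets are the easy case, since they mix in any group; the content of quasirandomness is that the same approximate count holds for arbitrary dense subsets once the quasirandomness parameter is large.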
As observed in Gowers’ paper, one can iterate this observation to find “parallelepipeds” of any given dimension in dense subsets of . For instance, applying (1) with replaced by , , and one can assert (after some relabeling) that for chosen uniformly and independently at random from , the events , , , , , , are approximately independent whenever are dense subsets of ; thus the tuple resembles a uniformly distributed random variable in in some weak sense.
However, there are other tuples for which the above iteration argument does not seem to apply. One of the simplest tuples in this vein is the tuple in , when are drawn uniformly at random from a quasirandom group . Here, one does not expect the tuple to behave as if it were uniformly distributed in , because there is an obvious constraint connecting the last two components of this tuple: they must lie in the same conjugacy class! In particular, if is a subset of that is the union of conjugacy classes, then the events , are perfectly correlated, so that is equal to rather than . Our main result, though, is that in a quasirandom group, this is (approximately) the only constraint on the tuple. More precisely, we have
where goes to zero as , are drawn uniformly and independently at random from , and is drawn uniformly at random from the conjugates of for each fixed choice of .
This is the probabilistic formulation of the above theorem; one can also phrase the theorem in other formulations (such as an integral formulation), and this is detailed in the paper. This theorem leads to a number of recurrence results; for instance, as a corollary of this result, we have
for almost all , and any dense subsets of ; the lower and upper bounds are sharp, with the lower bound being attained when is randomly distributed, and the upper bound when is conjugation-invariant.
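The conjugacy constraint mentioned above is easy to see in a concrete group. In any group the products $xg$ and $gx$ are conjugate, since $gx = g(xg)g^{-1}$, even though they are usually different elements. The sketch below verifies this exhaustively in the symmetric group $S_4$, which is used here purely as a small illustrative example (it is not a quasirandom group); permutations are stored as tuples and the helper names are hypothetical.

```python
from itertools import permutations


def compose(p, q):
    """Composition of permutations: (p*q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)


def inverse(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)


def class_labels(group):
    """Map each group element to the frozenset forming its conjugacy class."""
    labels = {}
    for x in group:
        if x not in labels:
            cls = frozenset(compose(compose(h, x), inverse(h)) for h in group)
            for y in cls:
                labels[y] = cls
    return labels


def products_always_conjugate(group):
    """Check that xg and gx share a conjugacy class for every pair (x, g)."""
    labels = class_labels(group)
    return all(labels[compose(x, g)] == labels[compose(g, x)]
               for x in group for g in group)


if __name__ == "__main__":
    S4 = list(permutations(range(4)))
    print(products_always_conjugate(S4))
    print(len(set(class_labels(S4).values())))
```

This is the reason a conjugation-invariant set gives perfectly correlated events for such a pair of products, which is the extreme case of the upper bound in the corollary.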
To me, the more interesting thing here is not the result itself, but how it is proven. Vitaly and I were not able to find a purely finitary way to establish this mixing theorem. Instead, we had to first use the machinery of ultraproducts (as discussed in this previous post) to convert the finitary statement about a quasirandom group to an infinitary statement about a type of infinite group which we call an ultra quasirandom group (basically, an ultraproduct of increasingly quasirandom finite groups). This is analogous to how the Furstenberg correspondence principle is used to convert a finitary combinatorial problem into an infinitary ergodic theory problem.
Ultra quasirandom groups come equipped with a finite, countably additive measure known as Loeb measure , which is very analogous to the Haar measure of a compact group, except that in the case of ultra quasirandom groups one does not quite have a topological structure that would give compactness. Instead, one has a slightly weaker structure known as a -topology, which is like a topology except that open sets are only closed under countable unions rather than arbitrary ones. There are some interesting measure-theoretic and topological issues regarding the distinction between topologies and -topologies (and between Haar measure and Loeb measure), but for this post it is perhaps best to gloss over these issues and pretend that ultra quasirandom groups come with a Haar measure. One can then recast Theorem 1 as a mixing theorem for the left and right actions of the ultra quasirandom group on itself, which roughly speaking is the assertion that
for “almost all” , if are bounded measurable functions on , with having zero mean on all conjugacy classes of , where are the left and right translation operators
To establish this mixing theorem, we use the machinery of idempotent ultrafilters, which is a particularly useful tool for understanding the ergodic theory of actions of countable groups that need not be amenable; in the non-amenable setting the classical ergodic averages do not make much sense, but ultrafilter-based averages are still available. To oversimplify substantially, the idempotent ultrafilter arguments let one establish mixing estimates of the form (2) for “many” elements of an infinite-dimensional parallelepiped known as an IP system (provided that the actions of this IP system obey some technical mixing hypotheses, but let’s ignore that for the sake of this discussion). The claim then follows by using the quasirandomness hypothesis to show that if the estimate (2) failed for a large set of , then this large set would then contain an IP system, contradicting the previous claim.
Idempotent ultrafilters are an extremely infinitary type of mathematical object (one has to use Zorn’s lemma no fewer than three times just to construct one of these objects!). So it is quite remarkable that they can be used to establish a finitary theorem such as Theorem 1, though as is often the case with such infinitary arguments, one gets absolutely no quantitative control whatsoever on the error terms appearing in that theorem. (It is also mildly amusing to note that our arguments involve the use of ultrafilters in two completely different ways: firstly in order to set up the ultraproduct that converts the finitary mixing problem to an infinitary one, and secondly to solve the infinitary mixing problem. Despite some superficial similarities, there appear to be no substantial commonalities between these two usages of ultrafilters.) There is already a fair amount of literature on using idempotent ultrafilter methods in infinitary ergodic theory, and perhaps by further development of ultraproduct correspondence principles, one can use such methods to obtain further finitary consequences (although the state of the art for idempotent ultrafilter ergodic theory has not advanced much beyond the analysis of two commuting shifts currently, which is the main reason why our arguments only handle the pattern and not more sophisticated patterns).
We also have some miscellaneous other results in the paper. It turns out that by using the triangle removal lemma from graph theory, one can obtain a recurrence result that asserts that whenever is a dense subset of a finite group (not necessarily quasirandom), then there are pairs such that all lie in . Using a hypergraph generalisation of the triangle removal lemma known as the hypergraph removal lemma, one can obtain more complicated versions of this statement; for instance, if is a dense subset of , then one can find triples such that all lie in . But the method is tailored to the specific types of patterns given here, and we do not have a general method for obtaining recurrence or mixing properties for arbitrary patterns of words in some finite alphabet such as .
We also give some properties of a model example of an ultra quasirandom group, namely the ultraproduct of where is a sequence of primes going off to infinity. Thanks to the substantial recent progress (by Helfgott, Bourgain, Gamburd, Breuillard, and others) on understanding the expansion properties of the finite groups , we have a fair amount of knowledge on the ultraproduct as well; for instance any two elements of will almost surely generate a spectral gap. We don’t have any direct application of this particular ultra quasirandom group, but it might be interesting to study it further.
Given a function $f: X \to Y$ between two sets $X, Y$, we can form the graph
$$ \Sigma := \{ (x, f(x)) : x \in X \},$$
which is a subset of the Cartesian product $X \times Y$.
There are a number of “closed graph theorems” in mathematics which relate the regularity properties of the function with the closure properties of the graph , assuming some “completeness” properties of the domain and range . The most famous of these is the closed graph theorem from functional analysis, which I phrase as follows:
Theorem 1 (Closed graph theorem (functional analysis)) Let be complete normed vector spaces over the reals (i.e. Banach spaces). Then a function is a continuous linear transformation if and only if the graph is both linearly closed (i.e. it is a linear subspace of ) and topologically closed (i.e. closed in the product topology of ).
I like to think of this theorem as linking together qualitative and quantitative notions of regularity preservation properties of an operator ; see this blog post for further discussion.
The theorem is equivalent to the assertion that any continuous linear bijection from one Banach space to another is necessarily an isomorphism in the sense that the inverse map is also continuous and linear. Indeed, to see that this claim implies the closed graph theorem, one applies it to the projection from to , which is a continuous linear bijection; conversely, to deduce this claim from the closed graph theorem, observe that the graph of the inverse is the reflection of the graph of . As such, the closed graph theorem is a corollary of the open mapping theorem, which asserts that any continuous linear surjection from one Banach space to another is open. (Conversely, one can deduce the open mapping theorem from the closed graph theorem by quotienting out the kernel of the continuous surjection to get a bijection.)
It turns out that there is a closed graph theorem (or equivalent reformulations of that theorem, such as an assertion that bijective morphisms between sufficiently “complete” objects are necessarily isomorphisms, or as an open mapping theorem) in many other categories in mathematics as well. Here are some easy ones:
Theorem 2 (Closed graph theorem (linear algebra)) Let be vector spaces over a field . Then a function is a linear transformation if and only if the graph is linearly closed.
Theorem 3 (Closed graph theorem (group theory)) Let be groups. Then a function is a group homomorphism if and only if the graph is closed under the group operations (i.e. it is a subgroup of ).
Theorem 4 (Closed graph theorem (order theory)) Let be totally ordered sets. Then a function is monotone increasing if and only if the graph is totally ordered (using the product order on ).
Remark 1 Similar results to the above three theorems (with similarly easy proofs) hold for other algebraic structures, such as rings (using the usual product of rings), modules, algebras, or Lie algebras, groupoids, or even categories (a map between categories is a functor iff its graph is again a category). (ADDED IN VIEW OF COMMENTS: further examples include affine spaces and -sets (sets with an action of a given group ).) There are also various approximate versions of this theorem that are useful in arithmetic combinatorics, that relate the property of a map being an “approximate homomorphism” in some sense with its graph being an “approximate group” in some sense. This is particularly useful for this subfield of mathematics because there are currently more theorems about approximate groups than about approximate homomorphisms, so that one can profitably use closed graph theorems to transfer results about the former to results about the latter.
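The group-theoretic version of the closed graph theorem can be checked exhaustively in small cases. The sketch below works with cyclic groups $\mathbf{Z}/n$ and $\mathbf{Z}/m$ written additively, represents a function as a tuple of values, and confirms that being a homomorphism coincides with the graph being closed under the group operation. For a finite non-empty subset of a group, closure under the operation already implies that it is a subgroup, which is why inverses and the identity are not tested separately. The function names are illustrative.

```python
from itertools import product


def is_homomorphism(f, n, m):
    """f is a tuple of length n with values in range(m); test additivity."""
    return all(f[(a + b) % n] == (f[a] + f[b]) % m
               for a in range(n) for b in range(n))


def graph_is_subgroup(f, n, m):
    """Test whether the graph {(a, f(a))} is closed under addition in Z/n x Z/m."""
    graph = {(a, f[a]) for a in range(n)}
    return all(((a + c) % n, (b + d) % m) in graph
               for (a, b) in graph for (c, d) in graph)


def theorem_holds(n, m):
    """Compare the two properties over every function from Z/n to Z/m."""
    return all(is_homomorphism(f, n, m) == graph_is_subgroup(f, n, m)
               for f in product(range(m), repeat=n))


if __name__ == "__main__":
    print(theorem_holds(4, 4))
    homs = [f for f in product(range(4), repeat=4) if is_homomorphism(f, 4, 4)]
    print(homs)
```

The equivalence is immediate from the definitions: the sum of $(a, f(a))$ and $(c, f(c))$ lies in the graph precisely when $f(a+c) = f(a) + f(c)$.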
A slightly more sophisticated result in the same vein:
Theorem 5 (Closed graph theorem (point set topology)) Let be compact Hausdorff spaces. Then a function is continuous if and only if the graph is topologically closed.
Indeed, the “only if” direction is easy, while for the “if” direction, note that if is a closed subset of , then it is compact Hausdorff, and the projection map from to is then a bijective continuous map between compact Hausdorff spaces, which is then closed, thus open, and hence a homeomorphism, giving the claim.
Note that the compactness hypothesis is necessary: for instance, the function defined by for and for is a function which has a closed graph, but is discontinuous.
A similar result (but relying on a much deeper theorem) is available in algebraic geometry, as I learned after asking this MathOverflow question:
Theorem 6 (Closed graph theorem (algebraic geometry)) Let be normal projective varieties over an algebraically closed field of characteristic zero. Then a function is a regular map if and only if the graph is Zariski-closed.
Proof: (Sketch) For the only if direction, note that the map is a regular map from the projective variety to the projective variety and is thus a projective morphism, hence is proper. In particular, the image of under this map is Zariski-closed.
Conversely, if is Zariski-closed, then it is also a projective variety, and the projection is a projective morphism from to , which is clearly quasi-finite; by the characteristic zero hypothesis, it is also separated. Applying (Grothendieck’s form of) Zariski’s main theorem, this projection is the composition of an open immersion and a finite map. As projective varieties are complete, the open immersion is an isomorphism, and so the projection from to is finite. Being injective and separable, the degree of this finite map must be one, and hence and are isomorphic, hence (by normality of ) is contained in (the image of) , which makes the map from to regular, which makes regular.
The counterexample of the map given by for and demonstrates why the projective hypothesis is necessary. The necessity of the normality condition (or more precisely, a weak normality condition) is demonstrated by (the projective version of) the map from the cuspidal curve to . (If one restricts attention to smooth varieties, though, normality becomes automatic.) The necessity of characteristic zero is demonstrated by (the projective version of) the inverse of the Frobenius map on a field of characteristic .
There are also a number of closed graph theorems for topological groups, of which the following is typical (see Exercise 3 of these previous blog notes):
Theorem 7 (Closed graph theorem (topological group theory)) Let be -compact, locally compact Hausdorff groups. Then a function is a continuous homomorphism if and only if the graph is both group-theoretically closed and topologically closed.
The hypotheses of being -compact, locally compact, and Hausdorff can be relaxed somewhat, but I doubt that they can be eliminated entirely (though I do not have a ready counterexample for this).
In several complex variables, it is a classical theorem (see e.g. Lemma 4 of this blog post) that a holomorphic function from a domain in to is locally injective if and only if it is a local diffeomorphism (i.e. its derivative is everywhere non-singular). This leads to a closed graph theorem for complex manifolds:
Theorem 8 (Closed graph theorem (complex manifolds)) Let be complex manifolds. Then a function is holomorphic if and only if the graph is a complex manifold (using the complex structure inherited from ) of the same dimension as .
Indeed, one applies the previous observation to the projection from to . The dimension requirement is needed, as can be seen from the example of the map defined by for and .
(ADDED LATER:) There is a real analogue to the above theorem:
Theorem 9 (Closed graph theorem (real manifolds)) Let be real manifolds. Then a function is continuous if and only if the graph is a real manifold of the same dimension as .
Note though that the analogous claim for smooth real manifolds fails: the function defined by has a smooth graph, but is not itself smooth.
(ADDED YET LATER:) Here is an easy closed graph theorem in the symplectic category:
Theorem 10 (Closed graph theorem (symplectic geometry)) Let and be smooth symplectic manifolds of the same dimension. Then a smooth map is a symplectic morphism (i.e. ) if and only if the graph is a Lagrangian submanifold of with the symplectic form .
In view of the symplectic rigidity phenomenon, it is likely that the smoothness hypotheses on can be relaxed substantially, but I will not try to formulate such a result here.
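In the linear case the symplectic closed graph theorem reduces to a finite computation. The sketch below takes a linear map $M$ of $\mathbf{R}^{2n}$ with its standard symplectic form, and tests whether the twisted form (the form on the first factor minus the form on the second) vanishes on all pairs of vectors $(e_k, M e_k)$ spanning the graph. Since the graph automatically has half the dimension of the product space, this vanishing is what makes it Lagrangian. The coordinate ordering $(x_1,\dots,x_n,\xi_1,\dots,\xi_n)$ and the examples are assumptions made for illustration.

```python
import math


def omega(v, w):
    """Standard symplectic form: sum over i of v_x[i]*w_xi[i] - v_xi[i]*w_x[i]."""
    n = len(v) // 2
    return sum(v[i] * w[n + i] - v[n + i] * w[i] for i in range(n))


def apply(m, v):
    """Matrix-vector product with m stored as a list of rows."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def graph_is_lagrangian(m, tol=1e-12):
    """Check that omega(v, w) - omega(Mv, Mw) vanishes on a basis of the graph."""
    size = len(m)
    basis = [[1.0 if i == k else 0.0 for i in range(size)] for k in range(size)]
    images = [apply(m, e) for e in basis]
    return all(abs(omega(basis[k], basis[l]) - omega(images[k], images[l])) <= tol
               for k in range(size) for l in range(size))


def rotation(theta):
    """Rotation of the (x, xi) plane, which preserves area."""
    return [[math.cos(theta), math.sin(theta)],
            [-math.sin(theta), math.cos(theta)]]


# Lift of the linear change of variables L = [[1, 1], [0, 1]] to phase space:
# x -> L x and xi -> (L^T)^{-1} xi, written as a 4x4 block diagonal matrix.
lifted_change_of_variables = [[1.0, 1.0, 0.0, 0.0],
                              [0.0, 1.0, 0.0, 0.0],
                              [0.0, 0.0, 1.0, 0.0],
                              [0.0, 0.0, -1.0, 1.0]]

if __name__ == "__main__":
    print(graph_is_lagrangian(rotation(0.7)))
    print(graph_is_lagrangian(lifted_change_of_variables))
    print(graph_is_lagrangian([[2.0, 0.0], [0.0, 1.0]]))
```

The first two graphs are Lagrangian and the third is not, matching the fact that the first two maps are symplectic while the third scales area by a factor of two.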
There are presumably many further examples of closed graph theorems (or closely related theorems, such as criteria for inverting a morphism, or open mapping type theorems) throughout mathematics; I would be interested to know of further examples.
I recently finished the first draft of the last of my books based on my 2011 blog posts (and also my Google buzzes and Google+ posts from that year), entitled “Spending symmetry“. The PDF of this draft is available here. This is again a rather assorted (and lightly edited) collection of posts (and buzzes, and Google+ posts), though concentrating in the areas of analysis (both standard and nonstandard), logic, and geometry. As always, comments and corrections are welcome.
I’ve just uploaded to the arXiv my paper “Expanding polynomials over finite fields of large characteristic, and a regularity lemma for definable sets“, submitted to Contrib. Disc. Math. The motivation of this paper is to understand a certain polynomial variant of the sum-product phenomenon in finite fields. This phenomenon asserts that if is a non-empty subset of a finite field , then either the sumset or product set will be significantly larger than , unless is close to a subfield of (or to ). In particular, in the regime when is large, say , one expects an expansion bound of the form
for some absolute constants . Results of this type are known; for instance, Hart, Iosevich, and Solymosi obtained precisely this bound for (in the case when is prime), which was then improved by Garaev to .
We have focused here on the case when is a large subset of , but sum-product estimates are also extremely interesting in the opposite regime in which is allowed to be small (see for instance the papers of Katz–Shen and Li and of Garaev for some recent work in this case, building on some older papers of Bourgain, Katz and myself and of Bourgain, Glibichuk, and Konyagin). However, the techniques used in these two regimes are rather different. For large subsets of , it is often profitable to use techniques such as the Fourier transform or the Cauchy-Schwarz inequality to “complete” a sum over a large set (such as ) into a sum over the entire field , and then to use identities concerning complete sums (such as the Weil bound on complete exponential sums over a finite field). For small subsets of , such techniques are usually quite inefficient, and one has to proceed by somewhat different combinatorial methods which do not try to exploit the ambient field . But my paper focuses exclusively on the large regime, and unfortunately does not directly say much (except through reasoning by analogy) about the small case.
Note that it is necessary to have both and appear on the left-hand side of (1). Indeed, if one just has the sumset , then one can set to be a long arithmetic progression to give counterexamples to (1). Similarly, if one just has a product set , then one can set to be a long geometric progression. The sum-product phenomenon can then be viewed as the assertion that it is not possible to simultaneously behave like a long arithmetic progression and a long geometric progression, unless one is already very close to behaving like a subfield.
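The two counterexamples are easy to reproduce numerically. The sketch below works in the prime field of order 101 (an arbitrary small choice) and computes the sumset and product set of an arithmetic progression and of a geometric progression with ratio 2, each of length 20. The helper names are illustrative.

```python
def sumset(A, p):
    """All sums a + b mod p with a, b in A."""
    return {(a + b) % p for a in A for b in A}


def product_set(A, p):
    """All products a * b mod p with a, b in A."""
    return {(a * b) % p for a in A for b in A}


def arithmetic_progression(start, step, length, p):
    return [(start + k * step) % p for k in range(length)]


def geometric_progression(start, ratio, length, p):
    return [(start * pow(ratio, k, p)) % p for k in range(length)]


if __name__ == "__main__":
    p = 101
    ap = arithmetic_progression(1, 1, 20, p)
    gp = geometric_progression(1, 2, 20, p)
    for name, A in [("arithmetic", ap), ("geometric", gp)]:
        print(name, len(set(A)), len(sumset(A, p)), len(product_set(A, p)))
```

The arithmetic progression has a sumset of size only $2|A|-1$, and the geometric progression has a product set of size only $2|A|-1$, so neither quantity alone can be forced to be large; it is the other operation in each case that spreads the set out.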
Now we consider a polynomial variant of the sum-product phenomenon, where we consider a polynomial image
of a set with respect to a polynomial ; we can also consider the asymmetric setting of the image
of two subsets . The regime we will be interested in is the one where the field is large, and the subsets of are also large, but the polynomial has bounded degree. Actually, for technical reasons it will not be enough for us to assume that has large cardinality; we will also need to assume that has large characteristic. (The two concepts are synonymous for fields of prime order, but not in general; for instance, the field with elements becomes large as while the characteristic remains fixed at , and is thus not going to be covered by the results in this paper.)
whenever for some absolute constants , unless the polynomial had the degenerate form for some linear function and polynomial , in which behaves too much like to get reasonable expansion. In this paper, we focus instead on the question of bounding alone. In particular, one can ask to classify the polynomials for which one has the weak expansion property
whenever for some absolute constants . One can also ask for stronger versions of this expander property, such as the moderate expansion property
whenever , or the almost strong expansion property
whenever . (One can consider even stronger expansion properties, such as the strong expansion property , but it was shown by Gyarmati and Sarkozy that this property cannot hold for polynomials of two variables of bounded degree when .) One can also consider asymmetric versions of these properties, in which one obtains lower bounds on rather than .
The example of a long arithmetic or geometric progression shows that the polynomials or cannot be expanders in any of the above senses, and a similar construction also shows that polynomials of the form or for some polynomials cannot be expanders. On the other hand, there are a number of results in the literature establishing expansion for various polynomials in two or more variables that are not of this degenerate form (in part because such results are related to incidence geometry questions in finite fields, such as the finite field version of the Erdős distinct distances problem). For instance, Solymosi established weak expansion for polynomials of the form when is a nonlinear polynomial, with generalisations by Hart, Li, and Shen for various polynomials of the form or . Further examples of expanding polynomials appear in the work of Shkredov, Iosevich-Rudnev, and Bukh-Tsimerman, as well as the previously mentioned paper of Vu and of Hart-Li-Shen, and these papers in turn cite many further results which are in the spirit of the polynomial expansion bounds discussed here (for instance, dealing with the small regime, or working in other fields such as instead of in finite fields ). We will not summarise all these results here; they are summarised briefly in my paper, and in more detail in the papers of Hart-Li-Shen and of Bukh-Tsimerman. But we will single out one of the results of Bukh-Tsimerman, which is one of the most recent and general of these results, and closest to the results of my own paper.
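To see the degenerate behaviour concretely, one can compute polynomial images of an arithmetic progression by brute force. This is a hedged sketch only: the helper `poly_image`, the prime 1009, and the two test polynomials $(x+y)^2$ and $x^2+xy$ are choices made for this illustration (the second is a hypothetical example of a polynomial that is not a function of $x+y$ alone), and a single small experiment is no substitute for the asymptotic statements.

```python
def poly_image(P, A, B, p):
    """Return the image P(A, B) = {P(a, b) mod p : a in A, b in B}."""
    return {P(a, b) % p for a in A for b in B}


p = 1009             # small prime, chosen only for illustration
n = 30
A = set(range(n))    # arithmetic progression 0, 1, ..., n-1


def degenerate(x, y):
    # A polynomial of x + y only: its image is controlled by the sumset A + A.
    return (x + y) ** 2


def candidate(x, y):
    # Hypothetical non-degenerate test polynomial, x^2 + x*y.
    return x * x + x * y


print("degenerate image size:", len(poly_image(degenerate, A, A, p)))
print("candidate image size: ", len(poly_image(candidate, A, A, p)))
```

Since $(x+y)^2$ factors through $x+y$, its image on the progression can have at most $|A+A| = 2n-1$ elements, whereas the second polynomial produces a visibly larger image on the same set.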
Roughly speaking, in this paper it is shown that a polynomial of two variables and bounded degree will be a moderate expander if it is non-composite (in the sense that it does not take the form for some non-linear polynomial and some polynomial , possibly having coefficients in the algebraic closure of ) and is monic on both and , thus it takes the form for some and some polynomial of degree at most in , and similarly with the roles of and reversed, unless is of the form or (in which case the expansion theory is covered to a large extent by the previous work of Hart, Li, and Shen).
Our first main result improves upon the Bukh-Tsimerman result by strengthening the notion of expansion and removing the non-composite and monic hypotheses, but imposes a condition of large characteristic. I’ll state the result here slightly informally as follows:
Theorem 1 (Criterion for moderate expansion) Let be a polynomial of bounded degree over a finite field of sufficiently large characteristic, and suppose that is not of the form or for some polynomials . Then one has the (asymmetric) moderate expansion property
This is basically a sharp necessary and sufficient condition for asymmetric moderate expansion for polynomials of two variables. In the paper, analogous sufficient conditions for weak or almost strong expansion are also given, although these are not quite as satisfactory (particularly the conditions for almost strong expansion, which include a somewhat complicated algebraic condition which is not easy to check, and which I would like to simplify further, but was unable to).
The argument here resembles the Bukh-Tsimerman argument in many ways. One can view the result as an assertion about the expansion properties of the graph , which can essentially be thought of as a somewhat sparse three-uniform hypergraph on . Being sparse, it is difficult to directly apply techniques from dense graph or hypergraph theory for this situation; however, after a few applications of the Cauchy-Schwarz inequality, it turns out (as observed by Bukh and Tsimerman) that one can essentially convert the problem to one about the expansion properties of the set
(actually, one should view this as a multiset, but let us ignore this technicality) which one expects to be a dense set in , except in the case when the associated algebraic variety
fails to be Zariski dense, but it turns out that in this case one can use some differential geometry and Riemann surface arguments (after first invoking the Lefschetz principle and the high characteristic hypothesis to work over the complex numbers instead of over a finite field) to show that is of the form or . This reduction is related to the classical fact that the only one-dimensional algebraic groups over the complex numbers are the additive group , the multiplicative group , or the elliptic curves (but the latter have a group law given by rational functions rather than polynomials, and so ultimately end up being eliminated from consideration, though they would play an important role if one wanted to study the expansion properties of rational functions).
It remains to understand the structure of the set (2). To understand dense graphs or hypergraphs, one of the standard tools of choice is the Szemerédi regularity lemma, which carves up such graphs into a bounded number of cells, with the graph behaving pseudorandomly on most pairs of cells. However, the bounds in this lemma are notoriously poor (the regularity obtained is an inverse tower exponential function of the number of cells), and this makes this lemma unsuitable for the type of expansion properties we seek (in which we want to deal with sets which have a polynomial sparsity, e.g. ). Fortunately, in the case of sets such as (2) which are definable over the language of rings, it turns out that a much stronger regularity lemma is available, which I call the “algebraic regularity lemma”. I’ll state it (again, slightly informally) in the context of graphs as follows:
Lemma 2 (Algebraic regularity lemma) Let be a finite field of large characteristic, and let be definable sets over of bounded complexity (i.e. are subsets of , for some bounded that can be described by some first-order predicate in the language of rings of bounded length and involving boundedly many constants). Let be a definable subset of , again of bounded complexity (one can view as a bipartite graph connecting and ). Then one can partition into a bounded number of cells , , still definable with bounded complexity, such that for all pairs , , one has the regularity property
for all , where is the density of in .
This lemma resembles the Szemerédi regularity lemma, but regularises all pairs of cells (not just most pairs), and the regularity is of polynomial strength in , rather than inverse tower exponential in the number of cells. Also, the cells are not arbitrary subsets of , but are themselves definable with bounded complexity, which turns out to be crucial for applications. I am optimistic that this lemma will be useful not just for studying expanding polynomials, but for many other combinatorial questions involving dense subsets of definable sets over finite fields.
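As a toy illustration of the kind of regularity being described (and not of the lemma's proof), one can take a very simple definable graph and test how evenly its edges are spread over a pair of subsets. Everything specific in this sketch is an assumption made for illustration: the prime 103, the edge relation "$v+w$ is a square" (definable in the language of rings as $\exists t: v+w=t^2$), the single-cell partition, the random test sets, and the comparison scale $\sqrt{|F||A||B|}$.

```python
import math
import random

p = 103                                       # small prime, for illustration only
squares = {(t * t) % p for t in range(p)}     # squares in F_p, including 0


def edges(A, B):
    """Count pairs (v, w) in A x B with v + w a square in F_p."""
    return sum(1 for v in A for w in B if (v + w) % p in squares)


# Density of the definable edge set E inside the single cell F_p x F_p.
total_edges = edges(range(p), range(p))
density = total_edges / (p * p)

rng = random.Random(0)                        # fixed seed for reproducibility
A = rng.sample(range(p), 40)
B = rng.sample(range(p), 40)

# Compare the actual edge count on A x B with the count predicted by the density.
deviation = abs(edges(A, B) - density * len(A) * len(B))
scale = math.sqrt(p * len(A) * len(B))
print("density:", density)
print("deviation:", deviation, "comparison scale:", scale)
```

For this particular graph the deviation is controlled by a classical character sum estimate, so a single cell already suffices; the content of the lemma is that boundedly many definable cells suffice for any definable set of bounded complexity.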
The above lemma is stated for graphs , but one can iterate it to obtain an analogous regularisation of hypergraphs for any bounded (for application to (2), we need ). This hypergraph regularity lemma, by the way, is not analogous to the strong hypergraph regularity lemmas of Rodl et al. and Gowers developed in the last six or so years, but closer in spirit to the older (but weaker) hypergraph regularity lemma of Chung which gives the same “order ” regularity that the graph regularity lemma gives, rather than higher order regularity.
One feature of the proof of Lemma 2 which I found striking was the need to use some fairly high powered technology from algebraic geometry, and in particular the Lang-Weil bound on counting points in varieties over a finite field (discussed in this previous blog post), and also the theory of the étale fundamental group. Let me try to briefly explain why this is the case. A model example of a definable set of bounded complexity is a set of the form
for some polynomial . (Actually, it turns out that one can essentially write all definable sets as an intersection of sets of this form; see this previous blog post for more discussion.) To regularise the set , it is convenient to square the adjacency matrix, which soon leads to the study of counting functions such as
If one can show that this function is “approximately finite rank” in the sense that (modulo lower order errors, of size smaller than the main term) this quantity depends only on a bounded number of bits of information about and a bounded number of bits of information about , then a little bit of linear algebra will give the required regularity result.
One can recognise as counting -points of a certain algebraic variety
The Lang-Weil bound (discussed in this previous post) provides a formula for this count, in terms of the number of geometrically irreducible components of that are defined over (or equivalently, are invariant with respect to the Frobenius endomorphism associated to ). So the problem boils down to ensuring that this quantity is “generically bounded rank”, in the sense that for generic , its value depends only on a bounded number of bits of and a bounded number of bits of .
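The role of the component count in the Lang-Weil bound can be seen in a brute-force experiment on plane curves. This is a minimal sketch under assumptions chosen here for illustration: the prime 101, the irreducible cubic $y^2 = x^3 + x + 1$, and the reducible curve $xy = 0$ (a union of two lines). For a curve with $c$ geometrically irreducible components defined over the field, the bound predicts roughly $c|F|$ points, up to an error of order $|F|^{1/2}$.

```python
import math


def count_points(f, p):
    """Count affine points (x, y) in F_p^2 with f(x, y) = 0 mod p."""
    return sum(1 for x in range(p) for y in range(p) if f(x, y) % p == 0)


p = 101   # small prime, chosen only for illustration


def irreducible_curve(x, y):
    # y^2 = x^3 + x + 1: a single geometrically irreducible component.
    return y * y - x ** 3 - x - 1


def reducible_curve(x, y):
    # x * y = 0: the union of the two lines x = 0 and y = 0.
    return x * y


n_irreducible = count_points(irreducible_curve, p)
n_reducible = count_points(reducible_curve, p)
print("one component: ", n_irreducible, "vs p =", p)
print("two components:", n_reducible, "vs 2p =", 2 * p)
print("error scale sqrt(p):", math.sqrt(p))
```

The first count stays within a few multiples of $\sqrt{p}$ of $p$, while the second is $2p-1$ (the two lines share the origin), matching one and two components respectively.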
Here is where the étale fundamental group comes in. One can view as a fibre product of the varieties
over . If one is in sufficiently high characteristic (or even better, in zero characteristic, which one can reduce to by an ultraproduct (or nonstandard analysis) construction, similar to that discussed in this previous post), the varieties are generically finite étale covers of , and the fibre product is then also generically a finite étale cover. One can count the components of a finite étale cover of a connected variety by counting the number of orbits of the étale fundamental group acting on a fibre of that variety (much as the number of components of a cover of a connected manifold is the number of orbits of the topological fundamental group acting on that fibre). So if one understands the étale fundamental group of a certain generic subset of (formed by intersecting together an -dependent generic subset of with an -dependent generic subset), this in principle controls . It turns out that one can decouple the and dependence of this fundamental group by using an étale version of the van Kampen theorem for the fundamental group, which I discussed in this previous blog post. With this fact (and another deep fact about the étale fundamental group in zero characteristic, namely that it is topologically finitely generated), one can obtain the desired generic bounded rank property of , which gives the regularity lemma.
In order to expedite the deployment of all this algebraic geometry (as well as some Riemann surface theory), it is convenient to use the formalism of nonstandard analysis (or the ultraproduct construction), which among other things can convert quantitative, finitary problems in large characteristic into equivalent qualitative, infinitary problems in zero characteristic (in the spirit of this blog post). This allows one to use several tools from those fields as “black boxes”; not just the theory of étale fundamental groups (which are considerably simpler and more favorable in characteristic zero than they are in positive characteristic), but also some results limiting the morphisms between compact Riemann surfaces of high genus (such as the de Franchis theorem, the Riemann-Hurwitz formula, or the fact that all morphisms between elliptic curves are essentially group homomorphisms), which would be quite unwieldy to utilise if one did not first pass to the zero characteristic case (and thence to the complex case) via the ultraproduct construction (followed by the Lefschetz principle).
I found this project to be particularly educational for me, as it forced me to wander outside of my usual range by quite a bit in order to pick up the tools from algebraic geometry and Riemann surfaces that I needed (in particular, I read through several chapters of EGA and SGA for the first time). This did however put me in the slightly unnerving position of having to use results (such as the Riemann existence theorem) whose proofs I have not fully verified for myself, but which are easy to find in the literature, and widely accepted in the field. I suppose this type of dependence on results in the literature is more common in the more structured fields of mathematics than it is in analysis, which by its nature has fewer reusable black boxes, and so key tools often need to be rederived and modified for each new application. (This distinction is discussed further in this article of Gowers.)
Let be a large natural number, and let be a matrix drawn from the Gaussian Unitary Ensemble (GUE), by which we mean that is a Hermitian matrix whose upper triangular entries are iid complex gaussians with mean zero and variance one, and whose diagonal entries are iid real gaussians with mean zero and variance one (and independent of the upper triangular entries). The eigenvalues are then real and almost surely distinct, and can be viewed as a random point process on the real line. One can then form the -point correlation functions for every , which can be defined by duality by requiring
for any test function . For GUE, which is a continuous matrix ensemble, one can also define for distinct as the unique quantity such that the probability that there is an eigenvalue in each of the intervals is in the limit .
As is well known, the GUE process is a determinantal point process, which means that -point correlation functions can be explicitly computed as
for some kernel ; explicitly, one has
Using the asymptotics of Hermite polynomials (which then give asymptotics for the kernel ), one can take a limit of a (suitably rescaled) sequence of GUE processes to obtain the Dyson sine process, which is a determinantal point process on the real line with correlation functions
A bit more precisely, for any fixed bulk energy , the renormalised point processes converge in distribution in the vague topology to as , where is the semi-circular law density.
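The determinantal structure makes the correlation functions easy to evaluate numerically. The sketch below assumes the standard normalisation of the sine kernel, $K(x,y) = \frac{\sin(\pi(x-y))}{\pi(x-y)}$ (with $K(x,x)=1$), and computes $\rho_k(x_1,\dots,x_k) = \det(K(x_i,x_j))_{1 \leq i,j \leq k}$; the function names `sine_kernel`, `det` and `rho` are introduced here for illustration only.

```python
import math


def sine_kernel(x, y):
    """Sine kernel sin(pi (x - y)) / (pi (x - y)), equal to 1 on the diagonal."""
    d = x - y
    return 1.0 if d == 0 else math.sin(math.pi * d) / (math.pi * d)


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < 1e-15:
            return 0.0                      # numerically singular matrix
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result                # a row swap flips the sign
        result *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return result


def rho(points):
    """k-point correlation function of the sine process at the given points."""
    return det([[sine_kernel(x, y) for y in points] for x in points])


print(rho([0.3]))              # one-point function (constant density)
print(rho([0.0, 0.5]))         # two-point function, 1 - K(0, 1/2)^2
print(rho([0.0, 0.01, 0.02]))  # nearly coincident points: strong repulsion
```

With this normalisation the one-point function is identically $1$, and the two-point function $1 - K(x,y)^2$ vanishes as $x \to y$, reflecting eigenvalue repulsion.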
On the other hand, an important feature of the GUE process is its stationarity (modulo rescaling) under Dyson Brownian motion
which describes the stochastic evolution of eigenvalues of a Hermitian matrix under independent Brownian motion of its entries, and is discussed in this previous blog post. To cut a long story short, this stationarity tells us that the self-similar -point correlation function
obeys the Dyson heat equation
(see Exercise 11 of the previously mentioned blog post). Note that vanishes to second order whenever two of the coincide, so there is no singularity on the right-hand side. Setting and using self-similarity, we can rewrite this equation in time-independent form as
One can then integrate out all but of these variables (after carefully justifying convergence) to obtain a system of equations for the -point correlation functions :
where the integral is interpreted in the principal value sense. This system is an example of a BBGKY hierarchy.
If one carefully rescales and takes limits (say at the energy level , for simplicity), the left-hand side turns out to rescale to be a lower order term, and one ends up with a hierarchy for the Dyson sine process:
Informally, these equations show that the Dyson sine process is stationary with respect to the infinite Dyson Brownian motion
where are independent Brownian increments, and the sum is interpreted in a suitable principal value sense.
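For intuition, the repulsive drift in such a dynamics can be explored with a finite truncation. This sketch assumes the commonly used normalisation $dx_i = dB_i + \sum_{j \neq i} \frac{dt}{x_i - x_j}$ and simply truncates to finitely many particles, ignoring the principal value issues of the infinite system; both are assumptions of the illustration, and the crude Euler step shown is only reasonable for very small time steps, since larger ones can let particles cross.

```python
import math
import random


def drift(x):
    """Repulsive drift sum_{j != i} 1 / (x_i - x_j) for each particle i."""
    return [
        sum(1.0 / (xi - xj) for j, xj in enumerate(x) if j != i)
        for i, xi in enumerate(x)
    ]


def euler_step(x, dt, rng):
    """One Euler-Maruyama step of the truncated dynamics (illustrative only)."""
    d = drift(x)
    return [
        xi + di * dt + rng.gauss(0.0, math.sqrt(dt))
        for xi, di in zip(x, d)
    ]


rng = random.Random(0)                    # fixed seed for reproducibility
x = [float(i) for i in range(-5, 6)]      # eleven particles at unit spacing
for _ in range(100):
    x = euler_step(x, 1e-4, rng)
print(x)
```

Because each pair of particles contributes equal and opposite terms, the drifts always sum to zero; on an evenly spaced configuration the drift on the central particle cancels exactly, which is a finite shadow of the principal value cancellation in the infinite system.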
I recently set myself the exercise of deriving the identity (3) directly from the definition (1) of the Dyson sine process, without reference to GUE. This turns out to not be too difficult when done the right way (namely, by modifying the proof of Gaudin’s lemma), although it did take me an entire day of work before I realised this, and I could not find it in the literature (though I suspect that many people in the field have privately performed this exercise in the past). In any case, I am recording the computation here, largely because I really don’t want to have to do it again, but perhaps it will also be of interest to some readers.