You are currently browsing the tag archive for the ‘polynomials’ tag.
The equidistribution theorem asserts that if is an irrational phase, then the sequence is equidistributed on the unit circle, or equivalently that
for any continuous (or equivalently, for any smooth) function . By approximating uniformly by a Fourier series, this claim is equivalent to that of showing that
for any non-zero integer (where ), which is easily verified from the irrationality of and the geometric series formula. Conversely, if is rational, then clearly fails to go to zero when is a multiple of the denominator of .
One can then ask for more quantitative information about the decay of exponential sums of , or more generally on exponential sums of the form for an arithmetic progression (in this post all progressions are understood to be finite) and a polynomial . It will be convenient to phrase such information in the form of an inverse theorem, describing those phases for which the exponential sum is large. Indeed, we have
for some , then there exists a subprogression of of size such that varies by at most on (that is to say, lies in a subinterval of of length at most ).
Proof: By a linear change of variable we may assume that is of the form for some . We may of course assume that is non-zero in , so that ( denotes the distance from to the nearest integer). From the geometric series formula we see that
and so . Setting for some sufficiently small absolute constant , we obtain the claim.
Thus, in order for a linear phase to fail to be equidistributed on some long progression , must in fact be almost constant on large piece of .
As is well known, this phenomenon generalises to higher order polynomials. To achieve this, we need two elementary additional lemmas. The first relates the exponential sums of to the exponential sums of its “first derivatives” .
Proof: Squaring (1), we see that
We write and conclude that
where is a subprogression of of the same spacing. Since , we conclude that
for values of (this can be seen, much like the pigeonhole principle, by arguing via contradiction for a suitable choice of implied constants). The claim follows.
The second lemma (which we recycle from this previous blog post) is a variant of the equidistribution theorem.
or else there is a natural number such that
Proof: We may assume that and , since we are done otherwise. Then there are at least two with , and by the pigeonhole principle we can find in with and . By the triangle inequality, we conclude that there exists at least one natural number for which
If then we are done, so suppose that . Suppose that are elements of such that and . Writing for some , we have
By hypothesis, ; note that as and we also have . This implies that and thus . We then have
We conclude that for fixed with , there are at most elements of such that . Iterating this with a greedy algorithm, we see that the number of with is at most ; since , this implies that
and the claim follows.
Now we can quickly obtain a higher degree version of Lemma 1:
for some , then there exists a subprogression of with such that varies by at most on .
Proof: We induct on . The cases are immediate from Lemma 1. Now suppose that , and that the claim had already been proven for . To simplify the notation we allow implied constants to depend on . Let the hypotheses be as in the proposition. Clearly cannot exceed . By shrinking as necessary we may assume that for some sufficiently small constant depending on .
By rescaling we may assume . By Lemma 3, we see that for choices of such that
for some interval . We write , then is a polynomial of degree at most with leading coefficient . We conclude from induction hypothesis that for each such , there exists a natural number such that , by double-counting, this implies that there are integers in the interval such that . Applying Lemma 3, we conclude that either , or that
We partition into arithmetic progressions of spacing and length comparable to for some large depending on to be chosen later. By hypothesis, we have
so by the pigeonhole principle, we have
for at least one such progression . On this progression, we may use the binomial theorem and (4) to write as a polynomial in of degree at most , plus an error of size . We thus can write for for some polynomial of degree at most . By the triangle inequality, we thus have (for large enough) that
and hence by induction hypothesis we may find a subprogression of of size such that varies by most on , and thus (for large enough again) that varies by at most on , and the claim follows.
This gives the following corollary (also given as Exercise 16 in this previous blog post):
for some , then there is a natural number such that for all .
One can obtain much better exponents here using Vinogradov’s mean value theorem; see Theorem 1.6 this paper of Wooley. (Thanks to Mariusz Mirek for this reference.) However, this weaker result already suffices for many applications, and does not need any result as deep as the mean value theorem.
Proof: To simplify notation we allow implied constants to depend on . As before, we may assume that for some small constant depending only on . We may also assume that for some large , as the claim is trivial otherwise (set ).
Applying Proposition 4, we can find a natural number and an arithmetic subprogression of such that and such that varies by at most on . Writing for some interval of length and some , we conclude that the polynomial varies by at most on . Taking order differences, we conclude that the coefficient of this polynomial is ; by the binomial theorem, this implies that differs by at most on from a polynomial of degree at most . Iterating this, we conclude that the coefficient of is for , and the claim then follows by inverting the change of variables (and replacing with a larger quantity such as as necessary).
For future reference we also record a higher degree version of the Vinogradov lemma.
for all .
Proof: We induct on . For this follows from Lemma 3 (noting that if then ), so suppose that and that the claim is already proven for . We now allow all implied constants to depend on .
For each , let denote the number of such that . By hypothesis, , and clearly , so we must have for choices of . For each such , we then have for choices of , so by induction hypothesis, either (5) or (6) holds, or else for choices of , there is a natural number such that
for , where are the coefficients of the degree polynomial . We may of course assume it is the latter which holds. By the pigeonhole principle we may take to be independent of .
Since , we have
We can again assume it is the latter that holds. This implies that modulo , so that
for choices of . Arguing as before and iterating, we obtain the claim.
The above results also extend to higher dimensions. Here is the higher dimensional version of Proposition 4:
Proposition 7 (Multidimensional Weyl exponential sum estimate, inverse form) Let and , and let be arithmetic progressions of length at most for each . Let be a polynomial of degrees at most in each of the variables separately. If
for some , then there exists a subprogression of with for each such that varies by at most on .
A much more general statement, in which the polynomial phase is replaced by a nilsequence, and in which one does not necessarily assume the exponential sum is small, is given in Theorem 8.6 of this paper of Ben Green and myself, but it involves far more notation to even state properly.
Proof: We induct on . The case was established in Proposition 5, so we assume that and that the claim has already been proven for . To simplify notation we allow all implied constants to depend on . We may assume that for some small depending only on .
By a linear change of variables, we may assume that for all .
We write . First suppose that . Then by the pigeonhole principle we can find such that
and the claim then follows from the induction hypothesis. Thus we may assume that for some large depending only on . Similarly we may assume that for all .
By the triangle inequality, we have
for (the claim also holds for but we discard it as being trivial). By the pigeonhole principle, there thus exists a natural number such that
for all and for choices of . If we write
where is a polynomial of degrees at most , then for choices of we then have
Applying Lemma 6 in the and the largeness hypotheses on the (and also the assumption that ) we conclude (after enlarging as necessary, and pigeonholing to keep independent of ) that
whenever for , with nonzero. Permuting the indices, and observing that the claim is trivial for , we in fact obtain (8) for all , at which point the claim easily follows by taking for each .
An inspection of the proof of the above result (or alternatively, by combining the above result again with many applications of Lemma 6) reveals the following general form of Proposition 4, which was posed as Exercise 17 in this previous blog post, but had a slight misprint in it (it did not properly treat the possibility that some of the could be small) and was a bit trickier to prove than anticipated (in fact, the reason for this post was that I was asked to supply a more detailed solution for this exercise):
Proposition 8 (Multidimensional Weyl exponential sum estimate, inverse form, II) Let be an natural number, and for each , let be a discrete interval for some . Let
Again, the factor of is natural in this bound. In the case, the option (10) may be deleted since (11) trivially holds in this case, but this simplification is no longer available for since one needs (10) to hold for all (not just one ) to make (11) completely trivial. Indeed, the above proposition fails for if one removes (10) completely, as can be seen for instance by inspecting the exponential sum , which has size comparable to regardless of how irrational is.
Vitaly Bergelson, Tamar Ziegler, and I have just uploaded to the arXiv our joint paper “Multiple recurrence and convergence results associated to -actions“. This paper is primarily concerned with limit formulae in the theory of multiple recurrence in ergodic theory. Perhaps the most basic formula of this type is the mean ergodic theorem, which (among other things) asserts that if is a measure-preserving -system (which, in this post, means that is a probability space and is measure-preserving and invertible, thus giving an action of the integers), and are functions, and is ergodic (which means that contains no -invariant functions other than the constants (up to almost everywhere equivalence, of course)), then the average
converges as to the expression
see e.g. this previous blog post. Informally, one can interpret this limit formula as an equidistribution result: if is drawn at random from (using the probability measure ), and is drawn at random from for some large , then the pair becomes uniformly distributed in the product space (using product measure ) in the limit as .
If we allow to be non-ergodic, then we still have a limit formula, but it is a bit more complicated. Let be the -invariant measurable sets in ; the -system can then be viewed as a factor of the original system , which is equivalent (in the sense of measure-preserving systems) to a trivial system (known as the invariant factor) in which the shift is trivial. There is then a projection map to the invariant factor which is a factor map, and the average (1) converges in the limit to the expression
where is the pushforward map associated to the map ; see e.g. this previous blog post. We can interpret this as an equidistribution result. If is a pair as before, then we no longer expect complete equidistribution in in the non-ergodic, because there are now non-trivial constraints relating with ; indeed, for any -invariant function , we have the constraint ; putting all these constraints together we see that (for almost every , at least). The limit (2) can be viewed as an assertion that this constraint are in some sense the “only” constraints between and , and that the pair is uniformly distributed relative to these constraints.
for three functions ; this is analogous to the combinatorial task of counting length three progressions in various sets. For simplicity we assume the system to be ergodic. Naively one might expect this limit to then converge to
which would roughly speaking correspond to an assertion that the triplet is asymptotically equidistributed in . However, even in the ergodic case there can be additional constraints on this triplet that cannot be seen at the level of the individual pairs , . The key obstruction here is that of eigenfunctions of the shift , that is to say non-trivial functions that obey the eigenfunction equation almost everywhere for some constant (or -invariant) . Each such eigenfunction generates a constraint
tying together , , and . However, it turns out that these are in some sense the only constraints on that are relevant for the limit (3). More precisely, if one sets to be the sub-algebra of generated by the eigenfunctions of , then it turns out that the factor is isomorphic to a shift system known as the Kronecker factor, for some compact abelian group and some (irrational) shift ; the factor map pushes eigenfunctions forward to (affine) characters on . It is then known that the limit of (3) is
where is the closed subgroup
and is the Haar probability measure on ; see this previous blog post. The equation defining corresponds to the constraint (4) mentioned earlier. Among other things, this limit formula implies Roth’s theorem, which in the context of ergodic theory is the assertion that the limit (or at least the limit inferior) of (3) is positive when is non-negative and not identically vanishing.
(analogous to counting length four progressions) then the situation becomes more complicated still, even in the ergodic case. In addition to the (linear) eigenfunctions that already showed up in the computation of the triple average (3), a new type of constraint also arises from quadratic eigenfunctions , which obey an eigenfunction equation in which is no longer constant, but is now a linear eigenfunction. For such functions, behaves quadratically in , and one can compute the existence of a constraint
between , , , and that is not detected at the triple average level. As it turns out, this is not the only type of constraint relevant for (5); there is a more general class of constraint involving two-step nilsystems which we will not detail here, but see e.g. this previous blog post for more discussion. Nevertheless there is still a similar limit formula to previous examples, involving a special factor which turns out to be an inverse limit of two-step nilsystems; this limit theorem can be extracted from the structural theory in this paper of Host and Kra combined with a limit formula for nilsystems obtained by Lesigne, but will not be reproduced here. The pattern continues to higher averages (and higher step nilsystems); this was first done explicitly by Ziegler, and can also in principle be extracted from the structural theory of Host-Kra combined with nilsystem equidistribution results of Leibman. These sorts of limit formulae can lead to various recurrence results refining Roth’s theorem in various ways; see this paper of Bergelson, Host, and Kra for some examples of this.
The above discussion was concerned with -systems, but one can adapt much of the theory to measure-preserving -systems for other discrete countable abelian groups , in which one now has a family of shifts indexed by rather than a single shift, obeying the compatibility relation . The role of the intervals in this more general setting is replaced by that of Folner sequences. For arbitrary countable abelian , the theory for double averages (1) and triple limits (3) is essentially identical to the -system case. But when one turns to quadruple and higher limits, the situation becomes more complicated (and, for arbitrary , still not fully understood). However one model case which is now well understood is the finite field case when is an infinite-dimensional vector space over a finite field (with the finite subspaces then being a good choice for the Folner sequence). Here, the analogue of the structural theory of Host and Kra was worked out by Vitaly, Tamar, and myself in these previous papers (treating the high characteristic and low characteristic cases respectively). In the finite field setting, it turns out that nilsystems no longer appear, and one only needs to deal with linear, quadratic, and higher order eigenfunctions (known collectively as phase polynomials). It is then natural to look for a limit formula that asserts, roughly speaking, that if is drawn at random from a -system and drawn randomly from a large subspace of , then the only constraints between are those that arise from phase polynomials. The main theorem of this paper is to establish this limit formula (which, again, is a little complicated to state explicitly and will not be done here). In particular, we establish for the first time that the limit actually exists (a result which, for -systems, was one of the main results of this paper of Host and Kra).
As a consequence, we can recover finite field analogues of most of the results of Bergelson-Host-Kra, though interestingly some of the counterexamples demonstrating sharpness of their results for -systems (based on Behrend set constructions) do not seem to be present in the finite field setting (cf. this previous blog post on the cap set problem). In particular, we are able to largely settle the question of when one has a Khintchine-type theorem that asserts that for any measurable set in an ergodic -system and any , one has
for a syndetic set of , where are distinct residue classes. It turns out that Khintchine-type theorems always hold for (and for ergodicity is not required), and for it holds whenever form a parallelogram, but not otherwise (though the counterexample here was such a painful computation that we ended up removing it from the paper, and may end up putting it online somewhere instead), and for larger we could show that the Khintchine property failed for generic choices of , though the problem of determining exactly the tuples for which the Khintchine property failed looked to be rather messy and we did not completely settle it.
I’ve just uploaded to the arXiv my paper “Expanding polynomials over finite fields of large characteristic, and a regularity lemma for definable sets“, submitted to Contrib. Disc. Math. The motivation of this paper is to understand a certain polynomial variant of the sum-product phenomenon in finite fields. This phenomenon asserts that if is a non-empty subset of a finite field , then either the sumset or product set will be significantly larger than , unless is close to a subfield of (or to ). In particular, in the regime when is large, say , one expects an expansion bound of the form
for some absolute constants . Results of this type are known; for instance, Hart, Iosevich, and Solymosi obtained precisely this bound for (in the case when is prime), which was then improved by Garaev to .
We have focused here on the case when is a large subset of , but sum-product estimates are also extremely interesting in the opposite regime in which is allowed to be small (see for instance the papers of Katz–Shen and Li and of Garaev for some recent work in this case, building on some older papers of Bourgain, Katz and myself and of Bourgain, Glibichuk, and Konyagin). However, the techniques used in these two regimes are rather different. For large subsets of , it is often profitable to use techniques such as the Fourier transform or the Cauchy-Schwarz inequality to “complete” a sum over a large set (such as ) into a set over the entire field , and then to use identities concerning complete sums (such as the Weil bound on complete exponential sums over a finite field). For small subsets of , such techniques are usually quite inefficient, and one has to proceed by somewhat different combinatorial methods which do not try to exploit the ambient field . But my paper focuses exclusively on the large regime, and unfortunately does not directly say much (except through reasoning by analogy) about the small case.
Note that it is necessary to have both and appear on the left-hand side of (1). Indeed, if one just has the sumset , then one can set to be a long arithmetic progression to give counterexamples to (1). Similarly, if one just has a product set , then one can set to be a long geometric progression. The sum-product phenomenon can then be viewed that it is not possible to simultaneously behave like a long arithmetic progression and a long geometric progression, unless one is already very close to behaving like a subfield.
Now we consider a polynomial variant of the sum-product phenomenon, where we consider a polynomial image
of a set with respect to a polynomial ; we can also consider the asymmetric setting of the image
of two subsets . The regime we will be interested is the one where the field is large, and the subsets of are also large, but the polynomial has bounded degree. Actually, for technical reasons it will not be enough for us to assume that has large cardinality; we will also need to assume that has large characteristic. (The two concepts are synonymous for fields of prime order, but not in general; for instance, the field with elements becomes large as while the characteristic remains fixed at , and is thus not going to be covered by the results in this paper.)
whenever for some absolute constants , unless the polynomial had the degenerate form for some linear function and polynomial , in which behaves too much like to get reasonable expansion. In this paper, we focus instead on the question of bounding alone. In particular, one can ask to classify the polynomials for which one has the weak expansion property
whenever for some absolute constants . One can also ask for stronger versions of this expander property, such as the moderate expansion property
whenever , or the almost strong expansion property
whenever . (One can consider even stronger expansion properties, such as the strong expansion property , but it was shown by Gyarmati and Sarkozy that this property cannot hold for polynomials of two variables of bounded degree when .) One can also consider asymmetric versions of these properties, in which one obtains lower bounds on rather than .
The example of a long arithmetic or geometric progression shows that the polynomials or cannot be expanders in any of the above senses, and a similar construction also shows that polynomials of the form or for some polynomials cannot be expanders. On the other hand, there are a number of results in the literature establishing expansion for various polynomials in two or more variables that are not of this degenerate form (in part because such results are related to incidence geometry questions in finite fields, such as the finite field version of the Erdos distinct distances problem). For instance, Solymosi established weak expansion for polynomials of the form when is a nonlinear polynomial, with generalisations by Hart, Li, and Shen for various polynomials of the form or . Further examples of expanding polynomials appear in the work of Shkredov, Iosevich-Rudnev, and Bukh-Tsimerman, as well as the previously mentioned paper of Vu and of Hart-Li-Shen, and these papers in turn cite many further results which are in the spirit of the polynomial expansion bounds discussed here (for instance, dealing with the small regime, or working in other fields such as instead of in finite fields ). We will not summarise all these results here; they are summarised briefly in my paper, and in more detail in the papers of Hart-Li-Shen and of Bukh-Tsimerman. But we will single out one of the results of Bukh-Tsimerman, which is one of most recent and general of these results, and closest to the results of my own paper. Roughly speaking, in this paper it is shown that a polynomial of two variables and bounded degree will be a moderate expander if it is non-composite (in the sense that it does not take the form for some non-linear polynomial and some polynomial , possibly having coefficients in the algebraic completion of ) and is monic on both and , thus it takes the form for some and some polynomial of degree at most in , and similarly with the roles of and reversed, unless is of the form or (in which case the expansion theory is covered to a large extent by the previous work of Hart, Li, and Shen).
Our first main result improves upon the Bukh-Tsimerman result by strengthening the notion of expansion and removing the non-composite and monic hypotheses, but imposes a condition of large characteristic. I’ll state the result here slightly informally as follows:
Theorem 1 (Criterion for moderate expansion) Let be a polynomial of bounded degree over a finite field of sufficiently large characteristic, and suppose that is not of the form or for some polynomials . Then one has the (asymmetric) moderate expansion property
This is basically a sharp necessary and sufficient condition for asymmetric expansion moderate for polynomials of two variables. In the paper, analogous sufficient conditions for weak or almost strong expansion are also given, although these are not quite as satisfactory (particularly the conditions for almost strong expansion, which include a somewhat complicated algebraic condition which is not easy to check, and which I would like to simplify further, but was unable to).
The argument here resembles the Bukh-Tsimerman argument in many ways. One can view the result as an assertion about the expansion properties of the graph , which can essentially be thought of as a somewhat sparse three-uniform hypergraph on . Being sparse, it is difficult to directly apply techniques from dense graph or hypergraph theory for this situation; however, after a few applications of the Cauchy-Schwarz inequality, it turns out (as observed by Bukh and Tsimerman) that one can essentially convert the problem to one about the expansion properties of the set
(actually, one should view this as a multiset, but let us ignore this technicality) which one expects to be a dense set in , except in the case when the associated algebraic variety
fails to be Zariski dense, but it turns out that in this case one can use some differential geometry and Riemann surface arguments (after first invoking the Lefschetz principle and the high characteristic hypothesis to work over the complex numbers instead over a finite field) to show that is of the form or . This reduction is related to the classical fact that the only one-dimensional algebraic groups over the complex numbers are the additive group , the multiplicative group , or the elliptic curves (but the latter have a group law given by rational functions rather than polynomials, and so ultimately end up being eliminated from consideration, though they would play an important role if one wanted to study the expansion properties of rational functions).
It remains to understand the structure of the set (2) is. To understand dense graphs or hypergraphs, one of the standard tools of choice is the Szemerédi regularity lemma, which carves up such graphs into a bounded number of cells, with the graph behaving pseudorandomly on most pairs of cells. However, the bounds in this lemma are notoriously poor (the regularity obtained is an inverse tower exponential function of the number of cells), and this makes this lemma unsuitable for the type of expansion properties we seek (in which we want to deal with sets which have a polynomial sparsity, e.g. ). Fortunately, in the case of sets such as (2) which are definable over the language of rings, it turns out that a much stronger regularity lemma is available, which I call the “algebraic regularity lemma”. I’ll state it (again, slightly informally) in the context of graphs as follows:
Lemma 2 (Algebraic regularity lemma) Let be a finite field of large characteristic, and let be definable sets over of bounded complexity (i.e. are subsets of , for some bounded that can be described by some first-order predicate in the language of rings of bounded length and involving boundedly many constants). Let be a definable subset of , again of bounded complexity (one can view as a bipartite graph connecting and ). Then one can partition into a bounded number of cells , , still definable with bounded complexity, such that for all pairs , , one has the regularity property
for all , where is the density of in .
This lemma resembles the Szemerédi regularity lemma, but regularises all pairs of cells (not just most pairs), and the regularity is of polynomial strength in , rather than inverse tower exponential in the number of cells. Also, the cells are not arbitrary subsets of , but are themselves definable with bounded complexity, which turns out to be crucial for applications. I am optimistic that this lemma will be useful not just for studying expanding polynomials, but for many other combinatorial questions involving dense subsets of definable sets over finite fields.
The above lemma is stated for graphs , but one can iterate it to obtain an analogous regularisation of hypergraphs for any bounded (for application to (2), we need ). This hypergraph regularity lemma, by the way, is not analogous to the strong hypergraph regularity lemmas of Rodl et al. and Gowers developed in the last six or so years, but closer in spirit to the older (but weaker) hypergraph regularity lemma of Chung which gives the same “order ” regularity that the graph regularity lemma gives, rather than higher order regularity.
One feature of the proof of Lemma 2 which I found striking was the need to use some fairly high powered technology from algebraic geometry, and in particular the Lang-Weil bound on counting points in varieties over a finite field (discussed in this previous blog post), and also the theory of the etale fundamental group. Let me try to briefly explain why this is the case. A model example of a definable set of bounded complexity is a set of the form
for some polynomial . (Actually, it turns out that one can essentially write all definable sets as an intersection of sets of this form; see this previous blog post for more discussion.) To regularise the set , it is convenient to square the adjacency matrix, which soon leads to the study of counting functions such as
If one can show that this function is “approximately finite rank” in the sense that (modulo lower order errors, of size smaller than the main term), this quantity depends only on a bounded number of bits of information about and a bounded number of bits of information about , then a little bit of linear algebra will then give the required regularity result.
One can recognise as counting -points of a certain algebraic variety
The Lang-Weil bound (discussed in this previous post) provides a formula for this count, in terms of the number of geometrically irreducible components of that are defined over (or equivalently, are invariant with respect to the Frobenius endomorphism associated to ). So the problem boils down to ensuring that this quantity is “generically bounded rank”, in the sense that for generic , its value depends only on a bounded number of bits of and a bounded number of bits of .
Here is where the étale fundamental group comes in. One can view as a fibre product of the varieties
over . If one is in sufficiently high characteristic (or even better, in zero characteristic, which one can reduce to by an ultraproduct (or nonstandard analysis) construction, similar to that discussed in this previous post), the varieties are generically finite étale covers of , and the fibre product is then also generically a finite étale cover. One can count the components of a finite étale cover of a connected variety by counting the number of orbits of the étale fundamental group acting on a fibre of that variety (much as the number of components of a cover of a connected manifold is the number of orbits of the topological fundamental group acting on that fibre). So if one understands the étale fundamental group of a certain generic subset of (formed by intersecting together an -dependent generic subset of with an -dependent generic subset), this in principle controls . It turns out that one can decouple the and dependence of this fundamental group by using an étale version of the van Kampen theorem for the fundamental group, which I discussed in this previous blog post. With this fact (and another deep fact about the étale fundamental group in zero characteristic, namely that it is topologically finitely generated), one can obtain the desired generic bounded rank property of , which gives the regularity lemma.
In order to expedite the deployment of all this algebraic geometry (as well as some Riemann surface theory), it is convenient to use the formalism of nonstandard analysis (or the ultraproduct construction), which among other things can convert quantitative, finitary problems in large characteristic into equivalent qualitative, infinitary problems in zero characteristic (in the spirit of this blog post). This allows one to use several tools from those fields as “black boxes”; not just the theory of étale fundamental groups (which are considerably simpler and more favorable in characteristic zero than they are in positive characteristic), but also some results limiting the morphisms between compact Riemann surfaces of high genus (such as the de Franchis theorem, the Riemann-Hurwitz formula, or the fact that all morphisms between elliptic curves are essentially group homomorphisms), which would be quite unwieldy to utilise if one did not first pass to the zero characteristic case (and thence to the complex case) via the ultraproduct construction (followed by the Lefschetz principle).
I found this project to be particularly educational for me, as it forced me to wander outside of my usual range by quite a bit in order to pick up the tools from algebraic geometry and Riemann surfaces that I needed (in particular, I read through several chapters of EGA and SGA for the first time). This did however put me in the slightly unnerving position of having to use results (such as the Riemann existence theorem) whose proofs I have not fully verified for myself, but which are easy to find in the literature, and widely accepted in the field. I suppose this type of dependence on results in the literature is more common in the more structured fields of mathematics than it is in analysis, which by its nature has fewer reusable black boxes, and so key tools often need to be rederived and modified for each new application. (This distinction is discussed further in this article of Gowers.)
One of the first non-trivial theorems one encounters in classical algebraic geometry is Bézout’s theorem, which we will phrase as follows:
Theorem 1 (Bézout’s theorem) Let be a field, and let be non-zero polynomials in two variables with no common factor. Then the two curves and have no common components, and intersect in at most points.
This theorem can be proven by a number of means, for instance by using the classical tool of resultants. It has many strengthenings, generalisations, and variants; see for instance this previous blog post on Bézout’s inequality. Bézout’s theorem asserts a fundamental algebraic dichotomy, of importance in combinatorial incidence geometry: any two algebraic curves either share a common component, or else have a bounded finite intersection; there is no intermediate case in which the intersection is unbounded in cardinality, but falls short of a common component. This dichotomy is closely related to the integrality gap in algebraic dimension: an algebraic set can have an integer dimension such as or , but cannot attain any intermediate dimensions such as . This stands in marked contrast to sets of analytic, combinatorial, or probabilistic origin, whose “dimension” is typically not necessarily constrained to be an integer.
Bézout’s inequality tells us, roughly speaking, that the intersection of a curve of degree and a curve of degree forms a set of at most points. One can consider the converse question: given a set of points in the plane , can one find two curves of degrees with and no common components, whose intersection contains these points?
A model example that supports the possibility of such a converse is a grid that is a Cartesian product of two finite subsets of with . In this case, one can take one curve to be the union of vertical lines, and the other curve to be the union of horizontal lines, to obtain the required decomposition. Thus, if the proposed converse to Bézout’s inequality held, it would assert that any set of points was essentially behaving like a “nonlinear grid” of size .
Unfortunately, the naive converse to Bézout’s theorem is false. A counterexample can be given by considering a set of points for some large perfect square , where is a by grid of the form described above, and consists of points on an line (e.g. a or grid). Each of the two component sets can be written as the intersection between two curves whose degrees multiply up to ; in the case of , we can take the two families of parallel lines (viewed as reducible curves of degree ) as the curves, and in the case of , one can take as one curve, and the graph of a degree polynomial on vanishing on for the second curve. But, if is large enough, one cannot cover by the intersection of a single pair of curves with no common components whose degrees multiply up to . Indeed, if this were the case, then without loss of generality we may assume that , so that . By Bézout’s theorem, either contains , or intersects in at most points. Thus, in order for to capture all of , it must contain , which forces to not contain . But has to intersect in points, so by Bézout’s theorem again we have , thus . But then (by more applications of Bézout’s theorem) can only capture of the points of , a contradiction.
But the above counterexample suggests that even if an arbitrary set of (or ) points cannot be covered by the single intersection of a pair of curves with degree multiplying up to , one may be able to cover such a set by a small number of such intersections. The purpose of this post is to record the simple observation that this is, indeed, the case:
Informally, every finite set in the plane is (a dense subset of) the union of logarithmically many nonlinear grids. The presence of the logarithm is necessary, as can be seen by modifying the example to be the union of logarithmically many Cartesian products of distinct dimensions, rather than just a pair of such products.
Unfortunately I do not know of any application of this converse, but I thought it was cute anyways. The proof is given below the fold.
One of the basic problems in analytic number theory is to obtain bounds and asymptotics for sums of the form
in the limit , where ranges over natural numbers less than , and is some arithmetic function of number-theoretic interest. (It is also often convenient to replace this sharply truncated sum with a smoother sum such as , but we will not discuss this technicality here.) For instance, the prime number theorem is equivalent to the assertion
It is thus of interest to develop techniques to estimate such sums . Of course, the difficulty of this task depends on how “nice” the function is. The functions that come up in number theory lie on a broad spectrum of “niceness”, with some particularly nice functions being quite easy to sum, and some being insanely difficult.
At the easiest end of the spectrum are those functions that exhibit some sort of regularity or “smoothness”. Examples of smoothness include “Archimedean” smoothness, in which is the restriction of some smooth function from the reals to the natural numbers, and the derivatives of are well controlled. A typical example is
One can already get quite good bounds on this quantity by comparison with the integral , namely
with sharper bounds available by using tools such as the Euler-Maclaurin formula (see this blog post). Exponentiating such asymptotics, incidentally, leads to one of the standard proofs of Stirling’s formula (as discussed in this blog post).
One can also consider “non-Archimedean” notions of smoothness, such as periodicity relative to a small period . Indeed, if is periodic with period (and is thus essentially a function on the cyclic group ), then one has the easy bound
This is a good estimate when is much smaller than , but as approaches in magnitude, the error term begins to overwhelm the main term , and one needs much more delicate information on the fractional part of in order to obtain good estimates at this point.
One can also consider functions which combine “Archimedean” and “non-Archimedean” smoothness into an “adelic” smoothness. We will not define this term precisely here (though the concept of a Schwartz-Bruhat function is one way to capture this sort of concept), but a typical example might be
where is periodic with some small period . By using techniques such as summation by parts, one can estimate such sums using the techniques used to estimate sums of periodic functions or functions with (Archimedean) smoothness.
Another class of functions that is reasonably well controlled are the multiplicative functions, in which whenever are coprime. Here, one can use the powerful techniques of multiplicative number theory, for instance by working with the Dirichlet series
which are clearly related to the partial sums (essentially via the Mellin transform, a cousin of the Fourier and Laplace transforms); for this post we ignore the (important) issue of how to make sense of this series when it is not absolutely convergent (but see this previous blog post for more discussion). A primary reason that this technique is effective is that the Dirichlet series of a multiplicative function factorises as an Euler product
One also obtains similar types of representations for functions that are not quite multiplicative, but are closely related to multiplicative functions, such as the von Mangoldt function (whose Dirichlet series is not given by an Euler product, but instead by the logarithmic derivative of an Euler product).
Moving another notch along the spectrum between well-controlled and ill-controlled functions, one can consider functions that are divisor sums such as
for some other arithmetic function , and some level . This is a linear combination of periodic functions and is thus technically periodic in (with period equal to the least common multiple of all the numbers from to ), but in practice this periodic is far too large to be useful (except for extremely small levels , e.g. ). Nevertheless, we can still control the sum simply by rearranging the summation:
and thus by (1) one can bound this by the sum of a main term and an error term . As long as the level is significantly less than , one may expect the main term to dominate, and one can often estimate this term by a variety of techniques (for instance, if is multiplicative, then multiplicative number theory techniques are quite effective, as mentioned previously). Similarly for other slight variants of divisor sums, such as expressions of the form
or expressions of the form
where each is periodic with period .
One of the simplest examples of this comes when estimating the divisor function
which counts the number of divisors up to . This is a multiplicative function, and is therefore most efficiently estimated using the techniques of multiplicative number theory; but for reasons that will become clearer later, let us “forget” the multiplicative structure and estimate the above sum by more elementary methods. By applying the preceding method, we see that
Here, we are (barely) able to keep the error term smaller than the main term; this is right at the edge of the divisor sum method, because the level in this case is equal to . Unfortunately, at this high choice of level, it is not always possible to always keep the error term under control like this. For instance, if one wishes to use the standard divisor sum representation
where is the Möbius function, to compute , then one ends up looking at
From Dirichlet series methods, it is not difficult to establish the identities
in the sense of conditionally convergent series. Assuming one can justify this (which, ultimately, requires one to exclude zeroes of the Riemann zeta function on the line , as discussed in this previous post), one is eventually left with the estimate , which is useless as a lower bound (and recovers only the classical Chebyshev estimate as the upper bound). The inefficiency here when compared to the situation with the divisor function can be attributed to the signed nature of the Möbius function , which causes some cancellation in the divisor sum expansion that needs to be compensated for with improved estimates.
However, there are a number of tricks available to reduce the level of divisor sums. The simplest comes from exploiting the change of variables , which can in principle reduce the level by a square root. For instance, when computing the divisor function , one can observe using this change of variables that every divisor of above is paired with one below , and so we have
except when is a perfect square, in which case one must subtract one from the right-hand side. Using this reduced-level divisor sum representation, one can obtain an improvement to (2), namely
This type of argument is also known as the Dirichlet hyperbola method. A variant of this argument can also deduce the prime number theorem from (3), (4) (and with some additional effort, one can even drop the use of (4)); this is discussed at this previous blog post.
Using this square root trick, one can now also control divisor sums such as
(Note that has no multiplicativity properties in , and so multiplicative number theory techniques cannot be directly applied here.) The level of the divisor sum here is initially of order , which is too large to be useful; but using the square root trick, we can expand this expression as
which one can rewrite as
The constraint is periodic in with period , so we can write this as
where is the number of solutions in to the equation , and so
The function is multiplicative, and can be easily computed at primes and prime powers using tools such as quadratic reciprocity and Hensel’s lemma. For instance, by Fermat’s two-square theorem, is equal to for and for . From this and standard multiplicative number theory methods (e.g. by obtaining asymptotics on the Dirichlet series ), one eventually obtains the asymptotic
Similar arguments give asymptotics for on other quadratic polynomials; see for instance this paper of Hooley and these papers by McKee. Note that the irreducibility of the polynomial will be important. If one considers instead a sum involving a reducible polynomial, such as , then the analogous quantity becomes significantly larger, leading to a larger growth rate (of order rather than ) for the sum.
However, the square root trick is insufficient by itself to deal with higher order sums involving the divisor function, such as
the level here is initially of order , and the square root trick only lowers this to about , creating an error term that overwhelms the main term. And indeed, the asymptotic for such this sum has not yet been rigorously established (although if one heuristically drops error terms, one can arrive at a reasonable conjecture for this asymptotic), although some results are known if one averages over additional parameters (see e.g. this paper of Greaves, or this paper of Matthiesen).
Nevertheless, there is an ingenious argument of Erdös that allows one to obtain good upper and lower bounds for these sorts of sums, in particular establishing the asymptotic
for any fixed (not necessarily irreducible) and any fixed , due to van der Corput; this bound is in fact used to dispose of some error terms in the proof of (6). These should be compared with what one can obtain from the divisor bound and the trivial bound , giving the bounds
for any fixed .
for any , and the preceding methods then easily allow one to obtain the lower bound by taking small enough (more precisely, if has degree , one should take equal to or less). The upper bounds in (6) and (7) are more difficult. Ideally, if we could obtain upper bounds of the form
for any fixed , then the preceding methods would easily establish both results. Unfortunately, this bound can fail, as illustrated by the following example. Suppose that is the product of distinct primes , each of which is close to . Then has divisors, with of them close to for each . One can think of (the logarithms of) these divisors as being distributed according to what is essentially a Bernoulli distribution, thus a randomly selected divisor of has magnitude about , where is a random variable which has the same distribution as the number of heads in independently tossed fair coins. By the law of large numbers, should concentrate near when is large, which implies that the majority of the divisors of will be close to . Sending , one can show that the bound (8) fails whenever .
This however can be fixed in a number of ways. First of all, even when , one can show weaker substitutes for (8). For instance, for any fixed and one can show a bound of the form
For Erdös’s upper bound (6), though, one cannot afford to lose these additional factors of , and one must argue more carefully. Here, the key observation is that the counterexample discussed earlier – when the natural number is the product of a large number of fairly small primes – is quite atypical; most numbers have at least one large prime factor. For instance, the number of natural numbers less than that contain a prime factor between and is equal to
which, thanks to Mertens’ theorem
for some absolute constant , is comparable to . In a similar spirit, one can show by similarly elementary means that the number of natural numbers less than that are -smooth, in the sense that all prime factors are at most , is only about or so. Because of this, one can hope that the bound (8), while not true in full generality, will still be true for most natural numbers , with some slightly weaker substitute available (such as (7)) for the exceptional numbers . This turns out to be the case by an elementary but careful argument.
The Erdös argument is quite robust; for instance, the more general inequality
for fixed irreducible and , which improves van der Corput’s inequality (8) was shown by Delmer using the same methods. (A slight error in the original paper of Erdös was also corrected in this latter paper.) In a forthcoming revision to my paper on the Erdös-Straus conjecture, Christian Elsholtz and I have also applied this method to obtain bounds such as
which turn out to be enough to obtain the right asymptotics for the number of solutions to the equation .
Below the fold I will provide some more details of the arguments of Landreau and of Erdös.
I have blogged several times in the past about nonstandard analysis, which among other things is useful in allowing one to import tools from infinitary (or qualitative) mathematics in order to establish results in finitary (or quantitative) mathematics. One drawback, though, to using nonstandard analysis methods is that the bounds one obtains by such methods are usually ineffective: in particular, the conclusions of a nonstandard analysis argument may involve an unspecified constant that is known to be finite but for which no explicit bound is obviously available. (In many cases, a bound can eventually be worked out by performing proof mining on the argument, and in particular by carefully unpacking the proofs of all the various results from infinitary mathematics that were used in the argument, as opposed to simply using them as “black boxes”, but this is a time-consuming task and the bounds that one eventually obtains tend to be quite poor (e.g. tower exponential or Ackermann type bounds are not uncommon).)
Because of this fact, it would seem that quantitative bounds, such as polynomial type bounds that show that one quantity is controlled in a polynomial fashion by another quantity , are not easily obtainable through the ineffective methods of nonstandard analysis. Actually, this is not the case; as I will demonstrate by an example below, nonstandard analysis can certainly yield polynomial type bounds. The catch is that the exponent in such bounds will be ineffective; but nevertheless such bounds are still good enough for many applications.
Let us now illustrate this by reproving a lemma from this paper of Mei-Chu Chang (Lemma 2.14, to be precise), which was recently pointed out to me by Van Vu. Chang’s paper is focused primarily on the sum-product problem, but she uses a quantitative lemma from algebraic geometry which is of independent interest. To motivate the lemma, let us first establish a qualitative version:
then there also exists a solution whose coefficients are algebraic numbers (i.e. they lie in the algebraic closure of the rationals).
Proof: Suppose there was no solution to over . Applying Hilbert’s nullstellensatz (which is available as is algebraically closed), we conclude the existence of some polynomials (with coefficients in ) such that
as polynomials. In particular, we have
for all . This shows that there is no solution to over , as required.
The above lemma asserts that if a system of rational equations is solvable at all, then it is solvable with some algebraic solution. But it gives no bound on the complexity of that solution in terms of the complexity of the original equation. Chang’s lemma provides such a bound. If is an integer, let us say that an algebraic number has height at most if its minimal polynomial (after clearing denominators) consists of integers of magnitude at most .
Lemma 2 (Quantitative solvability) Let be a finite number of polynomials of degree at most with rational coefficients, each of height at most . If there is a complex solution to the simultaneous system of equations
then there also exists a solution whose coefficients are algebraic numbers of degree at most and height at most , where depends only on , and .
Chang proves this lemma by essentially establishing a quantitative version of the nullstellensatz, via elementary elimination theory (somewhat similar, actually, to the approach I took to the nullstellensatz in my own blog post). She also notes that one could also establish the result through the machinery of Gröbner bases. In each of these arguments, it was not possible to use Lemma 1 (or the closely related nullstellensatz) as a black box; one actually had to unpack one of the proofs of that lemma or nullstellensatz to get the polynomial bound. However, using nonstandard analysis, it is possible to get such polynomial bounds (albeit with an ineffective value of the constant ) directly from Lemma 1 (or more precisely, the generalisation in Remark 1) without having to inspect the proof, and instead simply using it as a black box, thus providing a “soft” proof of Lemma 2 that is an alternative to the “hard” proofs mentioned above.
Here’s how the proof works. Informally, the idea is that Lemma 2 should follow from Lemma 1 after replacing the field of rationals with “the field of rationals of polynomially bounded height”. Unfortunately, the latter object does not really make sense as a field in standard analysis; nevertheless, it is a perfectly sensible object in nonstandard analysis, and this allows the above informal argument to be made rigorous.
We turn to the details. As is common whenever one uses nonstandard analysis to prove finitary results, we use a “compactness and contradiction” argument (or more precisely, an “ultralimit and contradiction” argument). Suppose for contradiction that Lemma 2 failed. Carefully negating the quantifiers (and using the axiom of choice), we conclude that there exists such that for each natural number , there is a positive integer and a family of polynomials of degree at most and rational coefficients of height at most , such that there exist at least one complex solution to
but such that there does not exist any such solution whose coefficients are algebraic numbers of degree at most and height at most .
Now we take ultralimits (see e.g. this previous blog post of a quick review of ultralimit analysis, which we will assume knowledge of in the argument that follows). Let be a non-principal ultrafilter. For each , the ultralimit
of the (standard) polynomials is a nonstandard polynomial of degree at most , whose coefficients now lie in the nonstandard rationals . Actually, due to the height restriction, we can say more. Let be the ultralimit of the , this is a nonstandard natural number (which will almost certainly be unbounded, but we will not need to use this). Let us say that a nonstandard integer is of polynomial size if we have for some standard natural number , and say that a nonstandard rational number is of polynomial height if , are of polynomial size. Let be the collection of all nonstandard rationals of polynomial height. (In the language of nonstandard analysis, is an external set rather than an internal one, because it is not itself an ultraproduct of standard sets; but this will not be relevant for the argument that follows.) It is easy to see that is a field, basically because the sum or product of two integers of polynomial size, remains of polynomial size. By construction, it is clear that the coefficients of are nonstandard rationals of polynomial height, and thus are defined over .
Meanwhile, if we let be the ultralimit of the solutions in (1), we have
thus are solvable in . Applying Lemma 1 (or more precisely, the generalisation in Remark 1), we see that are also solvable in . (Note that as is algebraically closed, is also (by Los’s theorem), and so contains .) Thus, there exists with
As lies in , we can write as an ultralimit of standard complex vectors . By construction, the coefficients of each obey a non-trivial polynomial equation of degree at most and whose coefficients are nonstandard integers of magnitude at most , for some standard natural number . Undoing the ultralimit, we conclude that for sufficiently close to , the coefficients of obey a non-trivial polynomial equation of degree at most whose coefficients are standard integers of magnitude at most . In particular, these coefficients have height at most . Also, we have
But for larger than , this contradicts the construction of the , and the claim follows. (Note that as is non-principal, any neighbourhood of in will contain arbitrarily large natural numbers.)
Remark 2 The same argument actually gives a slightly stronger version of Lemma 2, namely that the integer coefficients used to define the algebraic solution can be taken to be polynomials in the coefficients of , with degree and coefficients bounded by .
Tamar Ziegler and I have just uploaded to the arXiv our paper “The inverse conjecture for the Gowers norm over finite fields in low characteristic“, submitted to Annals of Combinatorics. This paper completes another case of the inverse conjecture for the Gowers norm, this time for vector spaces over a fixed finite field of prime order; with Vitaly Bergelson, we had previously established this claim when the characteristic of the field was large, so the main new result here is the extension to the low characteristic case. (The case of a cyclic group or interval was established by Ben Green and ourselves in another recent paper. For an arbitrary abelian (or nilpotent) group, a general but less explicit description of the obstructions to Gowers uniformity was recently obtained by Szegedy; the latter result recovers the high-characteristic case of our result (as was done in a subsequent paper of Szegedy), as well as our results with Green, but it is not immediately evident whether Szegedy’s description of the obstructions matches up with the one predicted by the inverse conjecture in low characteristic.)
The statement of the main theorem is as follows. Given a finite-dimensional vector space and a function , and an integer , one can define the Gowers uniformity norm by the formula
where . If is bounded in magnitude by , it is easy to see that is bounded by also, with equality if and only if for some non-classical polynomial of degree at most , where , and a non-classical polynomial of degree at most is a function whose “derivatives” vanish in the sense that
for all , where . Our result generalises this to the case when the uniformity norm is not equal to , but is still bounded away from zero:
Theorem 1 (Inverse conjecture) Let be bounded by with for some . Then there exists a non-classical polynomial of degree at most such that , where is a positive quantity depending only on the indicated parameters.
This theorem is trivial for , and follows easily from Fourier analysis for . The case was done in odd characteristic by Ben Green and myself, and in even characteristic by Samorodnitsky. In two papers, one with Vitaly Bergelson, we established this theorem in the “high characteristic” case when the characteristic of was greater than (in which case there is essentially no distinction between non-classical polynomials and their classical counterparts, as discussed previously on this blog). The need to deal with genuinely non-classical polynomials is the main new difficulty in this paper that was not dealt with in previous literature.
In our previous paper with Bergelson, a “weak” version of the above theorem was proven, in which the polynomial in the conclusion had bounded degree , rather than being of degree at most . In the current paper, we use this weak inverse theorem to reduce the inverse conjecture to a statement purely about polynomials:
Theorem 2 (Inverse conjecture for polynomials) Let , and let be a non-classical polynomial of degree at most such that . Then has bounded rank in the sense that is a function of polynomials of degree at most .
This type of inverse theorem was first introduced by Bogdanov and Viola. The deduction of Theorem 1 from Theorem 2 and the weak inverse Gowers conjecture is fairly standard, so the main difficulty is to show Theorem 2.
The quantity of a polynomial of degree at most was denoted the analytic rank of by Gowers and Wolf. They observed that the analytic rank of was closely related to the rank of , defined as the least number of degree polynomials needed to express . For instance, in the quadratic case the two ranks are identical (in odd characteristic, at least). For general , it was easy to see that bounded rank implied bounded analytic rank; Theorem 2 is the converse statement.
We tried a number of ways to show that bounded analytic rank implied bounded rank, in particular spending a lot of time on ergodic-theoretic approaches, but eventually we settled on a “brute force” approach that relies on classifying those polynomials of bounded analytic rank as precisely as possible. The argument splits up into establishing three separate facts:
- (Classical case) If a classical polynomial has bounded analytic rank, then it has bounded rank.
- (Multiplication by ) If a non-classical polynomial (of degree at most ) has bounded analytic rank, then (which can be shown to have degree at most ) also has bounded analytic rank.
- (Division by ) If is a non-clsasical polynomial of degree of bounded rank, then there is a non-classical polynomial of degree at most of bounded rank such that .
The multiplication by and division by facts allow one to easily extend the classical case of the theorem to the non-classical case of the theorem, basically because classical polynomials are the kernel of the multiplication-by- homomorphism. Indeed, if is a non-classical polynomial of bounded analytic rank of the right degree, then the multiplication by claim tells us that also has bounded analytic rank, which by an induction hypothesis implies that has bounded rank. Applying the division by claim, we find a bounded rank polynomial such that , thus differs from by a classical polynomial, which necessarily has bounded analytic rank, hence bounded rank by the classical claim, and the claim follows.
Of the three claims, the multiplication-by- claim is the easiest to prove using known results; after a bit of Fourier analysis, it turns out to follow more or less immediately from the multidimensional Szemerédi theorem over finite fields of Bergelson, Leibman, and McCutcheon (one can also use the density Hales-Jewett theorem here if one desires).
The next easiest claim is the classical case. Here, the idea is to analyse a degree classical polynomial via its derivative , defined by the formula
for any (the RHS is independent of as has degree ). This is a multilinear form, and if has bounded analytic rank, this form is biased (in the sense that the mean of is large). Applying a general equidistribution theorem of Kaufman and Lovett (based on this earlier paper of Green and myself) this implies that is a function of a bounded number of multilinear forms of lower degree. Using some “regularity lemma” theory to clean up these forms so that they have good equidistribution properties, it is possible to understand exactly how the original multilinear form depends on these lower degree forms; indeed, the description one eventually obtains is so explicit that one can write down by inspection another bounded rank polynomial such that is equal to . Thus differs from the bounded rank polynomial by a lower degree error, which is automatically of bounded rank also, and the claim follows.
The trickiest thing to establish is the division by claim. The polynomial is some function of lower degree polynomials . Ideally, one would like to find a function of the same polynomials with , such that has the correct degree; however, we have counterexamples that show that this is not always possible. (These counterexamples are the main obstruction to making the ergodic theory approach work: in ergodic theory, one is only allowed to work with “measurable” functions, which are roughly analogous in this context to functions of the indicated polynomials and their shifts.) To get around this we have to first apply a regularity lemma to place in a suitably equidistributed form (although the fact that may be non-classical leads to a rather messy and technical description of this equidistribution), and then we have to extend each to a higher degree polynomial with . There is a crucial “exact roots” property of polynomials that allows one to do this, with having degree exactly higher than . It turns out that it is possible to find a function of these extended polynomials that have the right degree and which solves the required equation ; this is established by classifying completely all functions of the equidistributed polynomials or that are of a given degree.
In Notes 3, we saw that the number of additive patterns in a given set was (in principle, at least) controlled by the Gowers uniformity norms of functions associated to that set.
Such norms can be defined on any finite additive group (and also on some other types of domains, though we will not discuss this point here). In particular, they can be defined on the finite-dimensional vector spaces over a finite field .
In this case, the Gowers norms are closely tied to the space of polynomials of degree at most . Indeed, as noted in Exercise 20 of Notes 4, a function of norm has norm equal to if and only if for some ; thus polynomials solve the “ inverse problem” for the trivial inequality . They are also a crucial component of the solution to the “ inverse problem” and “ inverse problem”. For the former, we will soon show:
Thus, for the Gowers norm to be almost completely saturated, one must be very close to a polynomial. The converse assertion is easily established:
Exercise 1 (Converse to inverse theorem for ) If and for some , then , where is a constant depending only on .
In the world, one no longer expects to be close to a polynomial. Instead, one expects to correlate with a polynomial. Indeed, one has
Lemma 2 (Converse to the inverse theorem for ) If and are such that , where , then .
Proof: From the definition of the norm (equation (18) from Notes 3), the monotonicity of the Gowers norms (Exercise 19 of Notes 3), and the polynomial phase modulation invariance of the Gowers norms (Exercise 21 of Notes 3), one has
and the claim follows.
In the high characteristic case at least, this can be reversed:
This result is sometimes referred to as the inverse conjecture for the Gowers norm (in high, but bounded, characteristic). For small , the claim is easy:
Exercise 2 Verify the cases of this theorem. (Hint: to verify the case, use the Fourier-analytic identities and , where is the space of all homomorphisms from to , and are the Fourier coefficients of .)
This conjecture for larger values of are more difficult to establish. The case of the theorem was established by Ben Green and myself in the high characteristic case ; the low characteristic case was independently and simultaneously established by Samorodnitsky. The cases in the high characteristic case was established in two stages, firstly using a modification of the Furstenberg correspondence principle, due to Ziegler and myself. to convert the problem to an ergodic theory counterpart, and then using a modification of the methods of Host-Kra and Ziegler to solve that counterpart, as done in this paper of Bergelson, Ziegler, and myself.
The situation with the low characteristic case in general is still unclear. In the high characteristic case, we saw from Notes 4 that one could replace the space of non-classical polynomials in the above conjecture with the essentially equivalent space of classical polynomials . However, as we shall see below, this turns out not to be the case in certain low characteristic cases (a fact first observed by Lovett, Meshulam, and Samorodnitsky, and independently by Ben Green and myself), for instance if and ; this is ultimately due to the existence in those cases of non-classical polynomials which exhibit no significant correlation with classical polynomials of equal or lesser degree. This distinction between classical and non-classical polynomials appears to be a rather non-trivial obstruction to understanding the low characteristic setting; it may be necessary to obtain a more complete theory of non-classical polynomials in order to fully settle this issue.
The inverse conjecture has a number of consequences. For instance, it can be used to establish the analogue of Szemerédi’s theorem in this setting:
Exercise 3 Use Theorem 4 to establish the following generalisation: with the notation as above, if and is sufficiently large depending on , then contains an affine -dimensional subspace.
We will prove this theorem in two different ways, one using a density increment method, and the other using an energy increment method. We discuss some other applications below the fold.
In the previous lectures, we have focused mostly on the equidistribution or linear patterns on a subset of the integers , and in particular on intervals . The integers are of course a very important domain to study in additive combinatorics; but there are also other fundamental model examples of domains to study. One of these is that of a vector space over a finite field of prime order. Such domains are of interest in computer science (particularly when ) and also in number theory; but they also serve as an important simplified “dyadic model” for the integers. See this survey article of Green for further discussion of this point.
The additive combinatorics of the integers , and of vector spaces over finite fields, are analogous, but not quite identical. For instance, the analogue of an arithmetic progression in is a subspace of . In many cases, the finite field theory is a little bit simpler than the integer theory; for instance, subspaces are closed under addition, whereas arithmetic progressions are only “almost” closed under addition in various senses. (For instance, is closed under addition approximately half of the time.) However, there are some ways in which the integers are better behaved. For instance, because the integers can be generated by a single generator, a homomorphism from to some other group can be described by a single group element : . However, to specify a homomorphism from a vector space to one would need to specify one group element for each dimension of . Thus we see that there is a tradeoff when passing from (or ) to a vector space model; one gains a bounded torsion property, at the expense of conceding the bounded generation property. (Of course, if one wants to deal with arbitrarily large domains, one has to concede one or the other; the only additive groups that have both bounded torsion and boundedly many generators, are bounded.)
The starting point for this course (Notes 1) was the study of equidistribution of polynomials from the integers to the unit circle. We now turn to the parallel theory of equidistribution of polynomials from vector spaces over finite fields to the unit circle. Actually, for simplicity we will mostly focus on the classical case, when the polynomials in fact take values in the roots of unity (where is the characteristic of the field ). As it turns out, the non-classical case is also of importance (particularly in low characteristic), but the theory is more difficult; see these notes for some further discussion.
Jean-Pierre Serre (whose papers are, of course, always worth reading) recently posted a lovely lecture on the arXiv entitled “How to use finite fields for problems concerning infinite fields”. In it, he describes several ways in which algebraic statements over fields of zero characteristic, such as , can be deduced from their positive characteristic counterparts such as , despite the fact that there is no non-trivial field homomorphism between the two types of fields. In particular finitary tools, including such basic concepts as cardinality, can now be deployed to establish infinitary results. This leads to some simple and elegant proofs of non-trivial algebraic results which are not easy to establish by other means.
One deduction of this type is based on the idea that positive characteristic fields can partially model zero characteristic fields, and proceeds like this: if a certain algebraic statement failed over (say) , then there should be a “finitary algebraic” obstruction that “witnesses” this failure over . Because this obstruction is both finitary and algebraic, it must also be definable in some (large) finite characteristic, thus leading to a comparable failure over a finite characteristic field. Taking contrapositives, one obtains the claim.
Algebra is definitely not my own field of expertise, but it is interesting to note that similar themes have also come up in my own area of additive combinatorics (and more generally arithmetic combinatorics), because the combinatorics of addition and multiplication on finite sets is definitely of a “finitary algebraic” nature. For instance, a recent paper of Vu, Wood, and Wood establishes a finitary “Freiman-type” homomorphism from (finite subsets of) the complex numbers to large finite fields that allows them to pull back many results in arithmetic combinatorics in finite fields (e.g. the sum-product theorem) to the complex plane. (Van Vu and I also used a similar trick to control the singularity property of random sign matrices by first mapping them into finite fields in which cardinality arguments became available.) And I have a particular fondness for correspondences between finitary and infinitary mathematics; the correspondence Serre discusses is slightly different from the one I discuss for instance in here or here, although there seems to be a common theme of “compactness” (or of model theory) tying these correspondences together.
As one of his examples, Serre cites one of my own favourite results in algebra, discovered independently by Ax and by Grothendieck (and then rediscovered many times since). Here is a special case of that theorem:
The full version of the theorem allows one to replace by an algebraic variety over any algebraically closed field, and for to be an morphism from the algebraic variety to itself, but for simplicity I will just discuss the above special case. This theorem is not at all obvious; it is not too difficult (see Lemma 4 below) to show that the Jacobian of is non-degenerate, but this does not come close to solving the problem since one would then be faced with the notorious Jacobian conjecture. Also, the claim fails if “polynomial” is replaced by “holomorphic”, due to the existence of Fatou-Bieberbach domains.
In this post I would like to give the proof of Theorem 1 based on finite fields as mentioned by Serre, as well as another elegant proof of Rudin that combines algebra with some elementary complex variable methods. (There are several other proofs of this theorem and its generalisations, for instance a topological proof by Borel, which I will not discuss here.)
Update, March 8: Some corrections to the finite field proof. Thanks to Matthias Aschenbrenner also for clarifying the relationship with Tarski’s theorem and some further references.