You are currently browsing the category archive for the ‘paper’ category.

Apoorva Khare and I have just uploaded to the arXiv our paper “On the sign patterns of entrywise positivity preservers in fixed dimension“. This paper explores the relationship between positive definiteness of Hermitian matrices, and entrywise operations on these matrices. The starting point for this theory is the Schur product theorem, which asserts that if and are two Hermitian matrices that are positive semi-definite, then their Hadamard product

is also positive semi-definite. (One should caution that the Hadamard product is *not* the same as the usual matrix product.) To prove this theorem, first observe that the claim is easy when and are rank one positive semi-definite matrices, since in this case is also a rank one positive semi-definite matrix. The general case then follows by noting from the spectral theorem that a general positive semi-definite matrix can be expressed as a non-negative linear combination of rank one positive semi-definite matrices, and using the bilinearity of the Hadamard product and the fact that the set of positive semi-definite matrices form a convex cone. A modification of this argument also lets one replace “positive semi-definite” by “positive definite” in the statement of the Schur product theorem.

One corollary of the Schur product theorem is that any polynomial with non-negative coefficients is *entrywise positivity preserving* on the space of positive semi-definite Hermitian matrices, in the sense that for any matrix in , the entrywise application

of to is also positive semi-definite. (As before, one should caution that is *not* the application of to by the usual functional calculus.) Indeed, one can expand

where is the Hadamard product of copies of , and the claim now follows from the Schur product theorem and the fact that is a convex cone.

A slight variant of this argument, already observed by Pólya and Szegö in 1925, shows that if is any subset of and

is a power series with non-negative coefficients that is absolutely and uniformly convergent on , then will be entrywise positivity preserving on the set of positive definite matrices with entries in . (In the case that is of the form , such functions are precisely the absolutely monotonic functions on .)

In the work of Schoenberg and of Rudin, we have a converse: if is a function that is entrywise positivity preserving on for all , then it must be of the form (1) with . Variants of this result, with replaced by other domains, appear in the work of Horn, Vasudeva, and Guillot-Khare-Rajaratnam.

This gives a satisfactory classification of functions that are entrywise positivity preservers in all dimensions simultaneously. However, the question remains as to what happens if one fixes the dimension , in which case one may have a larger class of entrywise positivity preservers. For instance, in the trivial case , a function would be entrywise positivity preserving on if and only if is non-negative on . For higher , there is a necessary condition of Horn (refined slightly by Guillot-Khare-Rajaratnam) which asserts (at least in the case of smooth ) that all derivatives of at zero up to order must be non-negative in order for to be entrywise positivity preserving on for some . In particular, if is of the form (1), then must be non-negative. In fact, a stronger assertion can be made, namely that the first non-zero coefficients in (1) (if they exist) must be positive, or equivalently any negative term in (1) must be preceded (though not necessarily immediately) by at least positive terms. If is of the form (1) is entrywise positivity preserving on the larger set , one can furthermore show that any negative term in (1) must also be *followed* (though not necessarily immediately) by at least positive terms.

The main result of this paper is that these sign conditions are the *only* constraints for entrywise positivity preserving power series. More precisely:

Theorem 1For each , let be a sign.

- Suppose that any negative sign is preceded by at least positive signs (thus there exists with ). Then, for any , there exists a convergent power series (1) on , with each having the sign of , which is entrywise positivity preserving on .
- Suppose in addition that any negative sign is followed by at least positive signs (thus there exists with ). Then there exists a convergent power series (1) on , with each having the sign of , which is entrywise positivity preserving on .

One can ask the same question with or replaced by other domains such as , or the complex disk , but it turns out that there are far fewer entrywise positivity preserving functions in those cases basically because of the non-trivial zeroes of Schur polynomials in these ranges; see the paper for further discussion. We also have some quantitative bounds on how negative some of the coefficients can be compared to the positive coefficients, but they are a bit technical to state here.

The heart of the proofs of these results is an analysis of the determinants of polynomials applied entrywise to rank one matrices ; the positivity of these determinants can be used (together with a continuity argument) to establish the positive definiteness of for various ranges of and . Using the Cauchy-Binet formula, one can rewrite such determinants as linear combinations of squares of magnitudes of generalised Vandermonde determinants

where and the signs of the coefficients in the linear combination are determined by the signs of the coefficients of . The task is then to find upper and lower bounds for the magnitudes of such generalised Vandermonde determinants. These determinants oscillate in sign, which makes the problem look difficult; however, an algebraic miracle intervenes, namely the factorisation

of the generalised Vandermonde determinant into the ordinary Vandermonde determinant

and a Schur polynomial applied to , where the weight of the Schur polynomial is determined by in a simple fashion. The problem then boils down to obtaining upper and lower bounds for these Schur polynomials. Because we are restricting attention to matrices taking values in or , the entries of can be taken to be non-negative. One can then take advantage of the *total positivity* of the Schur polynomials to compare these polynomials with a monomial, at which point one can obtain good criteria for to be positive definite when is a rank one positive definite matrix .

If we allow the exponents to be real numbers rather than integers (thus replacing polynomials or power series by Pusieux series or Hahn series), then we lose the above algebraic miracle, but we can replace it with a geometric miracle, namely the *Harish-Chandra-Itzykson-Zuber identity*, which I discussed in this previous blog post. This factors the above generalised Vandermonde determinant as the product of the ordinary Vandermonde determinant and an integral of a positive quantity over the orthogonal group, which one can again compare with a monomial after some fairly elementary estimates.

It remains to understand what happens for more general positive semi-definite matrices . Here we use a trick of FitzGerald and Horn to amplify the rank one case to the general case, by expressing a general positive semi-definite matrix as a linear combination of a rank one matrix and another positive semi-definite matrix that vanishes on the last row and column (and is thus effectively a positive definite matrix). Using the fundamental theorem of calculus to continuously deform the rank one matrix to in the direction , one can then obtain positivity results for from positivity results for combined with an induction hypothesis on .

Joni Teräväinen and I have just uploaded to the arXiv our paper “The structure of logarithmically averaged correlations of multiplicative functions, with applications to the Chowla and Elliott conjectures“, submitted to Duke Mathematical Journal. This paper builds upon my previous paper in which I introduced an “entropy decrement method” to prove the two-point (logarithmically averaged) cases of the Chowla and Elliott conjectures. A bit more specifically, I showed that

whenever were sequences going to infinity, were distinct integers, and were -bounded multiplicative functions which were *non-pretentious* in the sense that

for all Dirichlet characters and for . Thus, for instance, one had the logarithmically averaged two-point Chowla conjecture

for fixed any non-zero , where was the Liouville function.

One would certainly like to extend these results to higher order correlations than the two-point correlations. This looks to be difficult (though perhaps not completely impossible if one allows for logarithmic averaging): in a previous paper I showed that achieving this in the context of the Liouville function would be equivalent to resolving the logarithmically averaged Sarnak conjecture, as well as establishing logarithmically averaged local Gowers uniformity of the Liouville function. However, in this paper we are able to avoid having to resolve these difficult conjectures to obtain partial results towards the (logarithmically averaged) Chowla and Elliott conjecture. For the Chowla conjecture, we can obtain all odd order correlations, in that

for all odd and all integers (which, in the odd order case, are no longer required to be distinct). (Superficially, this looks like we have resolved “half of the cases” of the logarithmically averaged Chowla conjecture; but it seems the odd order correlations are significantly easier than the even order ones. For instance, because of the Katai-Bourgain-Sarnak-Ziegler criterion, one can basically deduce the odd order cases of (2) from the even order cases (after allowing for some dilations in the argument ).

For the more general Elliott conjecture, we can show that

for any , any integers and any bounded multiplicative functions , unless the product *weakly pretends to be a Dirichlet character * in the sense that

This can be seen to imply (2) as a special case. Even when *does* pretend to be a Dirichlet character , we can still say something: if the limits

exist for each (which can be guaranteed if we pass to a suitable subsequence), then is the uniform limit of periodic functions , each of which is –isotypic in the sense that whenever are integers with coprime to the periods of and . This does not pin down the value of any single correlation , but does put significant constraints on how these correlations may vary with .

Among other things, this allows us to show that all possible length four sign patterns of the Liouville function occur with positive density, and all possible length four sign patterns occur with the conjectured logarithmic density. (In a previous paper with Matomaki and Radziwill, we obtained comparable results for length three patterns of Liouville and length two patterns of Möbius.)

To describe the argument, let us focus for simplicity on the case of the Liouville correlations

assuming for sake of discussion that all limits exist. (In the paper, we instead use the device of generalised limits, as discussed in this previous post.) The idea is to combine together two rather different ways to control this function . The first proceeds by the entropy decrement method mentioned earlier, which roughly speaking works as follows. Firstly, we pick a prime and observe that for any , which allows us to rewrite (3) as

Making the change of variables , we obtain

The difference between and is negligible in the limit (here is where we crucially rely on the log-averaging), hence

and thus by (3) we have

The entropy decrement argument can be used to show that the latter limit is small for most (roughly speaking, this is because the factors behave like independent random variables as varies, so that concentration of measure results such as Hoeffding’s inequality can apply, after using entropy inequalities to decouple somewhat these random variables from the factors). We thus obtain the approximate isotopy property

On the other hand, by the Furstenberg correspondence principle (as discussed in these previous posts), it is possible to express as a multiple correlation

for some probability space equipped with a measure-preserving invertible map . Using results of Bergelson-Host-Kra, Leibman, and Le, this allows us to obtain a decomposition of the form

where is a nilsequence, and goes to zero in density (even along the primes, or constant multiples of the primes). The original work of Bergelson-Host-Kra required ergodicity on , which is very definitely a hypothesis that is not available here; however, the later work of Leibman removed this hypothesis, and the work of Le refined the control on so that one still has good control when restricting to primes, or constant multiples of primes.

Ignoring the small error , we can now combine (5) to conclude that

Using the equidistribution theory of nilsequences (as developed in this previous paper of Ben Green and myself), one can break up further into a periodic piece and an “irrational” or “minor arc” piece . The contribution of the minor arc piece can be shown to mostly cancel itself out after dilating by primes and averaging, thanks to Vinogradov-type bilinear sum estimates (transferred to the primes). So we end up with

which already shows (heuristically, at least) the claim that can be approximated by periodic functions which are isotopic in the sense that

But if is odd, one can use Dirichlet’s theorem on primes in arithmetic progressions to restrict to primes that are modulo the period of , and conclude now that vanishes identically, which (heuristically, at least) gives (2).

The same sort of argument works to give the more general bounds on correlations of bounded multiplicative functions. But for the specific task of proving (2), we initially used a slightly different argument that avoids using the ergodic theory machinery of Bergelson-Host-Kra, Leibman, and Le, but replaces it instead with the Gowers uniformity norm theory used to count linear equations in primes. Basically, by averaging (4) in using the “-trick”, as well as known facts about the Gowers uniformity of the von Mangoldt function, one can obtain an approximation of the form

where ranges over a large range of integers coprime to some primorial . On the other hand, by iterating (4) we have

for most semiprimes , and by again averaging over semiprimes one can obtain an approximation of the form

For odd, one can combine the two approximations to conclude that . (This argument is not given in the current paper, but we plan to detail it in a subsequent one.)

I’ve just uploaded to the arXiv my paper “On the universality of the incompressible Euler equation on compact manifolds“, submitted to Discrete and Continuous Dynamical Systems. This is a variant of my recent paper on the universality of potential well dynamics, but instead of trying to embed dynamical systems into a potential well , here we try to embed dynamical systems into the incompressible Euler equations

on a Riemannian manifold . (One is particularly interested in the case of flat manifolds , particularly or , but for the main result of this paper it is essential that one is permitted to consider curved manifolds.) This system, first studied by Ebin and Marsden, is the natural generalisation of the usual incompressible Euler equations to curved space; it can be viewed as the formal geodesic flow equation on the infinite-dimensional manifold of volume-preserving diffeomorphisms on (see this previous post for a discussion of this in the flat space case).

The Euler equations can be viewed as a nonlinear equation in which the nonlinearity is a quadratic function of the velocity field . It is thus natural to compare the Euler equations with quadratic ODE of the form

where is the unknown solution, and is a bilinear map, which we may assume without loss of generality to be symmetric. One can ask whether such an ODE may be linearly embedded into the Euler equations on some Riemannian manifold , which means that there is an injective linear map from to smooth vector fields on , as well as a bilinear map to smooth scalar fields on , such that the map takes solutions to (2) to solutions to (1), or equivalently that

for all .

For simplicity let us restrict to be compact. There is an obvious necessary condition for this embeddability to occur, which comes from energy conservation law for the Euler equations; unpacking everything, this implies that the bilinear form in (2) has to obey a cancellation condition

for some positive definite inner product on . The main result of the paper is the converse to this statement: if is a symmetric bilinear form obeying a cancellation condition (3), then it is possible to embed the equations (2) into the Euler equations (1) on some Riemannian manifold ; the catch is that this manifold will depend on the form and on the dimension (in fact in the construction I have, is given explicitly as , with a funny metric on it that depends on ).

As a consequence, any finite dimensional portion of the usual “dyadic shell models” used as simplified toy models of the Euler equation, can actually be embedded into a genuine Euler equation, albeit on a high-dimensional and curved manifold. This includes portions of the self-similar “machine” I used in a previous paper to establish finite time blowup for an averaged version of the Navier-Stokes (or Euler) equations. Unfortunately, the result in this paper does not apply to infinite-dimensional ODE, so I cannot yet establish finite time blowup for the Euler equations on a (well-chosen) manifold. It does not seem so far beyond the realm of possibility, though, that this could be done in the relatively near future. In particular, the result here suggests that one could construct something resembling a universal Turing machine within an Euler flow on a manifold, which was one ingredient I would need to engineer such a finite time blowup.

The proof of the main theorem proceeds by an “elimination of variables” strategy that was used in some of my previous papers in this area, though in this particular case the Nash embedding theorem (or variants thereof) are not required. The first step is to lessen the dependence on the metric by partially reformulating the Euler equations (1) in terms of the covelocity (which is a -form) instead of the velocity . Using the freedom to modify the dimension of the underlying manifold , one can also decouple the metric from the volume form that is used to obtain the divergence-free condition. At this point the metric can be eliminated, with a certain positive definiteness condition between the velocity and covelocity taking its place. After a substantial amount of trial and error (motivated by some “two-and-a-half-dimensional” reductions of the three-dimensional Euler equations, and also by playing around with a number of variants of the classic “separation of variables” strategy), I eventually found an ansatz for the velocity and covelocity that automatically solved most of the components of the Euler equations (as well as most of the positive definiteness requirements), as long as one could find a number of scalar fields that obeyed a certain nonlinear system of transport equations, and also obeyed a positive definiteness condition. Here I was stuck for a bit because the system I ended up with was overdetermined – more equations than unknowns. After trying a number of special cases I eventually found a solution to the transport system on the sphere, except that the scalar functions sometimes degenerated and so the positive definiteness property I wanted was only obeyed with positive semi-definiteness. I tried for some time to perturb this example into a strictly positive definite solution before eventually working out that this was not possible. Finally I had the brainwave to lift the solution from the sphere to an even more symmetric space, and this quickly led to the final solution of the problem, using the special orthogonal group rather than the sphere as the underlying domain. The solution ended up being rather simple in form, but it is still somewhat miraculous to me that it exists at all; in retrospect, given the overdetermined nature of the problem, relying on a large amount of symmetry to cut down the number of equations was basically the only hope.

I’ve just uploaded to the arXiv my paper “On the universality of potential well dynamics“, submitted to Dynamics of PDE. This is a spinoff from my previous paper on blowup of nonlinear wave equations, inspired by some conversations with Sungjin Oh. Here we focus mainly on the zero-dimensional case of such equations, namely the potential well equation

for a particle trapped in a potential well with potential , with as . This ODE always admits global solutions from arbitrary initial positions and initial velocities , thanks to conservation of the Hamiltonian . As this Hamiltonian is coercive (in that its level sets are compact), solutions to this equation are always almost periodic. On the other hand, as can already be seen using the harmonic oscillator (and direct sums of this system), this equation can generate periodic solutions, as well as quasiperiodic solutions.

All quasiperiodic motions are almost periodic. However, there are many examples of dynamical systems that admit solutions that are almost periodic but not quasiperiodic. So one can pose the question: are the dynamics of potential wells *universal* in the sense that they can capture all almost periodic solutions?

A precise question can be phrased as follows. Let be a compact manifold, and let be a smooth vector field on ; to avoid degeneracies, let us take to be *non-singular* in the sense that it is everywhere non-vanishing. Then the trajectories of the first-order ODE

for are always global and almost periodic. Can we then find a (coercive) potential for some , as well as a smooth embedding , such that every solution to (2) pushes forward under to a solution to (1)? (Actually, for technical reasons it is preferable to map into the phase space , rather than position space , but let us ignore this detail for this discussion.)

It turns out that the answer is no; there is a very specific obstruction. Given a pair as above, define a *strongly adapted -form* to be a -form on such that is pointwise positive, and the Lie derivative is an exact -form. We then have

Theorem 1A smooth compact non-singular dynamics can be embedded smoothly in a potential well system if and only if it admits a strongly adapted -form.

For the “only if” direction, the key point is that potential wells (viewed as a Hamiltonian flow on the phase space ) admit a strongly adapted -form, namely the canonical -form , whose Lie derivative is the derivative of the Lagrangian and is thus exact. The converse “if” direction is mainly a consequence of the Nash embedding theorem, and follows the arguments used in my previous paper.

Interestingly, the same obstruction also works for potential wells in a more general Riemannian manifold than , or for nonlinear wave equations with a potential; combining the two, the obstruction is also present for wave maps with a potential.

It is then natural to ask whether this obstruction is non-trivial, in the sense that there are at least some examples of dynamics that do not support strongly adapted -forms (and hence cannot be modeled smoothly by the dynamics of a potential well, nonlinear wave equation, or wave maps). I posed this question on MathOverflow, and Robert Bryant provided a very nice construction, showing that the vector field on the -torus had no strongly adapted -forms, and hence the dynamics of this vector field cannot be smoothly reproduced by a potential well, nonlinear wave equation, or wave map:

On the other hand, the suspension of any diffeomorphism does support a strongly adapted -form (the derivative of the time coordinate), and using this and the previous theorem I was able to embed a universal Turing machine into a potential well. In particular, there are flows for an explicitly describable potential well whose trajectories have behavior that is undecidable using the usual ZFC axioms of set theory! So potential well dynamics are “effectively” universal, despite the presence of the aforementioned obstruction.

In my previous work on blowup for Navier-Stokes like equations, I speculated that if one could somehow replicate a universal Turing machine within the Euler equations, one could use this machine to create a “von Neumann machine” that replicated smaller versions of itself, which on iteration would lead to a finite time blowup. Now that such a mechanism is present in nonlinear wave equations, it is tempting to try to make this scheme work in that setting. Of course, in my previous paper I had already demonstrated finite time blowup, at least in a three-dimensional setting, but that was a relatively simple discretely self-similar blowup in which no computation occurred. This more complicated blowup scheme would be significantly more effort to set up, but would be proof-of-concept that the same scheme would in principle be possible for the Navier-Stokes equations, assuming somehow that one can embed a universal Turing machine into the Euler equations. (But I’m still hopelessly stuck on how to accomplish this latter task…)

Kaisa Matomaki, Maksym Radziwill, and I have uploaded to the arXiv our paper “Correlations of the von Mangoldt and higher divisor functions I. Long shift ranges“, submitted to Proceedings of the London Mathematical Society. This paper is concerned with the estimation of correlations such as

for medium-sized and large , where is the von Mangoldt function; we also consider variants of this sum in which one of the von Mangoldt functions is replaced with a (higher order) divisor function, but for sake of discussion let us focus just on the sum (1). Understanding this sum is very closely related to the problem of finding pairs of primes that differ by ; for instance, if one could establish a lower bound

then this would easily imply the twin prime conjecture.

The (first) Hardy-Littlewood conjecture asserts an asymptotic

as for any fixed positive , where the *singular series* is an arithmetic factor arising from the irregularity of distribution of at small moduli, defined explicitly by

when is even, and when is odd, where

is (half of) the twin prime constant. See for instance this previous blog post for a a heuristic explanation of this conjecture. From the previous discussion we see that (2) for would imply the twin prime conjecture. Sieve theoretic methods are only able to provide an upper bound of the form .

Needless to say, apart from the trivial case of odd , there are no values of for which the Hardy-Littlewood conjecture is known. However there are some results that say that this conjecture holds “on the average”: in particular, if is a quantity depending on that is somewhat large, there are results that show that (2) holds for most (i.e. for ) of the betwen and . Ideally one would like to get as small as possible, in particular one can view the full Hardy-Littlewood conjecture as the endpoint case when is bounded.

The first results in this direction were by van der Corput and by Lavrik, who established such a result with (with a subsequent refinement by Balog); Wolke lowered to , and Mikawa lowered further to . The main result of this paper is a further lowering of to . In fact (as in the preceding works) we get a better error term than , namely an error of the shape for any .

Our arguments initially proceed along standard lines. One can use the Hardy-Littlewood circle method to express the correlation in (2) as an integral involving exponential sums . The contribution of “major arc” is known by a standard computation to recover the main term plus acceptable errors, so it is a matter of controlling the “minor arcs”. After averaging in and using the Plancherel identity, one is basically faced with establishing a bound of the form

for any “minor arc” . If is somewhat close to a low height rational (specifically, if it is within of such a rational with ), then this type of estimate is roughly of comparable strength (by another application of Plancherel) to the best available prime number theorem in short intervals on the average, namely that the prime number theorem holds for most intervals of the form , and we can handle this case using standard mean value theorems for Dirichlet series. So we can restrict attention to the “strongly minor arc” case where is far from such rationals.

The next step (following some ideas we found in a paper of Zhan) is to rewrite this estimate not in terms of the exponential sums , but rather in terms of the Dirichlet polynomial . After a certain amount of computation (including some oscillatory integral estimates arising from stationary phase), one is eventually reduced to the task of establishing an estimate of the form

for any (with sufficiently large depending on ).

The next step, which is again standard, is the use of the Heath-Brown identity (as discussed for instance in this previous blog post) to split up into a number of components that have a Dirichlet convolution structure. Because the exponent we are shooting for is less than , we end up with five types of components that arise, which we call “Type “, “Type “, “Type “, “Type “, and “Type II”. The “Type II” sums are Dirichlet convolutions involving a factor supported on a range and is quite easy to deal with; the “Type ” terms are Dirichlet convolutions that resemble (non-degenerate portions of) the divisor function, formed from convolving together portions of . The “Type ” and “Type ” terms can be estimated satisfactorily by standard moment estimates for Dirichlet polynomials; this already recovers the result of Mikawa (and our argument is in fact slightly more elementary in that no Kloosterman sum estimates are required). It is the treatment of the “Type ” and “Type ” sums that require some new analysis, with the Type terms turning to be the most delicate. After using an existing moment estimate of Jutila for Dirichlet L-functions, matters reduce to obtaining a family of estimates, a typical one of which (relating to the more difficult Type sums) is of the form

for “typical” ordinates of size , where is the Dirichlet polynomial (a fragment of the Riemann zeta function). The precise definition of “typical” is a little technical (because of the complicated nature of Jutila’s estimate) and will not be detailed here. Such a claim would follow easily from the Lindelof hypothesis (which would imply that ) but of course we would like to have an unconditional result.

At this point, having exhausted all the Dirichlet polynomial estimates that are usefully available, we return to “physical space”. Using some further Fourier-analytic and oscillatory integral computations, we can estimate the left-hand side of (3) by an expression that is roughly of the shape

The phase can be Taylor expanded as the sum of and a lower order term , plus negligible errors. If we could discard the lower order term then we would get quite a good bound using the exponential sum estimates of Robert and Sargos, which control averages of exponential sums with purely monomial phases, with the averaging allowing us to exploit the hypothesis that is “typical”. Figuring out how to get rid of this lower order term caused some inefficiency in our arguments; the best we could do (after much experimentation) was to use Fourier analysis to shorten the sums, estimate a one-parameter average exponential sum with a binomial phase by a two-parameter average with a monomial phase, and then use the van der Corput process followed by the estimates of Robert and Sargos. This rather complicated procedure works up to it may be possible that some alternate way to proceed here could improve the exponent somewhat.

In a sequel to this paper, we will use a somewhat different method to reduce to a much smaller value of , but only if we replace the correlations by either or , and also we now only save a in the error term rather than .

In July I will be spending a week at Park City, being one of the mini-course lecturers in the Graduate Summer School component of the Park City Summer Session on random matrices. I have chosen to give some lectures on least singular values of random matrices, the circular law, and the Lindeberg exchange method in random matrix theory; this is a slightly different set of topics than I had initially advertised (which was instead about the Lindeberg exchange method and the local relaxation flow method), but after consulting with the other mini-course lecturers I felt that this would be a more complementary set of topics. I have uploaded an draft of my lecture notes (some portion of which is derived from my monograph on the subject); as always, comments and corrections are welcome.

*[Update, June 23: notes revised and reformatted to PCMI format. -T.]*

*[Update, Mar 19 2018: further revision. -T.]*

Ben Green and I have (finally!) uploaded to the arXiv our paper “New bounds for Szemerédi’s theorem, III: A polylogarithmic bound for “, submitted to Mathematika. This is the sequel to two previous papers (and an erratum to the former paper), concerning quantitative versions of Szemerédi’s theorem in the case of length four progressions. This sequel has been delayed for over a decade for a number of reasons, but we have finally managed to write the arguments up to our satisfaction and submit it (to a special issue of Mathematika honouring the work of Klaus Roth).

For any natural number , define to be the largest cardinality of a subset of which does not contain any non-trivial arithmetic progressions of length four (where “non-trivial” means that is non-zero). Trivially we have . In 1969, Szemerédi showed that . However, the decay rate that could be theoretically extracted from this argument (and from several subsequent proofs of this bound, including one by Roth) were quite poor. The first significant quantitative bound on this quantity was by Gowers, who showed that for some absolute constant . In the second paper in the above-mentioned series, we managed to improve this bound to . In this paper, we improve the bound further to , which seems to be the limit of the methods. (We remark that if we could take to be larger than one, this would imply the length four case of a well known conjecture of Erdös that any set of natural numbers whose sum of reciprocals diverges would contain arbitrarily long arithmetic progressions. Thanks to the work of Sanders and of Bloom, the corresponding case of the conjecture for length three conjectures is nearly settled, as it is known that for the analogous bound on one can take any less than one.)

Most of the previous work on bounding relied in some form or another on the *density increment argument* introduced by Roth back in 1953; roughly speaking, the idea is to show that if a dense subset of fails to contain arithmetic progressions of length four, one seeks to then locate a long subprogression of in which has increased density. This was the basic method for instance underlying our previous bound , as well as a finite field analogue of the bound ; however we encountered significant technical difficulties for several years in extending this argument to obtain the result of the current paper. Our method is instead based on “energy increment arguments”, and more specifically on establishing quantitative version of a Khintchine-type recurrence theorem, similar to the qualitative recurrence theorems established (in the ergodic theory context) by Bergelson-Host-Kra, and (in the current combinatorial context) by Ben Green and myself.

One way to phrase the latter recurrence theorem is as follows. Suppose that has density . Then one would expect a “randomly” selected arithmetic progression in (using the convention that random variables will be in boldface) to be contained in with probability about . This is not true in general, however it was shown by Ben and myself that for any , there was a set of shifts of cardinality , such that for any such one had

if was chosen uniformly at random from . This easily implies that , but does not give a particularly good bound on the decay rate, because the implied constant in the cardinality lower bound is quite poor (in fact of tower-exponential type, due to the use of regularity lemmas!), and so one has to take to be extremely large compared to to avoid the possibility that the set of shifts in the above theorem consists only of the trivial shift .

We do not know how to improve the lower bound on the set of shifts to the point where it can give bounds that are competitive with those in this paper. However, we can obtain better quantitative results if we permit ourselves to *couple* together the two parameters and of the length four progression. Namely, with , , as above, we are able to show that there exist random variables , not necessarily independent, such that

and such that we have the non-degeneracy bound

This then easily implies the main theorem.

The energy increment method is then deployed to locate a good pair of random variables that will obey the above bounds. One can get some intuition on how to proceed here by considering some model cases. Firstly one can consider a “globally quadratically structured” case in which the indicator function “behaves like” a globally quadratic function such as , for some irrational and some smooth periodic function of mean . If one then takes to be uniformly distributed in and respectively for some small , with no coupling between the two variables, then the left-hand side of (1) is approximately of the form

where the integral is with respect to the probability Haar measure, and the constraint ultimately arises from the algebraic constraint

However, an application of the Cauchy-Schwarz inequality and Fubini’s theorem shows that the integral in (2) is at least , which (morally at least) gives (1) in this case.

Due to the nature of the energy increment argument, it also becomes necessary to consider “locally quadratically structured” cases, in which is partitioned into some number of structured pieces (think of these as arithmetic progressions, or as “Bohr sets), and on each piece , behaves like a locally quadratic function such as , where now varies with , and the mean of will be approximately on the average after averaging in (weighted by the size of the pieces ). Now one should select and in the following coupled manner: first one chooses uniformly from , then one defines to be the label such that , and then selects uniformly from a set which is related to in much the same way that is related to . If one does this correctly, the analogue of (2) becomes

and one can again use Cauchy-Schwarz and Fubini’s theorem to conclude.

The general case proceeds, very roughly, by an iterative argument. At each stage of the iteration, one has some sort of quadratic model of which involves a decomposition of into structured pieces , and a quadratic approximation to on each piece. If this approximation is accurate enough (or more precisely, if a certain (averaged) local Gowers uniformity norm of the error is small enough) to model the count in (1) (for random variables determined by the above partition of into pieces ), and if the frequencies (such as ) involved in the quadratic approximation are “high rank” or “linearly independent over the rationals” in a suitably quantitative sense, then some version of the above arguments can be made to work. If there are some unwanted linear dependencies in the frequencies, we can do some linear algebra to eliminate one of the frequencies (using some geometry of numbers to keep the quantitative bounds under control) and continue the iteration. If instead the approximation is too inaccurate, then the error will be large in a certain averaged local Gowers uniformity norm . A significant fraction of the paper is then devoted to establishing a quantitative *inverse theorem* for that norm that concludes (with good bounds) that the error must then locally correlate with locally quadratic phases, which can be used to refine the quadratic approximation to in a manner that significantly increases its “energy” (basically an norm). Such energy increments cannot continue indefinitely, and when they terminate we obtain the desired claim.

There are existing inverse theorems for type norms in the literature, going back to the pioneering work of Gowers mentioned previously, and relying on arithmetic combinatorics tools such as Freiman’s theorem and the Balog-Szemerédi-Gowers lemma, which are good for analysing the “-structured homomorphisms” that arise in Gowers’ argument. However, when we applied these methods to the local Gowers norms we obtained inferior quantitative results that were not strong enough for our application. Instead, we use arguments from a different paper of Gowers in which he tackled Szemerédi’s theorem for arbitrary length progressions. This method produces “-structured homomorphisms” associated to any function with large Gowers uniformity norm; however the catch is that such homomorphisms are initially supported only on a sparse unstructured set, rather than a structured set such as a Bohr set. To proceed further, one first has to locate inside the sparse unstructured set a sparse *pseudorandom* subset of a Bohr set, and then use “error-correction” type methods (such as “majority-vote” based algorithms) to locally upgrade this -structured homomorphism on pseudorandom subsets of Bohr sets to a -structured homomorphism on the entirety of a Bohr set. It is then possible to use some “approximate cohomology” tools to “integrate” these homomorphisms (and discern a key “local symmetry” property of these homomorphisms) to locate the desired local quadratic structure (in much the same fashion that a -form on that varies linearly with the coordinates can be integrated to be the derivative of a quadratic function if we know that the -form is closed). These portions of the paper are unfortunately rather technical, but broadly follow the methods already used in previous literature.

Daniel Kane and I have just uploaded to the arXiv our paper “A bound on partitioning clusters“, submitted to the Electronic Journal of Combinatorics. In this short and elementary paper, we consider a question that arose from biomathematical applications: given a finite family of sets (or “clusters”), how many ways can there be of partitioning a set in this family as the disjoint union of two other sets in this family? That is to say, what is the best upper bound one can place on the quantity

in terms of the cardinality of ? A trivial upper bound would be , since this is the number of possible pairs , and clearly determine . In our paper, we establish the improved bound

where is the somewhat strange exponent

so that . Furthermore, this exponent is best possible!

Actually, the latter claim is quite easy to show: one takes to be all the subsets of of cardinality either or , for a multiple of , and the claim follows readily from Stirling’s formula. So it is perhaps the former claim that is more interesting (since many combinatorial proof techniques, such as those based on inequalities such as the Cauchy-Schwarz inequality, tend to produce exponents that are rational or at least algebraic). We follow the common, though unintuitive, trick of generalising a problem to make it simpler. Firstly, one generalises the bound to the “trilinear” bound

for arbitrary finite collections of sets. One can place all the sets in inside a single finite set such as , and then by replacing every set in by its complement in , one can phrase the inequality in the equivalent form

for arbitrary collections of subsets of . We generalise further by turning sets into functions, replacing the estimate with the slightly stronger convolution estimate

for arbitrary functions on the Hamming cube , where the convolution is on the integer lattice rather than on the finite field vector space . The advantage of working in this general setting is that it becomes very easy to apply induction on the dimension ; indeed, to prove this estimate for arbitrary it suffices to do so for . This reduces matters to establishing the elementary inequality

for all , which can be done by a combination of undergraduate multivariable calculus and a little bit of numerical computation. (The left-hand side turns out to have local maxima at , with the latter being the cause of the numerology (1).)

The same sort of argument also gives an energy bound

for any subset of the Hamming cube, where

is the additive energy of . The example shows that the exponent cannot be improved.

I’ve just uploaded to the arXiv my paper “An integration approach to the Toeplitz square peg problem“, submitted to Forum of Mathematics, Sigma. This paper resulted from my attempts recently to solve the Toeplitz square peg problem (also known as the inscribed square problem):

Conjecture 1 (Toeplitz square peg problem)Let be a simple closed curve in the plane. Is it necessarily the case that contains four vertices of a square?

See this recent survey of Matschke in the Notices of the AMS for the latest results on this problem.

The route I took to the results in this paper was somewhat convoluted. I was motivated to look at this problem after lecturing recently on the Jordan curve theorem in my class. The problem is superficially similar to the Jordan curve theorem in that the result is known (and rather easy to prove) if is sufficiently regular (e.g. if it is a polygonal path), but seems to be significantly more difficult when the curve is merely assumed to be continuous. Roughly speaking, all the known positive results on the problem have proceeded using (in some form or another) tools from homology: note for instance that one can view the conjecture as asking whether the four-dimensional subset of the eight-dimensional space necessarily intersects the four-dimensional space consisting of the quadruples traversing a square in (say) anti-clockwise order; this space is a four-dimensional linear subspace of , with a two-dimensional subspace of “degenerate” squares removed. If one ignores this degenerate subspace, one can use intersection theory to conclude (under reasonable “transversality” hypotheses) that intersects an odd number of times (up to the cyclic symmetries of the square), which is basically how Conjecture 1 is proven in the regular case. Unfortunately, if one then takes a limit and considers what happens when is just a continuous curve, the odd number of squares created by these homological arguments could conceivably all degenerate to points, thus blocking one from proving the conjecture in the general case.

Inspired by my previous work on finite time blowup for various PDEs, I first tried looking for a counterexample in the category of (locally) self-similar curves that are smooth (or piecewise linear) away from a single origin where it can oscillate infinitely often; this is basically the smoothest type of curve that was not already covered by previous results. By a rescaling and compactness argument, it is not difficult to see that such a counterexample would exist if there was a counterexample to the following periodic version of the conjecture:

Conjecture 2 (Periodic square peg problem)Let be two disjoint simple closed piecewise linear curves in the cylinder which have a winding number of one, that is to say they are homologous to the loop from to . Then the union of and contains the four vertices of a square.

In contrast to Conjecture 1, which is known for polygonal paths, Conjecture 2 is still open even under the hypothesis of polygonal paths; the homological arguments alluded to previously now show that the number of inscribed squares in the periodic setting is *even* rather than *odd*, which is not enough to conclude the conjecture. (This flipping of parity from odd to even due to an infinite amount of oscillation is reminiscent of the “Eilenberg-Mazur swindle“, discussed in this previous post.)

I therefore tried to construct counterexamples to Conjecture 2. I began perturbatively, looking at curves that were small perturbations of constant functions. After some initial Taylor expansion, I was blocked from forming such a counterexample because an inspection of the leading Taylor coefficients required one to construct a continuous periodic function of mean zero that never vanished, which of course was impossible by the intermediate value theorem. I kept expanding to higher and higher order to try to evade this obstruction (this, incidentally, was when I discovered this cute application of Lagrange reversion) but no matter how high an accuracy I went (I think I ended up expanding to sixth order in a perturbative parameter before figuring out what was going on!), this obstruction kept resurfacing again and again. I eventually figured out that this obstruction was being caused by a “conserved integral of motion” for both Conjecture 2 and Conjecture 1, which can in fact be used to largely rule out perturbative constructions. This yielded a new positive result for both conjectures:

We sketch the proof of Theorem 3(i) as follows (the proof of Theorem 3(ii) is very similar). Let be the curve , thus traverses one of the two graphs that comprise . For each time , there is a unique square with first vertex (and the other three vertices, traversed in anticlockwise order, denoted ) such that also lies in the graph of and also lies in the graph of (actually for technical reasons we have to extend by constants to all of in order for this claim to be true). To see this, we simply rotate the graph of clockwise by around , where (by the Lipschitz hypotheses) it must hit the graph of in a unique point, which is , and which then determines the other two vertices of the square. The curve has the same starting and ending point as the graph of or ; using the Lipschitz hypothesis one can show this graph is simple. If the curve ever hits the graph of other than at the endpoints, we have created an inscribed square, so we may assume for contradiction that avoids the graph of , and hence by the Jordan curve theorem the two curves enclose some non-empty bounded open region .

Now for the conserved integral of motion. If we integrate the -form on each of the four curves , we obtain the identity

This identity can be established by the following calculation: one can parameterise

for some Lipschitz functions ; thus for instance . Inserting these parameterisations and doing some canceling, one can write the above integral as

which vanishes because (which represent the sidelengths of the squares determined by vanish at the endpoints .

Using this conserved integral of motion, one can show that

which by Stokes’ theorem then implies that the bounded open region mentioned previously has zero area, which is absurd.

This argument hinged on the curve being simple, so that the Jordan curve theorem could apply. Once one left the perturbative regime of curves of small Lipschitz constant, it became possible for to be self-crossing, but nevertheless there still seemed to be some sort of integral obstruction. I eventually isolated the problem in the form of a strengthened version of Conjecture 2:

Conjecture 4 (Area formulation of square peg problem)Let be simple closed piecewise linear curves of winding number obeying the area identity(note the -form is still well defined on the cylinder ; note also that the curves are allowed to cross each other.) Then there exists a (possibly degenerate) square with vertices (traversed in anticlockwise order) lying on respectively.

It is not difficult to see that Conjecture 4 implies Conjecture 2. Actually I believe that the converse implication is at least morally true, in that any counterexample to Conjecture 4 can be eventually transformed to a counterexample to Conjecture 2 and Conjecture 1. The conserved integral of motion argument can establish Conjecture 4 in many cases, for instance if are graphs of functions of Lipschitz constant less than one.

Conjecture 4 has a model special case, when one of the is assumed to just be a horizontal loop. In this case, the problem collapses to that of producing an intersection between two three-dimensional subsets of a six-dimensional space, rather than to four-dimensional subsets of an eight-dimensional space. More precisely, some elementary transformations reveal that this special case of Conjecture 4 can be formulated in the following fashion in which the geometric notion of a square is replaced by the additive notion of a triple of real numbers summing to zero:

Conjecture 5 (Special case of area formulation)Let be simple closed piecewise linear curves of winding number obeying the area identityThen there exist and with such that for .

This conjecture is easy to establish if one of the curves, say , is the graph of some piecewise linear function , since in that case the curve and the curve enclose the same area in the sense that , and hence must intersect by the Jordan curve theorem (otherwise they would enclose a non-zero amount of area between them), giving the claim. But when none of the are graphs, the situation becomes combinatorially more complicated.

Using some elementary homological arguments (e.g. breaking up closed -cycles into closed paths) and working with a generic horizontal slice of the curves, I was able to show that Conjecture 5 was equivalent to a one-dimensional problem that was largely combinatorial in nature, revolving around the sign patterns of various triple sums with drawn from various finite sets of reals.

Conjecture 6 (Combinatorial form)Let be odd natural numbers, and for each , let be distinct real numbers; we adopt the convention that . Assume the following axioms:

- (i) For any , the sums are non-zero.
- (ii) (Non-crossing) For any and with the same parity, the pairs and are non-crossing in the sense that
- (iii) (Non-crossing sums) For any , , of the same parity, one has
Then one has

Roughly speaking, Conjecture 6 and Conjecture 5 are connected by constructing curves to connect to for by various paths, which either lie to the right of the axis (when is odd) or to the left of the axis (when is even). The axiom (ii) is asserting that the numbers are ordered according to the permutation of a meander (formed by gluing together two non-crossing perfect matchings).

Using various *ad hoc* arguments involving “winding numbers”, it is possible to prove this conjecture in many cases (e.g. if one of the is at most ), to the extent that I have now become confident that this conjecture is true (and have now come full circle from trying to disprove Conjecture 1 to now believing that this conjecture holds also). But it seems that there is some non-trivial combinatorial argument to be made if one is to prove this conjecture; purely homological arguments seem to partially resolve the problem, but are not sufficient by themselves.

While I was not able to resolve the square peg problem, I think these results do provide a roadmap to attacking it, first by focusing on the combinatorial conjecture in Conjecture 6 (or its equivalent form in Conjecture 5), then after that is resolved moving on to Conjecture 4, and then finally to Conjecture 1.

## Recent Comments