
I have just uploaded to the arXiv the paper “An inverse theorem for an inequality of Kneser“, submitted to a special issue of the Proceedings of the Steklov Institute of Mathematics in honour of Sergei Konyagin. It concerns an inequality of Kneser discussed previously in this blog, namely that

$$\mu(A+B) \geq \min(\mu(A)+\mu(B), 1) \qquad (1)$$

whenever $A, B$ are compact non-empty subsets of a compact connected additive group $G$ with probability Haar measure $\mu$. (A later result of Kemperman extended this inequality to the nonabelian case.) This inequality is non-trivial in the regime

$$0 < \mu(A) + \mu(B) < 1. \qquad (2)$$
The connectedness of $G$ is essential, otherwise one could form counterexamples involving proper subgroups of $G$ of positive measure. In the blog post, I indicated how this inequality (together with a more “robust” strengthening of it) could be deduced from submodularity inequalities such as

$$\mu((A_1 \cup A_2)+B) + \mu((A_1 \cap A_2)+B) \leq \mu(A_1+B) + \mu(A_2+B), \qquad (3)$$

which in turn easily follows from the identity $(A_1 \cup A_2)+B = (A_1+B) \cup (A_2+B)$ and the inclusion $(A_1 \cap A_2)+B \subset (A_1+B) \cap (A_2+B)$, combined with the inclusion-exclusion formula.
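Since the submodularity inequality (3) is purely set-theoretic (it holds in any abelian group), it can be sanity-checked numerically. Below is a minimal sketch in the finite cyclic group $\mathbb{Z}/N\mathbb{Z}$ with normalised counting measure; this is only a toy discretisation for testing the inequality itself (the theorem of the paper, of course, concerns connected groups, which $\mathbb{Z}/N\mathbb{Z}$ is not), and the helper names are ad hoc.

```python
import random

def sumset(A, B, N):
    """Return the sumset A + B inside Z/NZ."""
    return {(a + b) % N for a in A for b in B}

# Check  |(A1 ∪ A2) + B| + |(A1 ∩ A2) + B| <= |A1 + B| + |A2 + B|
# on random triples of subsets of Z/NZ; dividing by N gives the
# normalised counting-measure form of the inequality.
random.seed(0)
N = 97
for _ in range(200):
    A1 = set(random.sample(range(N), random.randint(1, 40)))
    A2 = set(random.sample(range(N), random.randint(1, 40)))
    B = set(random.sample(range(N), random.randint(1, 40)))
    lhs = len(sumset(A1 | A2, B, N)) + len(sumset(A1 & A2, B, N))
    rhs = len(sumset(A1, B, N)) + len(sumset(A2, B, N))
    assert lhs <= rhs
print("submodularity verified on 200 random triples")
```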

In the non-trivial regime (2), equality can be attained in (1), for instance by taking $G$ to be the unit circle $\mathbb{R}/\mathbb{Z}$ and $A, B$ to be arcs in that circle (obeying (2)). A bit more generally, if $G$ is an arbitrary connected compact abelian group and $\xi: G \rightarrow \mathbb{R}/\mathbb{Z}$ is a non-trivial character (i.e., a continuous homomorphism), then $\xi$ must be surjective (as $\mathbb{R}/\mathbb{Z}$ has no non-trivial proper connected closed subgroups), and one can take $A = \xi^{-1}(I)$ and $B = \xi^{-1}(J)$ for some arcs $I, J$ in that circle (again choosing the measures of these arcs to obey (2)). The main result of this paper is an inverse theorem that asserts that this is the only way in which equality can occur in (1) (assuming (2)); furthermore, if (1) is close to being satisfied with equality and (2) holds, then $A, B$ must be close (in measure) to an example of the above form $A = \xi^{-1}(I)$, $B = \xi^{-1}(J)$. Actually, for technical reasons (and for the applications we have in mind), it is important to establish an inverse theorem not just for (1), but for the more robust version mentioned earlier (in which the sumset $A+B$ is replaced by the partial sumset consisting of “popular” sums).

Roughly speaking, the idea is as follows. Let us informally call $(A,B)$ a *critical pair* if (2) holds and the inequality (1) (or more precisely, a robust version of this inequality) is almost obeyed with equality. The notion of a critical pair obeys some useful closure properties. Firstly, it is symmetric in $A, B$, and invariant with respect to translation of either $A$ or $B$. Furthermore, from the submodularity inequality (3), one can show that if $(A_1, B)$ and $(A_2, B)$ are critical pairs (with $\mu(A_1 \cap A_2)$ and $\mu(A_1 \cup A_2)$ positive), then $(A_1 \cap A_2, B)$ and $(A_1 \cup A_2, B)$ are also critical pairs. (Note that this is consistent with the claim that critical pairs only occur when $A, B$ come from arcs of a circle.) Similarly, from associativity $(A+B)+C = A+(B+C)$, one can show that if $(A, B)$ and $(A+B, C)$ are critical pairs, then so are $(B, C)$ and $(A, B+C)$.

One can combine these closure properties to obtain further ones. For instance, suppose $A$ is such that $\mu(A) > 1/2$, so that all translates $A+t$, $t \in G$, intersect each other in a set of positive measure. Suppose also that $(A, B)$ is a critical pair and $B$ is non-empty. Then (cheating a little bit), one can show that $(A+B, B)$ is also a critical pair, basically because $A+B$ is the union of the $A+b$, $b \in B$, the $(A+b, B)$ are all critical pairs, and the $A+b$ all intersect each other. This argument doesn’t quite work as stated because one has to apply the closure property under union an uncountable number of times, but it turns out that if one works with the robust version of sumsets and uses a random sampling argument to approximate $A+B$ by the union of finitely many of the $A+b$, then the argument can be made to work.

Using all of these closure properties, it turns out that one can start with an arbitrary critical pair $(A,B)$ and end up with a small set $C$ such that $(A, C)$ and $(kC, C)$ are also critical pairs for all $k$ up to some large threshold (say), where $kC = C + \dots + C$ is the $k$-fold sumset of $C$. (Intuitively, if $A, B$ are thought of as secretly coming from the pullback of arcs $I, J$ by some character $\xi$, then $C$ should be the pullback of a much shorter arc by the same character.) In particular, $C$ exhibits linear growth, in that $\mu(kC) = k\mu(C)$ for all $k$ up to that threshold. One can now use standard technology from inverse sumset theory to show first that $C$ has a very large Fourier coefficient (and thus is biased with respect to some character $\xi$), and secondly that $C$ is in fact almost of the form $C = \xi^{-1}(K)$ for some arc $K$, from which it is not difficult to conclude similar statements for $A$ and $B$ and thus finish the proof of the inverse theorem.

In order to make the above argument rigorous, one has to be more precise about what the modifier “almost” means in the definition of a critical pair. I chose to do this in the language of “cheap” nonstandard analysis (aka asymptotic analysis), as discussed in this previous blog post; one could also have used the full-strength version of nonstandard analysis, but this does not seem to convey any substantial advantages. (One can also work in a more traditional “non-asymptotic” framework, but this requires one to keep much more careful account of various small error terms and leads to a messier argument.)

*[Update, Nov 15: Corrected the attribution of the inequality (1) to Kneser instead of Kemperman. Thanks to John Griesmer for pointing out the error.]*

The basic objects of study in multiplicative number theory are the arithmetic functions: functions $f: \mathbb{N} \rightarrow \mathbb{C}$ from the natural numbers to the complex numbers. Some fundamental examples of such functions include

- The constant function $1(n) := 1$;
- The Kronecker delta function $\delta(n) := 1_{n=1}$;
- The natural logarithm function $L(n) := \log n$;
- The divisor function $d(n) := \sum_{a|n} 1$;
- The von Mangoldt function $\Lambda(n)$, with $\Lambda(n)$ defined to equal $\log p$ when $n$ is a power $p^j$ of a prime $p$ for some $j \geq 1$, and defined to equal zero otherwise; and
- The Möbius function $\mu(n)$, with $\mu(n)$ defined to equal $(-1)^j$ when $n$ is the product of $j$ distinct primes, and defined to equal zero otherwise.

Given an arithmetic function $f$, we are often interested in statistics such as the summatory function

$$\sum_{n \leq x} f(n), \qquad (1)$$

the logarithmically (or harmonically) weighted summatory function

$$\sum_{n \leq x} \frac{f(n)}{n}, \qquad (2)$$

or the Dirichlet series

$$\mathcal{D} f(s) := \sum_{n=1}^\infty \frac{f(n)}{n^s}.$$

In the latter case, one typically has to first restrict $s$ to those complex numbers whose real part is large enough in order to ensure the series on the right converges; but in many important cases, one can then extend the Dirichlet series to almost all of the complex plane by analytic continuation. One is also interested in correlations involving additive shifts, such as $\sum_{n \leq x} f(n) \overline{f(n+h)}$, but these are significantly more difficult to study and cannot be easily estimated by the methods of classical multiplicative number theory.

A key operation on arithmetic functions is that of Dirichlet convolution, which when given two arithmetic functions $f, g: \mathbb{N} \rightarrow \mathbb{C}$, forms a new arithmetic function $f*g: \mathbb{N} \rightarrow \mathbb{C}$, defined by the formula

$$f*g(n) := \sum_{a|n} f(a)\, g\Big(\frac{n}{a}\Big).$$

Thus for instance $1*1 = d$, $1*\Lambda = L$, $1*\mu = \delta$, and $f*\delta = f$ for any arithmetic function $f$. Dirichlet convolution and Dirichlet series are related by the fundamental formula

$$\mathcal{D}(f*g)(s) = \mathcal{D}f(s)\, \mathcal{D}g(s), \qquad (3)$$

at least when the real part of $s$ is large enough that all sums involved become absolutely convergent (but in practice one can use analytic continuation to extend this identity to most of the complex plane). There is also the identity

$$\mathcal{D}(Lf)(s) = -\frac{d}{ds} \mathcal{D}f(s),$$

at least when the real part of $s$ is large enough to justify interchange of differentiation and summation. As a consequence, many Dirichlet series can be expressed in terms of the Riemann zeta function $\zeta(s) := \mathcal{D}1(s) = \sum_{n=1}^\infty \frac{1}{n^s}$, thus for instance

$$\mathcal{D}d(s) = \zeta(s)^2, \quad \mathcal{D}\Lambda(s) = -\frac{\zeta'(s)}{\zeta(s)}, \quad \mathcal{D}\mu(s) = \frac{1}{\zeta(s)}, \quad \mathcal{D}L(s) = -\zeta'(s).$$
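The convolution identities above are easy to verify numerically. The following self-contained sketch (naive trial division, illustration only; function names are ad hoc) checks $1*1 = d$, $1*\Lambda = L$ and $1*\mu = \delta$ for small $n$:

```python
import math

def divisors(n):
    return [a for a in range(1, n + 1) if n % a == 0]

def conv(f, g, n):
    """Dirichlet convolution (f * g)(n) = sum over a | n of f(a) g(n/a)."""
    return sum(f(a) * g(n // a) for a in divisors(n))

def mobius(n):
    j = 0
    for p in range(2, n + 1):
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # n is divisible by a square
            j += 1
    return (-1) ** j

def von_mangoldt(n):
    for p in range(2, n + 1):
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0   # log p iff n was a power of p
    return 0.0                                      # n = 1

one = lambda n: 1
for n in range(1, 200):
    assert conv(one, one, n) == len(divisors(n))                 # 1 * 1 = d
    assert abs(conv(one, von_mangoldt, n) - math.log(n)) < 1e-9  # 1 * Λ = L
    assert conv(one, mobius, n) == (1 if n == 1 else 0)          # 1 * μ = δ
print("identities verified for n < 200")
```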
Much of the difficulty of multiplicative number theory can be traced back to the discrete nature of the natural numbers $\mathbb{N} = \{1, 2, 3, \dots\}$, which form a rather complicated abelian semigroup with respect to multiplication (in particular the set of generators is the set of prime numbers). One can obtain a simpler analogue of the subject by working instead with the half-infinite interval $[1, +\infty)$, which is a much simpler abelian semigroup under multiplication (being a one-dimensional Lie semigroup). (I will think of this as a sort of “completion” of $\mathbb{N}$ at the infinite place $\infty$, hence the terminology.) Accordingly, let us define a *continuous arithmetic function* to be a locally integrable function $f_\infty: [1,+\infty) \rightarrow \mathbb{C}$. The analogue of the summatory function (1) is then an integral

$$\int_1^x f_\infty(t)\, dt,$$

and similarly the analogue of (2) is

$$\int_1^x \frac{f_\infty(t)}{t}\, dt.$$

The analogue of the Dirichlet series is the Mellin-type transform

$$\mathcal{D}_\infty f_\infty(s) := \int_1^\infty \frac{f_\infty(t)}{t^s}\, dt,$$

which will be well-defined at least if the real part of $s$ is large enough and if the continuous arithmetic function $f_\infty$ does not grow too quickly, and hopefully will also be defined elsewhere in the complex plane by analytic continuation.

For instance, the continuous analogue of the discrete constant function $1$ would be the constant function that maps any $x \in [1,+\infty)$ to $1$, and which we will denote by $1_\infty$ in order to keep it distinct from $1$. The two functions $1$ and $1_\infty$ have approximately similar statistics; for instance one has

$$\sum_{n \leq x} 1 = \lfloor x \rfloor \approx x - 1 = \int_1^x 1_\infty(t)\, dt$$

and

$$\sum_{n \leq x} \frac{1}{n} = H_{\lfloor x \rfloor} \approx \log x = \int_1^x \frac{1_\infty(t)}{t}\, dt,$$

where $H_n := 1 + \frac{1}{2} + \dots + \frac{1}{n}$ is the harmonic number, and we are deliberately vague as to what the symbol $\approx$ means. Continuing this analogy, we would expect

$$\mathcal{D}1(s) = \zeta(s) \approx \frac{1}{s-1} = \mathcal{D}_\infty 1_\infty(s),$$

which reflects the fact that $\zeta$ has a simple pole at $s=1$ with residue $1$, and no other poles. Note that the identity $\mathcal{D}_\infty 1_\infty(s) = \frac{1}{s-1}$ is initially only valid in the region $\mathrm{Re}(s) > 1$, but clearly the right-hand side can be continued analytically to the entire complex plane except for the pole at $s=1$, and so one can define $\mathcal{D}_\infty 1_\infty$ in this region also.

In a similar vein, the logarithm function $L$ is approximately similar to the logarithm function $L_\infty: x \mapsto \log x$, giving for instance the crude form

$$\sum_{n \leq x} \log n \approx \int_1^x \log t\, dt = x \log x - x + 1$$

of Stirling’s formula, or the Dirichlet series approximation

$$\mathcal{D}L(s) = -\zeta'(s) \approx \frac{1}{(s-1)^2} = \mathcal{D}_\infty L_\infty(s).$$
The continuous analogue of Dirichlet convolution is multiplicative convolution using the multiplicative Haar measure $\frac{dt}{t}$: given two continuous arithmetic functions $f_\infty, g_\infty$, one can define their convolution $f_\infty * g_\infty$ by the formula

$$f_\infty * g_\infty(x) := \int_1^x f_\infty(t)\, g_\infty\Big(\frac{x}{t}\Big)\, \frac{dt}{t}.$$

Thus for instance $1_\infty * 1_\infty = L_\infty$. A short computation using Fubini’s theorem shows the analogue

$$\mathcal{D}_\infty(f_\infty * g_\infty)(s) = \mathcal{D}_\infty f_\infty(s)\, \mathcal{D}_\infty g_\infty(s) \qquad (4)$$

of (3) whenever the real part of $s$ is large enough that Fubini’s theorem can be justified; similarly, differentiation under the integral sign shows that

$$\mathcal{D}_\infty(L_\infty f_\infty)(s) = -\frac{d}{ds} \mathcal{D}_\infty f_\infty(s), \qquad (5)$$

again assuming that the real part of $s$ is large enough that differentiation under the integral sign (or some other tool like this, such as the Cauchy integral formula for derivatives) can be justified.

Direct calculation shows that for any complex number $\alpha$, one has

$$\mathcal{D}_\infty\big(t \mapsto t^{\alpha-1}\big)(s) = \frac{1}{s-\alpha}$$

(at least for the real part of $s$ large enough), and hence by several applications of (5),

$$\mathcal{D}_\infty\Big(t \mapsto \frac{\log^{j-1} t}{(j-1)!}\, t^{\alpha-1}\Big)(s) = \frac{1}{(s-\alpha)^j}$$

for any natural number $j$. This can lead to the following heuristic: if a Dirichlet series $\mathcal{D}f(s)$ behaves like a linear combination of poles, in that

$$\mathcal{D}f(s) \approx \sum_\rho \frac{c_\rho}{(s-\rho)^{j_\rho}}$$

for some set of poles $\rho$ and some coefficients $c_\rho$ and natural numbers $j_\rho$ (where we again are vague as to what $\approx$ means, and how to interpret the sum if the set of poles is infinite), then one should expect the arithmetic function $f$ to behave like the continuous arithmetic function

$$f_\infty(t) := \sum_\rho c_\rho\, \frac{\log^{j_\rho - 1} t}{(j_\rho - 1)!}\, t^{\rho-1}.$$

In particular, if we only have simple poles,

$$\mathcal{D}f(s) \approx \sum_\rho \frac{c_\rho}{s-\rho},$$

then we expect $f$ to behave like the continuous arithmetic function

$$f_\infty(t) := \sum_\rho c_\rho\, t^{\rho-1}.$$

Integrating this from $1$ to $x$, this heuristically suggests an approximation

$$\sum_{n \leq x} f(n) \approx \sum_\rho c_\rho\, \frac{x^\rho}{\rho}$$

for the summatory function, and similarly

$$\sum_{n \leq x} \frac{f(n)}{n} \approx \sum_\rho c_\rho\, \frac{x^{\rho-1}}{\rho-1},$$

with the convention that $\frac{x^\rho}{\rho}$ is $\log x$ when $\rho = 0$, and similarly $\frac{x^{\rho-1}}{\rho-1}$ is $\log x$ when $\rho = 1$. One can make these sorts of approximations more rigorous by means of Perron’s formula (or one of its variants) combined with the residue theorem, provided that one has good enough control on the relevant Dirichlet series, but we will not pursue these rigorous calculations here. (But see for instance this previous blog post for some examples.)

For instance, using the more refined approximation

$$\zeta(s) \approx \frac{1}{s-1} + \gamma$$

to the zeta function near $s=1$, we have

$$\mathcal{D}d(s) = \zeta(s)^2 \approx \frac{1}{(s-1)^2} + \frac{2\gamma}{s-1};$$

we would expect that

$$d_\infty(t) := \log t + 2\gamma$$

and thus for instance

$$\sum_{n \leq x} d(n) \approx x \log x + (2\gamma - 1) x,$$

which matches what one actually gets from the Dirichlet hyperbola method (see e.g. equation (44) of this previous post).
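For comparison, the exact summatory function of $d$ can be computed quickly via the hyperbola-method identity $\sum_{n \leq x} d(n) = 2\sum_{a \leq \sqrt{x}} \lfloor x/a \rfloor - \lfloor \sqrt{x} \rfloor^2$ and tested against the heuristic $x\log x + (2\gamma-1)x$; a small sketch (the tolerance $\sqrt{x}$ below is a generous hedge, not the true error term):

```python
import math

def divisor_summatory(x):
    """sum of d(n) for n <= x, via the exact hyperbola identity."""
    r = math.isqrt(x)
    return 2 * sum(x // a for a in range(1, r + 1)) - r * r

gamma = 0.5772156649015329          # Euler-Mascheroni constant
x = 10**6
exact = divisor_summatory(x)
approx = x * math.log(x) + (2 * gamma - 1) * x
print(exact, round(approx), exact - approx)
assert abs(exact - approx) < math.sqrt(x)   # heuristic is accurate to O(sqrt(x))
```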

Or, noting that $\zeta(s)$ has a simple pole at $s=1$ and assuming simple zeroes elsewhere, the log derivative $-\frac{\zeta'(s)}{\zeta(s)}$ will have simple poles of residue $+1$ at $s=1$ and $-1$ at all the zeroes $\rho$, leading to the heuristic

$$\mathcal{D}\Lambda(s) = -\frac{\zeta'(s)}{\zeta(s)} \approx \frac{1}{s-1} - \sum_\rho \frac{1}{s-\rho},$$

suggesting that $\Lambda$ should behave like the continuous arithmetic function

$$\Lambda_\infty(t) := 1 - \sum_\rho t^{\rho-1},$$

leading for instance to the summatory approximation

$$\sum_{n \leq x} \Lambda(n) \approx x - \sum_\rho \frac{x^\rho}{\rho},$$

which is a heuristic form of the Riemann-von Mangoldt explicit formula (see Exercise 45 of these notes for a rigorous version of this formula).

Exercise 1. Go through some of the other explicit formulae listed at this Wikipedia page and give heuristic justifications for them (up to some lower order terms) by similar calculations to those given above.

Given the “adelic” perspective on number theory, I wonder if there are also $p$-adic analogues of arithmetic functions to which a similar set of heuristics can be applied. A key problem here is that there does not seem to be any good interpretation of the expression $n^s$ when $s$ is complex and $n$ is a $p$-adic number, so it is not clear that one can analyse a Dirichlet series $p$-adically. For similar reasons, we don’t have a canonical way to define $\chi(n)$ for $n$ a $p$-adic number and $\chi$ a Dirichlet character (unless its conductor happens to be a power of $p$), so there doesn’t seem to be much to say in the $q$-aspect either.

Let $\lambda$ be the Liouville function, thus $\lambda(n)$ is defined to equal $+1$ when $n$ is the product of an even number of primes, and $-1$ when $n$ is the product of an odd number of primes. The Chowla conjecture asserts that $\lambda$ has the statistics of a random sign pattern, in the sense that

$$\lim_{x \rightarrow \infty} \mathbb{E}_{n \leq x}\, \lambda(n+h_1) \cdots \lambda(n+h_k) = 0 \qquad (1)$$

for all $k \geq 1$ and all distinct natural numbers $h_1, \dots, h_k$, where we use the averaging notation

$$\mathbb{E}_{n \leq x} f(n) := \frac{1}{x} \sum_{n \leq x} f(n).$$

For $k=1$, this conjecture is equivalent to the prime number theorem (as discussed in this previous blog post), but the conjecture remains open for any $k \geq 2$.
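Though the case $k=2$ is open, the two-point correlation is easy to probe numerically with a sieve for $\Omega(n)$, the number of prime factors counted with multiplicity; smallness at one finite height is purely illustrative, of course, and proves nothing:

```python
def liouville_sieve(N):
    """Return [λ(0):=0, λ(1), ..., λ(N)] via a sieve on Ω(n)."""
    omega = [0] * (N + 1)
    for p in range(2, N + 1):
        if omega[p] == 0:                  # p is prime
            pk = p
            while pk <= N:                 # each prime power p^j <= N ...
                for m in range(pk, N + 1, pk):
                    omega[m] += 1          # ... contributes 1 to Ω(m)
                pk *= p
    return [0] + [(-1) ** omega[n] for n in range(1, N + 1)]

N = 10**5
lam = liouville_sieve(N)
corr = sum(lam[n] * lam[n + 1] for n in range(1, N)) / N
print("average of λ(n)λ(n+1) for n <=", N, "is", corr)
```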

In recent years, it has been realised that one can make more progress on this conjecture if one works instead with the logarithmically averaged version

$$\lim_{x \rightarrow \infty} \mathbb{E}^{\log}_{n \leq x}\, \lambda(n+h_1) \cdots \lambda(n+h_k) = 0 \qquad (2)$$

of the conjecture, where we use the logarithmic averaging notation

$$\mathbb{E}^{\log}_{n \leq x} f(n) := \frac{1}{\log x} \sum_{n \leq x} \frac{f(n)}{n}.$$

Using the summation by parts (or telescoping series) identity

$$\frac{1}{\log x} \sum_{n \leq x} \frac{f(n)}{n} = \frac{1}{\log x} \int_1^x \Big( \mathbb{E}_{n \leq t} f(n) \Big) \frac{dt}{t} + O\Big(\frac{1}{\log x}\Big)$$

(valid for bounded $f$), it is not difficult to show that the Chowla conjecture (1) for a given $k$ implies the logarithmically averaged conjecture (2). However, the converse implication is not at all clear. For instance, for $k=1$, we have already mentioned that the Chowla conjecture

$$\lim_{x \rightarrow \infty} \mathbb{E}_{n \leq x}\, \lambda(n) = 0$$

is equivalent to the prime number theorem; but the logarithmically averaged analogue

$$\lim_{x \rightarrow \infty} \mathbb{E}^{\log}_{n \leq x}\, \lambda(n) = 0$$

is significantly easier to show (a proof with the Liouville function replaced by the closely related Möbius function is given in this previous blog post). And indeed, significantly more is now known for the logarithmically averaged Chowla conjecture; in this paper of mine I had proven (2) for $k=2$, and in this recent paper with Joni Teravainen, we proved the conjecture for all odd $k$ (with a different proof also given here).
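The summation-by-parts step can be checked in exact arithmetic. One discrete form of the identity (the precise shape used in the literature may differ) writes the logarithmic sum in terms of the partial sums $A_m = \sum_{n \leq m} a_n$, exhibiting it as a weighted average of the ordinary averages $A_m/m$; this is exactly why ordinary Chowla implies its logarithmically averaged version:

```python
import random
from fractions import Fraction

# Exact identity:  sum_{n<=N} a_n / n  =  A_N / N  +  sum_{m<N} A_m / (m(m+1)),
# where A_m = a_1 + ... + a_m.  Checked on arbitrary bounded integer data
# using Fractions, so the comparison is exact.
random.seed(1)
N = 300
a = [Fraction(0)] + [Fraction(random.randint(-5, 5)) for _ in range(N)]
A = [Fraction(0)] * (N + 1)
for n in range(1, N + 1):
    A[n] = A[n - 1] + a[n]

lhs = sum(a[n] / n for n in range(1, N + 1))
rhs = A[N] / N + sum(A[m] / (m * (m + 1)) for m in range(1, N))
assert lhs == rhs
print("Abel summation identity verified; logarithmic sum =", float(lhs))
```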

In view of this emerging consensus that the logarithmically averaged Chowla conjecture was easier than the ordinary Chowla conjecture, it was thus somewhat of a surprise for me to read a recent paper of Gomilko, Kwietniak, and Lemanczyk who (among other things) established the following statement:

Theorem 1. Assume that the logarithmically averaged Chowla conjecture (2) is true for all $k$. Then there exists a sequence $x_m$ going to infinity such that the Chowla conjecture (1) is true for all $k$ along that sequence, that is to say

$$\lim_{m \rightarrow \infty} \mathbb{E}_{n \leq x_m}\, \lambda(n+h_1) \cdots \lambda(n+h_k) = 0$$

for all $k$ and all distinct $h_1, \dots, h_k$.

This implication does not use any special properties of the Liouville function (other than that it is bounded), and in fact proceeds by ergodic theoretic methods, focusing in particular on the ergodic decomposition of invariant measures of a shift into ergodic measures. Ergodic methods have proven remarkably fruitful in understanding these sorts of number theoretic and combinatorial problems, as could already be seen by the ergodic theoretic proof of Szemerédi’s theorem by Furstenberg, and more recently by the work of Frantzikinakis and Host on Sarnak’s conjecture. (My first paper with Teravainen also uses ergodic theory tools.) Indeed, many other results in the subject were first discovered using ergodic theory methods.

On the other hand, many results in this subject that were first proven ergodic theoretically have since been reproven by more combinatorial means; my second paper with Teravainen is an instance of this. As it turns out, one can also prove Theorem 1 by a standard combinatorial (or probabilistic) technique known as the second moment method. In fact, one can prove slightly more:

Theorem 2. Let $k$ be a natural number. Assume that the logarithmically averaged Chowla conjecture (2) is true for $2k$. Then there exists a set $X$ of natural numbers of logarithmic density $1$ (that is, $\lim_{x \rightarrow \infty} \mathbb{E}^{\log}_{n \leq x} 1_X(n) = 1$) such that

$$\lim_{x \rightarrow \infty;\, x \in X} \mathbb{E}_{n \leq x}\, \lambda(n+h_1) \cdots \lambda(n+h_k) = 0$$

for any distinct $h_1, \dots, h_k$.

It is not difficult to deduce Theorem 1 from Theorem 2 using a diagonalisation argument. Unfortunately, the known cases of the logarithmically averaged Chowla conjecture ($k=2$ and odd $k$) are currently insufficient to use Theorem 2 for any purpose other than to reprove what is already known to be true from the prime number theorem. (Indeed, the even cases of Chowla, in either logarithmically averaged or non-logarithmically averaged forms, seem to be far more powerful than the odd cases; see Remark 1.7 of this paper of myself and Teravainen for a related observation in this direction.)

We now sketch the proof of Theorem 2. For any distinct $h_1, \dots, h_k$, we take a large number $H$ and consider the limiting second moment

$$\limsup_{x \rightarrow \infty}\ \mathbb{E}^{\log}_{n \leq x} \Big| \mathbb{E}_{m \leq H}\, \lambda(n+m+h_1) \cdots \lambda(n+m+h_k) \Big|^2.$$

We can expand this as

$$\limsup_{x \rightarrow \infty}\ \mathbb{E}_{m, m' \leq H}\ \mathbb{E}^{\log}_{n \leq x}\, \lambda(n+m+h_1) \cdots \lambda(n+m+h_k)\, \lambda(n+m'+h_1) \cdots \lambda(n+m'+h_k).$$

If all the $m+h_1, \dots, m+h_k, m'+h_1, \dots, m'+h_k$ are distinct, the hypothesis (2) (for $2k$) tells us that the inner averages go to zero as $x \rightarrow \infty$. The remaining averages are $O(1)$, and there are $O_k(H)$ of these averages out of the $H^2$ in all. We conclude that

$$\limsup_{x \rightarrow \infty}\ \mathbb{E}^{\log}_{n \leq x} \Big| \mathbb{E}_{m \leq H}\, \lambda(n+m+h_1) \cdots \lambda(n+m+h_k) \Big|^2 \ll_k \frac{1}{H}. \qquad (3)$$

By Markov’s inequality (and (3)), we conclude that for any fixed $\varepsilon > 0$ and $H$ sufficiently large depending on $\varepsilon$, there exists a set $X_{H,\varepsilon}$ of natural numbers of upper logarithmic density at least $1-\varepsilon$, thus

$$\limsup_{x \rightarrow \infty}\ \mathbb{E}^{\log}_{n \leq x}\, 1_{X_{H,\varepsilon}}(n) \geq 1 - \varepsilon,$$

such that

$$\mathbb{E}_{n \leq x} \Big| \mathbb{E}_{m \leq H}\, \lambda(n+m+h_1) \cdots \lambda(n+m+h_k) \Big|^2 \leq \varepsilon$$

for all $x \in X_{H,\varepsilon}$. By deleting at most finitely many elements, we may assume that $X_{H,\varepsilon}$ consists only of elements of size at least $H^2$ (say).

For any $H$, if we let $X_H$ be the union of the $X_{H,\varepsilon}$ for $\varepsilon > 0$, then $X_H$ has logarithmic density $1$. By a diagonalisation argument (using the fact that the set of tuples $(H, h_1, \dots, h_k)$ is countable), we can then find a set $X$ of natural numbers of logarithmic density $1$, such that for every $H$ (and every choice of distinct $h_1, \dots, h_k$), every sufficiently large element of $X$ lies in $X_H$. Thus for every sufficiently large $x$ in $X$, one has

$$\mathbb{E}_{n \leq x} \Big| \mathbb{E}_{m \leq H}\, \lambda(n+m+h_1) \cdots \lambda(n+m+h_k) \Big|^2 \leq \varepsilon$$

for some $H, \varepsilon$ with $H \rightarrow \infty$ and $\varepsilon \rightarrow 0$ as $x \rightarrow \infty$ along $X$. By Cauchy-Schwarz, this implies that

$$\mathbb{E}_{n \leq x}\, \mathbb{E}_{m \leq H}\, \lambda(n+m+h_1) \cdots \lambda(n+m+h_k) = O(\varepsilon^{1/2});$$

interchanging the sums and using the boundedness of $\lambda$ and the lower bound $x \geq H^2$, this implies that

$$\mathbb{E}_{n \leq x}\, \lambda(n+h_1) \cdots \lambda(n+h_k) = O(\varepsilon^{1/2}) + O\Big(\frac{1}{H}\Big).$$

We conclude on taking $x$ to infinity along $X$ that

$$\lim_{x \rightarrow \infty;\, x \in X} \mathbb{E}_{n \leq x}\, \lambda(n+h_1) \cdots \lambda(n+h_k) = 0,$$

as required.

Let $P(z)$ be a monic polynomial of degree $n$ with complex coefficients. Then by the fundamental theorem of algebra, we can factor $P$ as

$$P(z) = (z - z_1) \cdots (z - z_n) \qquad (1)$$

for some complex zeroes $z_1, \dots, z_n$ (possibly with repetition).

Now suppose we evolve $P$ with respect to time by heat flow, creating a function $P(t,z)$ of two variables with given initial data $P(0,z) = P(z)$ for which

$$\partial_t P(t,z) = \partial_{zz} P(t,z). \qquad (2)$$

On the space of polynomials of degree at most $n$, the operator $\partial_{zz}$ is nilpotent, and one can solve this equation explicitly both forwards and backwards in time by the Taylor series

$$P(t,z) = \sum_{j=0}^\infty \frac{t^j}{j!}\, \partial_{zz}^j P(0,z).$$

For instance, if one starts with a quadratic $P(0,z) = z^2 + bz + c$, then the polynomial evolves by the formula

$$P(t,z) = z^2 + bz + (c + 2t).$$
As the polynomial evolves in time, the zeroes evolve also. Assuming for sake of discussion that the zeroes are simple, the inverse function theorem tells us that the zeroes will (locally, at least) evolve smoothly in time. What are the dynamics of this evolution?

For instance, in the quadratic case, the quadratic formula tells us that the zeroes are

$$z_1(t) = \frac{-b + \sqrt{b^2 - 4(c+2t)}}{2}$$

and

$$z_2(t) = \frac{-b - \sqrt{b^2 - 4(c+2t)}}{2},$$

after arbitrarily choosing a branch of the square root. If $b, c$ are real and the discriminant $b^2 - 4c$ is initially positive, we see that we start with two real zeroes centred around $-b/2$, which then approach each other until time $t_* = \frac{b^2-4c}{8}$, at which point the roots collide and then move off from each other in an imaginary direction.
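The quadratic case is easy to confirm numerically, since the heat flow acts exactly on the coefficients; a small sketch with numpy (the values of $b, c$ are chosen arbitrarily for illustration):

```python
import numpy as np

b, c = 0.0, -1.0                    # P(z) = z^2 - 1, zeroes at ±1
t_star = (b * b - 4 * c) / 8        # predicted collision time (= 0.5 here)

for t in [0.0, 0.25, 0.49]:         # before the collision: real zeroes
    roots = np.roots([1.0, b, c + 2 * t])
    disc = np.sqrt(b * b - 4 * c - 8 * t)
    pred = [(-b - disc) / 2, (-b + disc) / 2]
    assert np.allclose(sorted(roots), pred)
    assert np.all(np.isreal(roots))

roots_after = np.roots([1.0, b, c + 2 * 0.6])   # t > t*: zeroes go complex
assert np.iscomplex(roots_after).all()
print("zeroes collide at t* =", t_star)
```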

In the general case, we can obtain the equations of motion by implicitly differentiating the defining equation

$$P(t, z_j(t)) = 0$$

in time using (2) to obtain

$$\partial_{zz} P(t, z_j(t)) + \partial_z P(t, z_j(t))\, \dot z_j(t) = 0.$$

To simplify notation we drop the explicit dependence on time, thus

$$\partial_{zz} P(z_j) + \partial_z P(z_j)\, \dot z_j = 0.$$

From (1) and the product rule, we see that

$$\partial_z P(z_j) = \prod_{k \neq j} (z_j - z_k)$$

and

$$\partial_{zz} P(z_j) = 2 \sum_{k \neq j} \prod_{l \neq j,k} (z_j - z_l)$$

(where all indices are understood to range over $1, \dots, n$) leading to the equations of motion

$$\dot z_j = -\sum_{k \neq j} \frac{2}{z_j - z_k}, \qquad (3)$$

at least when one avoids those times in which there is a repeated zero. In the case when the zeroes are real, each term $-\frac{2}{z_j - z_k}$ represents a (first-order) attraction in the dynamics between $z_j$ and $z_k$, but the dynamics are more complicated for complex zeroes (e.g. purely imaginary zeroes will experience repulsion rather than attraction, as one already sees in the quadratic example). Curiously, this system resembles that of Dyson Brownian motion (except with the Brownian motion part removed, and time reversed). I learned of the connection between the ODE (3) and the heat equation from this paper of Csordas, Smith, and Varga, but perhaps it has been mentioned in earlier literature as well.

One interesting consequence of these equations is that if the zeroes are real at some time, then they will stay real as long as the zeroes do not collide. Let us now restrict attention to the case of real simple zeroes, in which case we will rename the zeroes as $x_j$ instead of $z_j$, and order them as $x_1 < \dots < x_n$. The evolution

$$\dot x_j = -\sum_{k \neq j} \frac{2}{x_j - x_k}$$

can now be thought of as reverse gradient flow for the “entropy”

$$H := \sum_{j \neq k} \log \frac{1}{|x_j - x_k|}$$

(which is also essentially the logarithm of the discriminant of the polynomial) since we have

$$\dot x_j = \frac{\partial H}{\partial x_j}.$$

In particular, we have the monotonicity formula

$$\partial_t H = \sum_j |\dot x_j|^2 = 4E,$$

where $E$ is the “energy”

$$E := \sum_{j \neq k} \frac{1}{(x_j - x_k)^2},$$

where in the last line we use the antisymmetrisation identity

$$\frac{1}{(x-y)(x-z)} + \frac{1}{(y-x)(y-z)} + \frac{1}{(z-x)(z-y)} = 0.$$

Among other things, this shows that as one goes backwards in time, the entropy decreases, and so no collisions can occur to the past, only in the future, which is of course consistent with the attractive nature of the dynamics. As $H$ is a convex function of the positions $x_1, \dots, x_n$, one expects $H$ to also evolve in a convex manner in time, that is to say the energy $E$ should be increasing. This is indeed the case:

Exercise 1. Show that

Symmetric polynomials of the zeroes are polynomial functions of the coefficients and should thus evolve in a polynomial fashion. One can compute this explicitly in simple cases. For instance, the center of mass is an invariant:

$$\partial_t\, \frac{1}{n} \sum_j x_j = 0.$$

The variance decreases linearly:

$$\partial_t \sum_j \Big( x_j - \frac{1}{n} \sum_k x_k \Big)^2 = -2n(n-1).$$
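Both facts can be checked numerically on a cubic, where the Taylor series for the heat flow truncates after one term (as $\partial_z^4 P = 0$), so the flowed coefficients are exact; here $-2n(n-1) = -12$ for $n = 3$. A sketch with numpy (the cubic is chosen arbitrarily with real simple zeroes):

```python
import numpy as np

a, b, c = 1.0, -2.0, 0.0   # P(z) = z^3 + z^2 - 2z = z (z - 1) (z + 2)

def root_stats(t):
    """Sum and centred second moment of the zeroes of the flowed cubic
    P(t, z) = z^3 + a z^2 + (b + 6t) z + (c + 2 a t)."""
    z = np.roots([1.0, a, b + 6 * t, c + 2 * a * t])
    mean = z.sum() / 3
    return z.sum(), ((z - mean) ** 2).sum()

s0, v0 = root_stats(0.0)
s1, v1 = root_stats(0.02)
assert np.isclose(s0, s1)                  # centre of mass is invariant
assert np.isclose(v1 - v0, -12 * 0.02)     # variance decays at rate 2n(n-1)
print("sum of zeroes:", complex(s0).real,
      " d(variance)/dt:", complex(v1 - v0).real / 0.02)
```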
Exercise 2. Establish the virial identity

As the variance (which is proportional to $\sum_j (x_j - \frac{1}{n}\sum_k x_k)^2$) cannot become negative, this identity shows that “finite time blowup” must occur – that the zeroes must collide at or before the time $\frac{1}{2n(n-1)} \sum_j (x_j(0) - \frac{1}{n}\sum_k x_k(0))^2$.

Exercise 3. Show that the Stieltjes transform

$$s(t,z) = \frac{\partial_z P(t,z)}{P(t,z)} = \sum_j \frac{1}{z - x_j(t)}$$

solves the viscous Burgers equation

$$\partial_t s = \partial_{zz} s + 2 s\, \partial_z s,$$

either by using the original heat equation (2) and the identity $\partial_z s + s^2 = \frac{\partial_{zz} P}{P}$, or else by using the equations of motion (3). This relation between the Burgers equation and the heat equation is known as the Cole-Hopf transformation.

The paper of Csordas, Smith, and Varga mentioned previously gives some other bounds on the lifespan of the dynamics; roughly speaking, they show that if there is one pair of zeroes that are much closer to each other than to the other zeroes then they must collide in a short amount of time (unless there is a collision occurring even earlier at some other location). Their argument extends also to situations where there are an infinite number of zeroes, which they apply to get new results on Newman’s conjecture in analytic number theory. I would be curious to know of further places in the literature where this dynamics has been studied.

Joni Teräväinen and I have just uploaded to the arXiv our paper “Odd order cases of the logarithmically averaged Chowla conjecture“, submitted to J. Numb. Thy. Bordeaux. This paper gives an alternate route to one of the main results of our previous paper, and more specifically reproves the asymptotic

$$\frac{1}{\log x} \sum_{n \leq x} \frac{\lambda(n+h_1) \cdots \lambda(n+h_k)}{n} = o(1) \qquad (1)$$

for all odd $k$ and all integers $h_1, \dots, h_k$ (that is to say, all the odd order cases of the logarithmically averaged Chowla conjecture). Our previous argument relies heavily on some deep ergodic theory results of Bergelson-Host-Kra, Leibman, and Le (and was applicable to more general multiplicative functions than the Liouville function $\lambda$); here we give a shorter proof that avoids ergodic theory (but instead requires the Gowers uniformity of the (W-tricked) von Mangoldt function, established in several papers of Ben Green, Tamar Ziegler, and myself). The proof follows the lines sketched in the previous blog post. In principle, due to the avoidance of ergodic theory, the arguments here have a greater chance to be made quantitative; however, at present the known bounds on the Gowers uniformity of the von Mangoldt function are qualitative, except at the $U^2$ level, which is unfortunate since the first non-trivial odd case $k=3$ requires quantitative control on the $U^3$ level. (But it may be possible to make the Gowers uniformity bounds for $U^3$ quantitative if one assumes GRH, although when one puts everything together, the actual decay rate obtained in (1) is likely to be poor.)

Apoorva Khare and I have updated our paper “On the sign patterns of entrywise positivity preservers in fixed dimension“, announced at this post from last month. The quantitative results are now sharpened using a new monotonicity property of ratios $s_\lambda(x)/s_\mu(x)$ of Schur polynomials, namely that such ratios are monotone non-decreasing in each coordinate of $x$ if $x$ is in the positive orthant, and the partition $\lambda$ is larger than that of $\mu$. (This monotonicity was also independently observed by Rachid Ait-Haddou, using the theory of blossoms.) In the revised version of the paper we give two proofs of this monotonicity. The first relies on a deep positivity result of Lam, Postnikov, and Pylyavskyy, which uses a representation-theoretic positivity result of Haiman to show that the polynomial combination

$$s_{(\lambda \wedge \mu)/(\nu \wedge \rho)}\ s_{(\lambda \vee \mu)/(\nu \vee \rho)} - s_{\lambda/\nu}\ s_{\mu/\rho} \qquad (1)$$

of skew-Schur polynomials is Schur-positive for any partitions $\lambda, \mu, \nu, \rho$ (using the convention that the skew-Schur polynomial $s_{\lambda/\nu}$ vanishes if $\nu$ is not contained in $\lambda$, and where $\lambda \wedge \mu$ and $\lambda \vee \mu$ denote the pointwise min and max of $\lambda$ and $\mu$ respectively). It is fairly easy to derive the monotonicity of $s_\lambda(x)/s_\mu(x)$ from this, by using the expansion

$$s_\lambda(x_1, \dots, x_n) = \sum_\mu s_{\lambda/\mu}(x_1)\ s_\mu(x_2, \dots, x_n)$$

of Schur polynomials into skew-Schur polynomials (as was done in this previous post).

The second proof of monotonicity avoids representation theory by a more elementary argument establishing the weaker claim that the above expression (1) is non-negative on the positive orthant. In fact we prove a more general determinantal log-supermodularity claim which may be of independent interest:

Theorem 1. Let $M$ be any $N \times N$ totally positive matrix (thus, every minor has a non-negative determinant). Then for any $k$-tuples $I_1, I_2, J_1, J_2$ of increasing elements of $\{1, \dots, N\}$, one has

$$\det(M_{I_1 \wedge I_2, J_1 \wedge J_2})\ \det(M_{I_1 \vee I_2, J_1 \vee J_2}) \geq \det(M_{I_1, J_1})\ \det(M_{I_2, J_2}),$$

where $M_{I,J}$ denotes the $k \times k$ minor formed from the rows in $I$ and columns in $J$.

For instance, if is the matrix

for some real numbers , one has

(corresponding to the case , ), or

(corresponding to the case , , , , ). It turns out that this claim can be proven relatively easily by an induction argument, relying on the Dodgson and Karlin identities from this previous post; the difficulties are largely notational in nature. Combining this result with the Jacobi-Trudi identity for skew-Schur polynomials (discussed in this previous post) gives the non-negativity of (1); it can also be used to directly establish the monotonicity of the ratios $s_\lambda/s_\mu$ by applying the theorem to a generalised Vandermonde matrix.

(Log-supermodularity also arises as the natural hypothesis for the FKG inequality, though I do not know of any interesting application of the FKG inequality in this current setting.)

Suppose we have an $n \times n$ matrix $M$ that is expressed in block-matrix form as

$$M = \begin{pmatrix} A & B \\ C & D \end{pmatrix},$$

where $A$ is an $(n-k) \times (n-k)$ matrix, $B$ is an $(n-k) \times k$ matrix, $C$ is a $k \times (n-k)$ matrix, and $D$ is a $k \times k$ matrix for some $1 \leq k < n$. If $A$ is invertible, we can use the technique of Schur complementation to express the inverse of $M$ (if it exists) in terms of the inverse of $A$, and the other components $B, C, D$ of course. Indeed, to solve the equation

$$M \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix},$$

where $x, a$ are $(n-k) \times 1$ column vectors and $y, b$ are $k \times 1$ column vectors, we can expand this out as a system

$$Ax + By = a$$
$$Cx + Dy = b.$$

Using the invertibility of $A$, we can write the first equation as

$$x = A^{-1} a - A^{-1} B y \qquad (1)$$

and substituting this into the second equation yields

$$(D - C A^{-1} B) y = b - C A^{-1} a,$$

and thus (assuming that $D - C A^{-1} B$ is invertible)

$$y = -(D - C A^{-1} B)^{-1} C A^{-1} a + (D - C A^{-1} B)^{-1} b,$$

and then inserting this back into (1) gives

$$x = \big(A^{-1} + A^{-1} B (D - C A^{-1} B)^{-1} C A^{-1}\big) a - A^{-1} B (D - C A^{-1} B)^{-1} b.$$

Comparing this with

$$\begin{pmatrix} x \\ y \end{pmatrix} = M^{-1} \begin{pmatrix} a \\ b \end{pmatrix},$$

we have managed to express the inverse of $M$ as

$$M^{-1} = \begin{pmatrix} A^{-1} + A^{-1} B (D - C A^{-1} B)^{-1} C A^{-1} & -A^{-1} B (D - C A^{-1} B)^{-1} \\ -(D - C A^{-1} B)^{-1} C A^{-1} & (D - C A^{-1} B)^{-1} \end{pmatrix}. \qquad (2)$$

One can consider the inverse problem: given the inverse $M^{-1}$ of $M$, does one have a nice formula for the inverse $A^{-1}$ of the minor $A$? Trying to recover this directly from (2) looks somewhat messy. However, one can proceed as follows. Let $U$ denote the $n \times k$ matrix

$$U := \begin{pmatrix} 0 \\ I_k \end{pmatrix}$$

(with $I_k$ the $k \times k$ identity matrix), and let $U^T$ be its transpose:

$$U^T = \begin{pmatrix} 0 & I_k \end{pmatrix}.$$

Then for any scalar $t$ (which we identify with $t$ times the $k \times k$ identity matrix), one has

$$M + t U U^T = \begin{pmatrix} A & B \\ C & D + t \end{pmatrix},$$

and hence by (2)

$$(M + t U U^T)^{-1} = \begin{pmatrix} A^{-1} + A^{-1} B (D + t - C A^{-1} B)^{-1} C A^{-1} & -A^{-1} B (D + t - C A^{-1} B)^{-1} \\ -(D + t - C A^{-1} B)^{-1} C A^{-1} & (D + t - C A^{-1} B)^{-1} \end{pmatrix},$$

noting that the inverses here will exist for $t$ large enough. Taking limits as $t \rightarrow \infty$, we conclude that

$$\lim_{t \rightarrow \infty} (M + t U U^T)^{-1} = \begin{pmatrix} A^{-1} & 0 \\ 0 & 0 \end{pmatrix}. \qquad (3)$$

On the other hand, by the Woodbury matrix identity (discussed in this previous blog post), we have

$$(M + t U U^T)^{-1} = M^{-1} - M^{-1} U (t^{-1} + U^T M^{-1} U)^{-1} U^T M^{-1},$$

and hence on taking limits and comparing with the preceding identity, one has

$$\begin{pmatrix} A^{-1} & 0 \\ 0 & 0 \end{pmatrix} = M^{-1} - M^{-1} U (U^T M^{-1} U)^{-1} U^T M^{-1}. \qquad (4)$$

This achieves the aim of expressing the inverse $A^{-1}$ of the minor in terms of the inverse $M^{-1}$ of the full matrix. Taking traces and rearranging, we conclude in particular that

$$\mathrm{tr}\, A^{-1} = \mathrm{tr}\, M^{-1} - \mathrm{tr}\big( (U^T M^{-1} U)^{-1} U^T M^{-2} U \big).$$

In the $k=1$ case, this can be simplified to

$$\mathrm{tr}\, A^{-1} = \mathrm{tr}\, M^{-1} - \frac{e_n^T M^{-2} e_n}{e_n^T M^{-1} e_n},$$

where $e_n$ is the $n$-th basis column vector.
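Reading the $k=1$ identity in matrix (rather than trace) form, it says that if $B = M^{-1}$, then the inverse of the top left $(n-1) \times (n-1)$ minor of $M$ equals $B' - \frac{1}{B_{nn}}$ (last column of $B'$)(last row of $B'$), where $B'$ deletes the last row and column of $B$. A quick numerical check of this reading (a sketch under that interpretation, not the notation of the post):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
M = 3.0 * np.eye(n) + 0.5 * rng.standard_normal((n, n))  # comfortably invertible
B = np.linalg.inv(M)

minor_inv = np.linalg.inv(M[:n-1, :n-1])
claim = B[:n-1, :n-1] - np.outer(B[:n-1, n-1], B[n-1, :n-1]) / B[n-1, n-1]
assert np.allclose(minor_inv, claim)

# trace form, as in the k = 1 identity above:
#   tr A^{-1} = tr M^{-1} - (e_n^T M^{-2} e_n) / (e_n^T M^{-1} e_n)
assert np.isclose(np.trace(minor_inv),
                  np.trace(B) - (B[n-1] @ B[:, n-1]) / B[n-1, n-1])
print("minor inverse identity verified")
```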

We can apply this identity to understand how the spectrum of an $n \times n$ random matrix $M$ relates to that of its top left $(n-1) \times (n-1)$ minor $A$. Subtracting any complex multiple $z$ of the identity from $M$ (and hence from $A$), we can relate the Stieltjes transform $s_M(z) := \frac{1}{n} \mathrm{tr}(M-z)^{-1}$ of $M$ with the Stieltjes transform $s_A(z) := \frac{1}{n-1} \mathrm{tr}(A-z)^{-1}$ of $A$:

$$(n-1)\, s_A(z) = n\, s_M(z) - \frac{e_n^T (M-z)^{-2} e_n}{e_n^T (M-z)^{-1} e_n}. \qquad (5)$$

At this point we begin to proceed informally. Assume for sake of argument that the random matrix $M$ is Hermitian, with distribution that is invariant under conjugation by the unitary group $U(n)$; for instance, $M$ could be drawn from the Gaussian Unitary Ensemble (GUE), or alternatively $M$ could be of the form $M = U D U^*$ for some real diagonal matrix $D$ and a unitary matrix $U$ drawn randomly from $U(n)$ using Haar measure. To fix normalisations we will assume that the eigenvalues of $M$ are typically of size $O(1)$. Then $A$ is also Hermitian and $U(n-1)$-invariant. Furthermore, the law of $e_n^T (M-z)^{-1} e_n$ will be the same as the law of $u^* (M-z)^{-1} u$, where $u$ is now drawn uniformly from the unit sphere (independently of $M$). Diagonalising $M$ into eigenvalues $\lambda_1, \dots, \lambda_n$ and eigenvectors $v_1, \dots, v_n$, we have

$$u^* (M-z)^{-1} u = \sum_{j=1}^n \frac{|u^* v_j|^2}{\lambda_j - z}.$$

One can think of $u$ as a random (complex) Gaussian vector, divided by the magnitude of that vector (which, by the Chernoff inequality, will concentrate to $\sqrt{n}$). Thus the coefficients $u^* v_j$ of $u$ with respect to the orthonormal basis $v_1, \dots, v_n$ can be thought of as independent (complex) Gaussian variables, divided by that magnitude. Using this and the Chernoff inequality again, we see (for $z$ a distance $\sim 1$ away from the real axis at least) that one has the concentration of measure

$$u^* (M-z)^{-1} u \approx \frac{1}{n} \sum_{j=1}^n \frac{1}{\lambda_j - z},$$

and thus

$$e_n^T (M-z)^{-1} e_n \approx s_M(z)$$

(that is to say, the diagonal entries of $(M-z)^{-1}$ are roughly constant). Similarly we have

$$e_n^T (M-z)^{-2} e_n \approx \frac{1}{n} \mathrm{tr}(M-z)^{-2} = \partial_z s_M(z).$$

Inserting this into (5) and discarding terms of size $O(1/n)$, we thus conclude the approximate relationship

$$(n-1)\, s_A(z) \approx n\, s_M(z) - \frac{\partial_z s_M(z)}{s_M(z)}. \qquad (6)$$

This can be viewed as a difference equation for the Stieltjes transform of top left minors of $M$. Iterating this equation, and formally replacing the difference equation by a differential equation in the large $n$ limit, we see that when $n$ is large and $m \approx \sigma n$ for some $0 < \sigma \leq 1$, one expects the top left $m \times m$ minor of $M$ to have Stieltjes transform

$$s_m(z) \approx s(\sigma, z),$$

where $s(\sigma, z)$ solves the Burgers-type equation

$$s + \sigma\, \partial_\sigma s = \frac{\partial_z s}{s}. \qquad (7)$$
Example 1. If $M$ is a constant multiple $\lambda$ of the identity, then $s(1,z) = \frac{1}{\lambda - z}$. One checks that $s(\sigma, z) = \frac{1}{\lambda - z}$ is a steady state solution to (7), which is unsurprising given that all minors of $M$ are also $\lambda$ times the identity.

Example 2. If $M$ is drawn from GUE normalised so that each entry has variance $\frac{1}{n}$, then by the semi-circular law (see previous notes) one has $s(1,z) = \frac{-z + \sqrt{z^2-4}}{2}$ (using an appropriate branch of the square root). One can then verify the self-similar solution

$$s(\sigma, z) = \frac{-z + \sqrt{z^2 - 4\sigma}}{2\sigma} \qquad (10)$$

to (7), which is consistent with the fact that a top left $m \times m$ minor of $M$ also has the law of GUE, with each entry having variance $\frac{\sigma}{m}$ when $m = \sigma n$.

One can justify the approximation (6) given a sufficiently good well-posedness theory for the equation (7). We will not do so here, but will note that (as with the classical inviscid Burgers equation) the equation (7) can be solved exactly (formally, at least) by the method of characteristics. For any initial position $z_0$, we consider the characteristic flow $\sigma \mapsto z(\sigma)$ formed by solving the ODE

$$\frac{d}{d\sigma} z(\sigma) = -\frac{1}{\sigma\, s(\sigma, z(\sigma))} \qquad (8)$$

with initial data $z(1) = z_0$, ignoring for this discussion the problems of existence and uniqueness. Then from the chain rule, the equation (7) implies that

$$\frac{d}{d\sigma} s(\sigma, z(\sigma)) = -\frac{s(\sigma, z(\sigma))}{\sigma},$$

and thus $s(\sigma, z(\sigma)) = \frac{1}{\sigma} s(1, z_0)$. Inserting this back into (8) we see that

$$z(\sigma) = z_0 - \frac{\sigma - 1}{s(1, z_0)},$$

and thus (7) may be solved implicitly via the equation

$$s\Big(\sigma,\ z_0 - \frac{\sigma - 1}{s(1, z_0)}\Big) = \frac{1}{\sigma}\, s(1, z_0) \qquad (9)$$

for arbitrary $z_0$.
Remark 3. In practice, the equation (9) may stop working when $z_0 - \frac{\sigma-1}{s(1,z_0)}$ crosses the real axis, as (7) does not necessarily hold in this region. It is a cute exercise (ultimately coming from the Cauchy-Schwarz inequality) to show that this crossing always happens, for instance if $z$ has positive imaginary part then $z + \frac{1}{s(1,z)}$ necessarily has negative or zero imaginary part.

Example 4. Suppose we have $s(1,z) = \frac{1}{\lambda - z}$ as in Example 1. Then (9) becomes

$$s\big(\sigma,\ z_0 - (\sigma-1)(\lambda - z_0)\big) = \frac{1}{\sigma}\,\frac{1}{\lambda - z_0}$$

for any $z_0$, which after making the change of variables $z = z_0 - (\sigma-1)(\lambda - z_0)$ becomes

$$s(\sigma, z) = \frac{1}{\lambda - z}$$

as in Example 1.

Example 5. Suppose we have

$$s(1,z) = \frac{-z + \sqrt{z^2-4}}{2}$$

as in Example 2. Then (9) becomes

$$s\Big(\sigma,\ z_0 - \frac{\sigma-1}{s(1,z_0)}\Big) = \frac{1}{\sigma}\, s(1, z_0).$$
If we write

one can calculate that

and hence

One can recover the spectral measure $\mu$ from the Stieltjes transform $s(z)$ as the weak limit of $x \mapsto \frac{1}{\pi} \mathrm{Im}\, s(x + i\varepsilon)$ as $\varepsilon \rightarrow 0$; we write this informally as

$$d\mu(x) = \frac{1}{\pi}\, \mathrm{Im}\, s(x + i0^+)\, dx.$$

In this informal notation, we have for instance that

$$\frac{1}{\pi}\, \mathrm{Im}\, \frac{1}{\lambda - x - i0^+}\, dx = \delta_\lambda(x),$$

which can be interpreted as the fact that the Cauchy distributions $\frac{1}{\pi} \frac{\varepsilon}{(x-\lambda)^2 + \varepsilon^2}$ converge weakly to the Dirac mass $\delta_\lambda$ as $\varepsilon \rightarrow 0$. Similarly, the spectral measure associated to (10) is the semicircular measure $\frac{1}{2\pi\sigma} \sqrt{4\sigma - x^2}\ 1_{|x| \leq 2\sqrt{\sigma}}\, dx$.
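This recovery can be tested numerically for the semicircular transform; the sketch below (with an explicitly chosen branch of the square root, so that $s(z) \sim -1/z$ at infinity) recovers the semicircle density $\frac{1}{2\pi}\sqrt{4-x^2}$ at a few sample points:

```python
import cmath, math

def s_semicircle(z):
    """Stieltjes transform of the semicircular law; branch with s ~ -1/z."""
    w = cmath.sqrt(z * z - 4)
    if (w / z).real < 0:     # flip to the branch that decays at infinity
        w = -w
    return (-z + w) / 2

eps = 1e-6
for x in [0.0, 1.0, 1.9]:
    density = s_semicircle(complex(x, eps)).imag / math.pi
    expected = math.sqrt(4 - x * x) / (2 * math.pi)
    assert abs(density - expected) < 1e-3
    print(f"x = {x}: recovered {density:.5f}, semicircle {expected:.5f}")
```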

If we let $\mu_\sigma$ be the spectral measure associated to $s(\sigma, \cdot)$, then the curve $\sigma \mapsto \mu_\sigma$ from $(0,1]$ to the space of measures is the high-dimensional limit of a Gelfand-Tsetlin pattern (discussed in this previous post), if the pattern is randomly generated amongst all matrices $M$ with spectrum asymptotic to $\mu_1$ as $n \rightarrow \infty$. For instance, if $\mu_1 = \delta_\lambda$, then the curve is $\sigma \mapsto \delta_\lambda$, corresponding to a pattern that is entirely filled with $\lambda$‘s. If instead $\mu_1$ is the semicircular distribution, then the pattern is

$$\sigma \mapsto \frac{1}{2\pi\sigma} \sqrt{4\sigma - x^2}\ 1_{|x| \leq 2\sqrt{\sigma}}\, dx,$$

thus at height $\sigma n$ from the top, the pattern is semicircular on the interval $[-2\sqrt{\sigma}, 2\sqrt{\sigma}]$. The interlacing property of Gelfand-Tsetlin patterns translates to the claim that $\sigma \mu_\sigma((-\infty, \lambda])$ and $\sigma \mu_\sigma([\lambda, +\infty))$ are both non-decreasing in $\sigma$ for any fixed $\lambda$. In principle one should be able to establish these monotonicity claims directly from the PDE (7) or from the implicit solution (9), but it was not clear to me how to do so.

An interesting example of such a limiting Gelfand-Tsetlin pattern occurs when , which corresponds to being , where is an orthogonal projection to a random -dimensional subspace of . Here we have

and so (9) in this case becomes

A tedious calculation then gives the solution

For , there are simple poles at , and the associated measure is

This reflects the interlacing property, which forces of the eigenvalues of the minor to be equal to (resp. ). For , the poles disappear and one just has

For , one has an inverse semicircle distribution

There is presumably a direct geometric explanation of this fact (basically describing the singular values of the product of two random orthogonal projections to half-dimensional subspaces of ), but I do not know of one off-hand.

The evolution of can also be understood using the *R-transform* and *S-transform* from free probability. Formally, let , let be the inverse of , thus

for all , and then define the R-transform

The equation (9) may be rewritten as

and hence

See these previous notes for a discussion of free probability topics such as the R-transform.

Example 6. If then the R-transform is .

Example 7. If is given by (10), then the R-transform is

Example 8. If is given by (11), then the R-transform is

This simple relationship (12) is essentially due to Nica and Speicher (thanks to Dima Shlyakhtenko for this reference). It has the remarkable consequence that when is the reciprocal of a natural number , then is the free arithmetic mean of copies of , that is to say is the free convolution of copies of , pushed forward by the map . In terms of random matrices, this is asserting that the top minor of a random matrix has spectral measure approximately equal to that of an arithmetic mean of independent copies of , so that the process of taking top left minors is in some sense a continuous analogue of the process of taking freely independent arithmetic means. There ought to be a geometric proof of this assertion, but I do not know of one. In the limit (or ), the R-transform becomes linear and the spectral measure becomes semicircular, which is of course consistent with the free central limit theorem.

In a similar vein, if one defines the function

and inverts it to obtain a function with

for all , then the *S-transform* is defined by

Writing

for any , , we have

and so (9) becomes

which simplifies to

replacing by we obtain

and thus

and hence

One can compute to be the S-transform of the measure ; from the link between S-transforms and free products (see e.g. these notes of Guionnet), we conclude that is the free product of and . This is consistent with the random matrix theory interpretation, since is also the spectral measure of , where is the orthogonal projection to the span of the first basis elements, so in particular has spectral measure . If is unitarily invariant then (by a fundamental result of Voiculescu) it is asymptotically freely independent of , so the spectral measure of is asymptotically the free product of that of and of .

Szemerédi’s theorem asserts that all subsets of the natural numbers of positive density contain arbitrarily long arithmetic progressions. *Roth’s theorem* is the special case when one considers arithmetic progressions of length three. Both theorems have many important proofs using tools from additive combinatorics, (higher order) Fourier analysis, (hyper) graph regularity theory, and ergodic theory. However, the original proof by Endre Szemerédi, while extremely intricate, was purely combinatorial (and in particular “elementary”) and almost entirely self-contained, except for an invocation of the van der Waerden theorem. It is also notable for introducing a prototype of what is now known as the Szemerédi regularity lemma.
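Roth's theorem can be contrasted with explicit constructions of reasonably large progression-free sets. The greedy three-term-progression-free set consists of the integers with no digit 2 in base 3; Roth's theorem forces its density to decay to zero. The following sketch (with the cutoff 3^5 chosen arbitrarily) confirms that this set is indeed free of three-term arithmetic progressions in a small range:

```python
# The greedy 3-AP-free set: integers whose base-3 expansion avoids the digit 2
# (the Stanley sequence 0, 1, 3, 4, 9, 10, ...).

def no_digit_two(m):
    while m:
        if m % 3 == 2:
            return False
        m //= 3
    return True

N = 3 ** 5
S = [m for m in range(N) if no_digit_two(m)]

def has_three_ap(s):
    # look for a < b with 2*b - a also in the set, i.e. a, b, 2b - a in AP
    elems = set(s)
    return any(2 * b - a in elems for a in s for b in s if b > a)

density = len(S) / N  # 2^5 / 3^5, tending to zero as the cutoff grows
```

The progression-freeness is a classical fact: doubling an element with digits in {0, 1} produces digits in {0, 2} with no carries, which can match a carry-free sum of two such elements only if the two summands are equal.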

Back in 2005, I rewrote Szemerédi’s original proof in order to understand it better; however, my rewrite ended up being about the same length as the original argument and was probably usable only by myself. In 2012, after Szemerédi was awarded the Abel prize, I revisited this argument with the intention of writing up a more readable version of the proof, but ended up just presenting some ingredients of the argument in a blog post, rather than trying to rewrite the whole thing. In that post, I suspected that the cleanest way to write up the argument would be through the language of nonstandard analysis (perhaps in an iterated hyperextension that could handle various hierarchies of infinitesimals), but was unable to actually achieve any substantial simplifications by passing to the nonstandard world.

A few weeks ago, I participated in a week-long workshop at the American Institute of Mathematics on “Nonstandard methods in combinatorial number theory”, and spent some time in a working group with Shabnam Akhtari, Irfan Alam, Renling Jin, Steven Leth, Karl Mahlburg, Paul Potgieter, and Henry Towsner to try to obtain a manageable nonstandard version of Szemerédi’s original proof. We didn’t end up being able to do so – in fact there are now signs that perhaps nonstandard analysis is not the optimal framework in which to place this argument – but we did at least clarify the existing standard argument, to the point that I was able to go back to my original rewrite of the proof and present it in a more civilised form, which I am now uploading here as an unpublished preprint. There are now a number of simplifications to the proof. Firstly, one no longer needs the full strength of the regularity lemma; only the simpler “weak” regularity lemma of Frieze and Kannan is required. Secondly, the proof has been “factored” into a number of stand-alone propositions of independent interest, in particular involving just (families of) one-dimensional arithmetic progressions rather than the complicated-looking multidimensional arithmetic progressions that occur so frequently in the original argument of Szemerédi. Finally, the delicate manipulations of densities and epsilons via double counting arguments in Szemerédi’s original paper have been abstracted into a certain key property of families of arithmetic progressions that I call the “double counting property”.

The factoring mentioned above is particularly simple in the case of proving Roth’s theorem, which is now presented separately in the above writeup. Roth’s theorem seeks to locate a length three progression in which all three elements lie in a single set. This will be deduced from an easier variant of the theorem in which one locates (a family of) length three progressions in which just the first two elements of the progression lie in a good set (and some other properties of the family are also required). This is in turn derived from an even easier variant in which now just the first element of the progression is required to be in the good set.

More specifically, Roth’s theorem is now deduced from

Theorem 1.5. Let be a natural number, and let be a set of integers of upper density at least . Then, whenever is partitioned into finitely many colour classes, there exists a colour class and a family of 3-term arithmetic progressions with the following properties:

- For each , and lie in .
- For each , lie in .
- The for are in arithmetic progression.

The situation in this theorem is depicted by the following diagram, in which elements of are in blue and elements of are in grey:

Theorem 1.5 is deduced in turn from the following easier variant:

Theorem 1.6. Let be a natural number, and let be a set of integers of upper density at least . Then, whenever is partitioned into finitely many colour classes, there exists a colour class and a family of 3-term arithmetic progressions with the following properties:

- For each , lie in .
- For each , and lie in .
- The for are in arithmetic progression.

The situation here is described by the figure below.

Theorem 1.6 is easy to prove. To derive Theorem 1.5 from Theorem 1.6, or to derive Roth’s theorem from Theorem 1.5, one uses double counting arguments, van der Waerden’s theorem, and the weak regularity lemma, largely as described in this previous blog post; see the writeup for the full details. (I would be interested, though, in seeing a shorter proof of Theorem 1.5 that did not go through these arguments and did not use the more powerful theorems of Roth or Szemerédi.)

Fix a non-negative integer . Define a (weak) integer partition of length to be a tuple of non-increasing non-negative integers . (Here our partitions are “weak” in the sense that we allow some parts of the partition to be zero. Henceforth we will omit the modifier “weak”, as we will not need to consider the more usual notion of “strong” partitions.) To each such partition , one can associate a Young diagram consisting of left-justified rows of boxes, with the row containing boxes. A semi-standard Young tableau (or *Young tableau* for short) of shape is a filling of these boxes by integers in that is weakly increasing along rows (moving rightwards) and strictly increasing along columns (moving downwards). The collection of such tableaux will be denoted . The *weight* of a tableau is the tuple , where is the number of occurrences of the integer in the tableau. For instance, if and , an example of a Young tableau of shape would be

The weight here would be .

To each partition one can associate the Schur polynomial on variables , which we will define as

using the multinomial convention

Thus for instance the Young tableau given above would contribute a term to the Schur polynomial . In the case of partitions of the form , the Schur polynomial is just the complete homogeneous symmetric polynomial of degree on variables:

thus for instance

Schur polynomials are ubiquitous in the algebraic combinatorics of “type objects” such as the symmetric group , the general linear group , or the unitary group . For instance, one can view as the character of an irreducible polynomial representation of associated with the partition . However, we will not focus on these interpretations of Schur polynomials in this post.

This definition of Schur polynomials allows for a way to describe the polynomials recursively. If and is a Young tableau of shape , taking values in , one can form a sub-tableau of some shape by removing all the appearances of (which, among other things, necessarily deletes the row). For instance, with as in the previous example, the sub-tableau would be

and the reduced partition in this case is . As Young tableaux are required to be strictly increasing down columns, we can see that the reduced partition must *intersperse* the original partition in the sense that

for all ; we denote this interspersion relation as (though we caution that this is *not* intended to be a partial ordering). In the converse direction, if and is a Young tableau with shape with entries in , one can form a Young tableau with shape and entries in by appending to an entry of in all the boxes that appear in the shape but not the shape. This one-to-one correspondence leads to the recursion

where , , and the size of a partition is defined as .

One can use this recursion (2) to prove some further standard identities for Schur polynomials, such as the determinant identity

for , where denotes the Vandermonde determinant

with the convention that if is negative. Thus for instance

We review the (standard) derivation of these identities via (2) below the fold. Among other things, these identities show that the Schur polynomials are symmetric, which is not immediately obvious from their definition.
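For small shapes these identities can also be verified by machine. The sketch below (with the shape (2,1) and the sample point chosen arbitrarily) compares the tableau-sum definition of the Schur polynomial with the ratio-of-alternants formula det(x_i^{lambda_j + n - j}) / det(x_i^{n - j}); invariance under permuting the variables reflects the symmetry just mentioned.

```python
from fractions import Fraction
from itertools import permutations

def ssyt(shape, n):
    """Generate semistandard Young tableaux of the given shape with entries
    in 1..n: weakly increasing along rows, strictly increasing down columns."""
    cells = [(i, j) for i in range(len(shape)) for j in range(shape[i])]
    tab = [[0] * r for r in shape]
    def fill(idx):
        if idx == len(cells):
            yield [row[:] for row in tab]
            return
        i, j = cells[idx]
        lo = 1
        if j > 0:
            lo = max(lo, tab[i][j - 1])      # weakly increasing along the row
        if i > 0:
            lo = max(lo, tab[i - 1][j] + 1)  # strictly increasing down the column
        for v in range(lo, n + 1):
            tab[i][j] = v
            yield from fill(idx + 1)
    yield from fill(0)

def schur_tableaux(shape, xs):
    """Schur polynomial as a sum of monomials x^{weight(T)} over tableaux T."""
    total = 0
    for t in ssyt(shape, len(xs)):
        mono = 1
        for row in t:
            for v in row:
                mono *= xs[v - 1]
        total += mono
    return total

def det(m):
    """Leibniz-formula determinant (fine for small matrices)."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        sign = 1
        for i in range(n):
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        prod = sign
        for i in range(n):
            prod *= m[i][perm[i]]
        total += prod
    return total

def schur_bialternant(shape, xs):
    """Schur polynomial as a ratio of alternants."""
    n = len(xs)
    lam = list(shape) + [0] * (n - len(shape))
    num = det([[xs[i] ** (lam[j] + n - 1 - j) for j in range(n)] for i in range(n)])
    den = det([[xs[i] ** (n - 1 - j) for j in range(n)] for i in range(n)])
    return Fraction(num, den)

xs = (2, 3, 5)
a = schur_tableaux((2, 1), xs)
b = schur_bialternant((2, 1), xs)
```

Both computations give 280 at this point, and any permutation of the variables gives the same value.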

One can also iterate (2) to write

where the sum is over all tuples , where each is a partition of length that intersperses the next partition , with set equal to . We will call such a tuple an *integral Gelfand-Tsetlin pattern* based at .
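The number of integral Gelfand-Tsetlin patterns based at a given partition can be computed directly from the interspersion recursion, and it agrees with the evaluation of the Schur polynomial at (1, ..., 1), which equals the product over i < j of (lambda_i - lambda_j + j - i)/(j - i). A sketch of this count (a direct recursion, with the test partitions chosen arbitrarily):

```python
from fractions import Fraction
from itertools import product

def interspersing(lam):
    """All length-(n-1) tuples mu with lam[i] >= mu[i] >= lam[i+1];
    these are automatically non-increasing, hence partitions."""
    ranges = [range(lam[i + 1], lam[i] + 1) for i in range(len(lam) - 1)]
    for mu in product(*ranges):
        yield list(mu)

def count_gt_patterns(lam):
    """Number of integral Gelfand-Tsetlin patterns based at lam."""
    if len(lam) <= 1:
        return 1
    return sum(count_gt_patterns(mu) for mu in interspersing(lam))

def dimension(lam):
    """Product formula for the Schur polynomial evaluated at (1, ..., 1)."""
    n = len(lam)
    d = Fraction(1)
    for i in range(n):
        for j in range(i + 1, n):
            d *= Fraction(lam[i] - lam[j] + j - i, j - i)
    return d

lam = [2, 1, 0]
patterns = count_gt_patterns(lam)
dim = dimension(lam)
```

For lam = (2, 1, 0) both computations give 8, matching the eight tableaux of shape (2, 1) with entries at most 3.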

One can generalise (6) by introducing the skew Schur functions

for , whenever is a partition of length and a partition of length for some , thus the Schur polynomial is also the skew Schur polynomial with . (One could relabel the variables here to be something like instead, but this labeling seems slightly more natural, particularly in view of identities such as (8) below.)

By construction, we have the decomposition

whenever , and are partitions of lengths respectively. This gives another recursive way to understand Schur polynomials and skew Schur polynomials. For instance, one can use it to establish the generalised Jacobi-Trudi identity

with the convention that for larger than the length of ; we do this below the fold.

The Schur polynomials (and skew Schur polynomials) are “discretised” (or “quantised”) in the sense that their parameters are required to be integer-valued, and their definition similarly involves summation over a discrete set. It turns out that there are “continuous” (or “classical”) analogues of these functions, in which the parameters now take real values rather than integers, and are defined via integration rather than summation. One can view these continuous analogues as a “semiclassical limit” of their discrete counterparts, in a manner that can be made precise using the machinery of geometric quantisation, but we will not do so here.

The continuous analogues can be defined as follows. Define a *real partition* of length to be a tuple where are now real numbers. We can define the relation of interspersion between a length real partition and a length real partition precisely as before, by requiring that the inequalities (1) hold for all . We can then define the continuous Schur functions for recursively by defining

for and of length , where and the integral is with respect to -dimensional Lebesgue measure, and as before. Thus for instance

and
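To make the recursion concrete in the two-variable case: if one takes the base case of the recursion to be S_{(a)}(x) = e^{ax} (the convention assumed for this illustration), then for a real partition (a, b) the recursion integrates over the interspersing values b <= t <= a, giving S_{(a,b)}(x_1, x_2) = int_b^a e^{t x_1 + (a+b-t) x_2} dt = (e^{a x_1 + b x_2} - e^{b x_1 + a x_2})/(x_1 - x_2), a continuous analogue of the ratio-of-alternants formula. A numerical check of this integral (parameters arbitrary):

```python
import math

def S2_integral(a, b, x1, x2, n=20001):
    # trapezoid-rule evaluation of  int_b^a exp(t*x1 + (a+b-t)*x2) dt
    h = (a - b) / (n - 1)
    total = 0.0
    for k in range(n):
        t = b + k * h
        w = h if 0 < k < n - 1 else h / 2
        total += math.exp(t * x1 + (a + b - t) * x2) * w
    return total

def S2_closed(a, b, x1, x2):
    # (e^{a x1 + b x2} - e^{b x1 + a x2}) / (x1 - x2)
    return (math.exp(a * x1 + b * x2) - math.exp(b * x1 + a * x2)) / (x1 - x2)

a, b, x1, x2 = 1.5, -0.5, 0.7, -0.3
num_val = S2_integral(a, b, x1, x2)
exact_val = S2_closed(a, b, x1, x2)
```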

More generally, we can define the continuous skew Schur functions for of length , of length , and recursively by defining

and

for . Thus for instance

and

By expanding out the recursion, one obtains the analogue

of (6), and more generally one has

We will call the tuples in the first integral *real Gelfand-Tsetlin patterns* based at . The analogue of (8) is then

where the integral is over all real partitions of length , with Lebesgue measure.

By approximating various integrals by their Riemann sums, one can relate the continuous Schur functions to their discrete counterparts by the limiting formula

as for any length real partition and any , where

and

More generally, one has

as for any length real partition , any length real partition with , and any .

As a consequence of these limiting formulae, one expects all of the discrete identities above to have continuous counterparts. This is indeed the case; below the fold we shall prove the discrete and continuous identities in parallel. These are not new results by any means, but I was not able to locate a good place in the literature where they are explicitly written down, so I thought I would try to do so here (primarily for my own internal reference, but perhaps the calculations will be worthwhile to some others also).

The determinant of an matrix (with coefficients in an arbitrary field) obeys many useful identities, starting of course with the fundamental multiplicativity for matrices . This multiplicativity can in turn be used to establish many further identities; in particular, as shown in this previous post, it implies the *Schur determinant identity*

whenever is an invertible matrix, is an matrix, is a matrix, and is a matrix. The matrix is known as the Schur complement of the block .
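The Schur determinant identity is easy to verify exactly by machine on a concrete block matrix. The following sketch (with an arbitrarily chosen 2+2 block decomposition, and rational arithmetic to keep everything exact) checks that the determinant of the block matrix equals det(A) times the determinant of the Schur complement D - C A^{-1} B:

```python
from fractions import Fraction
from itertools import permutations

def det(m):
    # Leibniz determinant, exact over Fractions
    n = len(m)
    total = Fraction(0)
    for perm in permutations(range(n)):
        sign = 1
        for i in range(n):
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        prod = Fraction(sign)
        for i in range(n):
            prod *= m[i][perm[i]]
        total += prod
    return total

def matinv(m):
    # Gauss-Jordan inverse over Fractions
    n = len(m)
    aug = [[Fraction(m[i][j]) for j in range(n)]
           + [Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matsub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

F = Fraction
A = [[F(2), F(1)], [F(1), F(3)]]  # invertible top-left block
B = [[F(4), F(0)], [F(2), F(5)]]
C = [[F(1), F(1)], [F(0), F(2)]]
D = [[F(3), F(1)], [F(1), F(5)]]

M = [A[0] + B[0], A[1] + B[1], C[0] + D[0], C[1] + D[1]]
lhs = det(M)
rhs = det(A) * det(matsub(D, matmul(C, matmul(matinv(A), B))))
```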

I only recently discovered that this identity in turn immediately implies what I always found to be a somewhat curious identity, namely the Dodgson condensation identity (also known as the *Desnanot-Jacobi identity*)

for any and matrix , where denotes the matrix formed from by removing the row and column, and similarly denotes the matrix formed from by removing the and rows and and columns. Thus for instance when we obtain

for any scalars . (Charles Dodgson, better known by his pen name Lewis Carroll, is of course also known for writing “Alice in Wonderland” and “Through the Looking Glass”.)
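Here is an exact machine check of the condensation identity on an arbitrarily chosen 4x4 integer matrix: the product of det(A) with the central minor (first and last rows and columns deleted) equals the two-by-two combination of the four minors obtained by deleting one extreme row and one extreme column.

```python
from itertools import permutations

def det(m):
    # Leibniz determinant; exact for integer entries
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        sign = 1
        for i in range(n):
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        prod = sign
        for i in range(n):
            prod *= m[i][perm[i]]
        total += prod
    return total

def minor(m, rows, cols):
    # delete the listed row and column indices (0-based)
    return [[m[i][j] for j in range(len(m)) if j not in cols]
            for i in range(len(m)) if i not in rows]

A = [[2, 7, 1, 8],
     [2, 8, 1, 8],
     [2, 8, 4, 5],
     [9, 0, 4, 5]]
n = len(A)

# Dodgson condensation / Desnanot-Jacobi:
lhs = det(A) * det(minor(A, {0, n - 1}, {0, n - 1}))
rhs = (det(minor(A, {0}, {0})) * det(minor(A, {n - 1}, {n - 1}))
       - det(minor(A, {0}, {n - 1})) * det(minor(A, {n - 1}, {0})))
```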

The derivation is not new; it is for instance noted explicitly in this paper of Brualdi and Schneider, though I do not know if this is the earliest place in the literature where it can be found. (EDIT: Apoorva Khare has pointed out to me that the original arguments of Dodgson can be interpreted as implicitly following this derivation.) I thought it worth presenting the short derivation here, though.

Firstly, by swapping the first and rows, and similarly for the columns, it is easy to see that the Dodgson condensation identity is equivalent to the variant

Now write

where is an matrix, are column vectors, are row vectors, and are scalars. If is invertible, we may apply the Schur determinant identity repeatedly to conclude that

and the claim (2) then follows by a brief calculation (and the explicit form of the determinant). To remove the requirement that be invertible, one can use a limiting argument, noting that one can work without loss of generality in an algebraically closed field, and in such a field, the set of invertible matrices is dense in the Zariski topology. (In the case when the scalars are reals or complexes, one can just use density in the ordinary topology instead if desired.)

The same argument gives the more general determinant identity of Sylvester

whenever , is a -element subset of , and denotes the matrix formed from by removing the rows associated to and the columns associated to . (The Dodgson condensation identity is basically the case of this identity.)

A closely related proof of (2) proceeds by elementary row and column operations. Observe that if one adds some multiple of one of the first rows of to one of the last two rows of , then the left and right sides of (2) do not change. If the minor is invertible, this allows one to reduce to the case where the components of the matrix vanish. Similarly, using elementary column operations instead of row operations we may assume that vanish. All matrices involved are now block-diagonal and the identity follows from a routine computation.

The latter approach can also prove the cute identity

for any , any column vectors , and any matrix , which can for instance be found on page 7 of this text of Karlin. Observe that both sides of this identity are unchanged if one adds some multiple of any column of to one of ; for generic , this allows one to reduce to have only the first two entries allowed to be non-zero, at which point the determinants split into and determinants and we can reduce to the case (eliminating the role of ). One can now either proceed by a direct computation, or by observing that the left-hand side is quadrilinear in and antisymmetric in and , which forces it to be a scalar multiple of , at which point one can test the identity at a single point (e.g. and for the standard basis ) to conclude the argument. (One can also derive this identity from the Sylvester determinant identity, but I think the calculations are a little messier if one goes by that route. Conversely, one can recover the Dodgson condensation identity from Karlin’s identity by setting , (for instance) and then permuting some rows and columns.)
