Additive combinatorics is largely focused on the additive properties of finite subsets A of an additive group. This group can be finite or infinite, but there is a very convenient trick, the Ruzsa projection trick, which allows one to reduce the latter case to the former. For instance, consider a finite set A inside the integers. The integers of course form an infinite group, but if we are only interested in sums of at most two elements of A at a time, we can embed A inside a suitably large finite cyclic group without losing any combinatorial information. More precisely, there is a Freiman isomorphism of order 2 between the set A in the integers and its projection in the cyclic group. One can view the latter version of A as a model for the former version. More generally, it turns out that any finite set A in an additive group can be modeled in the above sense by an equivalent set in a finite group, and in fact one can ensure that this ambient modeling group is not much larger than A itself if A has some additive structure; see this paper of Ruzsa (or Lemma 5.26 of my book with Van Vu) for a precise statement. This projection trick has a number of important uses in additive combinatorics, most notably in Ruzsa’s simplified proof of Freiman’s theorem.
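To make the trick concrete, here is a small Python sketch (the set A and the choice of modulus are illustrative, not taken from the text) checking that projection into a sufficiently large cyclic group preserves all additive quadruples, i.e. is a Freiman isomorphism of order 2, while a too-small modulus destroys this structure:

```python
from itertools import product

# Illustrative finite set of integers (not an example from the text).
A = [0, 1, 5, 11, 23]

# Project into the cyclic group Z_N.  Taking N larger than twice the
# diameter of A guarantees that no two distinct pairwise sums collide
# mod N, so the projection is a Freiman isomorphism of order 2.
N = 2 * (max(A) - min(A)) + 1

def freiman_2_isomorphic(A, N):
    """Check that a1+a2 = a3+a4 holds in Z exactly when it holds mod N."""
    for a1, a2, a3, a4 in product(A, repeat=4):
        over_Z = (a1 + a2 == a3 + a4)
        mod_N = ((a1 + a2) % N == (a3 + a4) % N)
        if over_Z != mod_N:
            return False
    return True

print(freiman_2_isomorphic(A, N))        # True: additive quadruples preserved
print(freiman_2_isomorphic(A, max(A)))   # False: too small a modulus loses information
```

The choice $N = 2\,\mathrm{diam}(A) + 1$ is the natural threshold for order-2 isomorphisms: all pairwise sums land in a window of length less than N, so no wraparound can identify distinct sums.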
Given the interest in non-commutative analogues of Freiman’s theorem, it is natural to ask whether one can similarly model finite sets A in multiplicative (and non-commutative) groups using finite models. Unfortunately (as I learned recently from Akshay Venkatesh, via Ben Green), this turns out to be impossible in general, due to an old example of Higman. More precisely, Higman shows:
Theorem 1. There exists an infinite group G generated by four distinct elements a,b,c,d that obey the relations
$a^{-1}ba = b^2, \quad b^{-1}cb = c^2, \quad c^{-1}dc = d^2, \quad d^{-1}ad = a^2; \qquad (1)$
in fact, a and c generate a free group in G. On the other hand, if G’ is a finite group containing four elements a,b,c,d obeying (1), then a,b,c,d are all trivial.
As a consequence, the finite set $\{a,b,c,d\}$ in G has no model (in the sense of Freiman isomorphisms) in a finite group.
Theorem 1 is proven by a small amount of elementary group theory and number theory, and it was neat enough that I thought I would reproduce it here.
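The number-theoretic core of the finite half of the theorem can be checked by brute force. Assuming the relations take the cyclic form $a^{-1}ba = b^2$ (and similarly for the other pairs, as in (1)), conjugating b by a repeatedly gives $b^{2^k}$, so the order of each generator must divide $2^m - 1$ where m is the order of the preceding generator. The following Python sketch (search bound illustrative) verifies that the only cycle of positive integers obeying these divisibility constraints is the trivial one:

```python
from itertools import product

def consistent(orders):
    """Check the cyclic constraints: each order divides 2^(previous order) - 1.

    Equivalently, 2^orders[i] == 1 modulo orders[i+1] (indices cyclic);
    the `1 % m` on the right handles the modulus-1 case correctly.
    """
    n = len(orders)
    return all(
        pow(2, orders[i], orders[(i + 1) % n]) == 1 % orders[(i + 1) % n]
        for i in range(n)
    )

BOUND = 30  # illustrative search bound
solutions = [t for t in product(range(1, BOUND + 1), repeat=4) if consistent(t)]
print(solutions)   # [(1, 1, 1, 1)]: all four generators must be trivial
```

The general proof replaces the brute force with a smallest-prime argument: if p is the least prime dividing any of the four orders, then 2 has multiplicative order greater than 1 but less than p modulo p, producing a smaller prime factor and hence a contradiction.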
I’ve uploaded a new paper to the arXiv entitled “The sum-product phenomenon in arbitrary rings“, and submitted it to Contributions to Discrete Mathematics. The sum-product phenomenon asserts, very roughly speaking, that given a finite non-empty set A in a ring R, either the sum set $A+A$ or the product set $A \cdot A$ will be significantly larger than A, unless A is somehow very close to being a subring of R, or if A is highly degenerate (for instance, containing a lot of zero divisors). For instance, in the case of the integers $\mathbb{Z}$, which have no non-trivial finite subrings, a long-standing conjecture of Erdös and Szemerédi asserts that $\max(|A+A|, |A \cdot A|) \gg_\epsilon |A|^{2-\epsilon}$ for every finite non-empty $A \subset \mathbb{Z}$ and every $\epsilon > 0$. (The current best result on this problem is a very recent result of Solymosi, who shows that the conjecture holds for any $\epsilon$ greater than 2/3.) In recent years, many other special rings have been studied intensively, most notably finite fields and cyclic groups, but also the complex numbers, quaternions, and other division algebras, and continuous counterparts in which A is now (for instance) a collection of intervals on the real line. I will not try to summarise all the work on sum-product estimates and their applications (which range from number theory to graph theory to ergodic theory to computer science) here, but I discuss this in the introduction to my paper, which has over 50 references to this literature (and I am probably still missing a few).
I was recently asked the question as to what could be said about the sum-product phenomenon in an arbitrary ring R, which need not be commutative or contain a multiplicative identity. Once one makes some assumptions to avoid the degenerate case in which A (or related sets, such as A-A) is full of zero-divisors, it turns out that there is in fact quite a bit one can say, using only elementary methods from additive combinatorics (in particular, the Plünnecke-Ruzsa sum set theory). Roughly speaking, the main results of the paper assert that in an arbitrary ring, a set A which is non-degenerate and has small sum set and product set must be mostly contained inside a subring of R of comparable size to A, or a dilate of such a subring, though in the absence of invertible elements one sometimes has to enlarge the ambient ring R slightly before one can find the subring. At the end of the paper I specialise these results to specific rings, such as division algebras or products of division algebras, cyclic groups, or finite-dimensional algebras over fields. Generally speaking, the methods here give very good results when the set of zero divisors is sparse and easily describable, but poor results otherwise. (In particular, the sum-product theory in cyclic groups, as worked out by Bourgain and coauthors, is only recovered for groups which are the product of a bounded number of primes; the case of cyclic groups whose order has many factors seems to require a more multi-scale analysis which I did not attempt to perform in this paper.)
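As a quick illustration of the phenomenon itself (taking an arithmetic progression as an illustrative extreme case), one can compute sum sets and product sets directly in Python:

```python
# A small experiment with the sum set and product set of a finite set of
# integers.  The arithmetic progression below is an illustrative choice:
# progressions have the smallest possible sum set, so the sum-product
# phenomenon predicts that their product set must be large.
A = list(range(1, 11))          # {1, ..., 10}

sum_set = {a + b for a in A for b in A}
product_set = {a * b for a in A for b in A}

print(len(sum_set))       # 19 = 2|A| - 1, the minimum possible
print(len(product_set))   # considerably larger
```

Swapping in a geometric progression reverses the roles: the product set becomes minimal and the sum set becomes large. No finite non-empty set of integers can make both small at once, which is exactly the content of the Erdös-Szemerédi conjecture.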
This week I am visiting the University of Washington in Seattle, giving the Milliman Lecture Series for 2007-2008. My chosen theme here is “Recent developments in arithmetic combinatorics“. In my first lecture, I will speak (once again) on how methods in additive combinatorics have allowed us to detect additive patterns in the prime numbers, in particular discussing my joint work with Ben Green. In the second lecture I will discuss how additive combinatorics has made it possible to study the invertibility and spectral behaviour of random discrete matrices, in particular discussing my joint work with Van Vu; and in the third lecture I will discuss how sum-product estimates have recently led to progress in the theory of expanders relating to Lie groups, as well as to sieving over orbits of such groups, in particular presenting work of Jean Bourgain and his coauthors.
Recently, I had tentatively announced a forthcoming result with Ben Green establishing the “Gowers inverse conjecture” (or more accurately, the “inverse conjecture for the Gowers uniformity norm”) for vector spaces $\mathbb{F}_p^n$ over a finite field $\mathbb{F}_p$, in the special case when p=2 and when the function for which the inverse conjecture is to be applied is assumed to be a polynomial phase of bounded degree (thus $f = (-1)^P$, where $P: \mathbb{F}_2^n \to \mathbb{F}_2$ is a polynomial of some degree d). See my FOCS article for some further discussion of this conjecture, which has applications to both polynomiality testing and to various structural decompositions involving the Gowers norm.
This conjecture can be informally stated as follows. By iterating the obvious fact that the derivative of a polynomial of degree at most d is a polynomial of degree at most d-1, we see that a function $P: \mathbb{F}^n \to \mathbb{F}$ is a polynomial of degree at most d if and only if
$\Delta_{h_1} \cdots \Delta_{h_{d+1}} P(x) = 0$
for all $x, h_1, \ldots, h_{d+1} \in \mathbb{F}^n$, where $\Delta_h P(x) := P(x+h) - P(x)$ is the additive derivative. From this one can deduce that a function $f: \mathbb{F}^n \to \mathbb{C}$ bounded in magnitude by 1 is a polynomial phase of degree at most d if and only if the Gowers norm
$\|f\|_{U^{d+1}} := \left( \mathbb{E}_{x, h_1, \ldots, h_{d+1} \in \mathbb{F}^n} \prod_{\omega \in \{0,1\}^{d+1}} \mathcal{C}^{|\omega|} f(x + \omega_1 h_1 + \cdots + \omega_{d+1} h_{d+1}) \right)^{1/2^{d+1}}$
(where $\mathcal{C}$ denotes complex conjugation) is equal to its maximal value of 1. The inverse conjecture for the Gowers norm, in its usual formulation, says that, more generally, if a function $f: \mathbb{F}^n \to \mathbb{C}$ bounded in magnitude by 1 has large Gowers norm (e.g. $\|f\|_{U^{d+1}} \geq \delta$) then f has some non-trivial correlation with some polynomial phase g of degree at most d (e.g. $|\mathbb{E}_x f(x) \overline{g(x)}| \geq c(\delta)$ for some $c(\delta) > 0$). Informally, this conjecture asserts that if a function has biased derivatives, then one should be able to “integrate” this bias and conclude that the function is biased relative to a polynomial of degree d. The conjecture has already been proven for $d \leq 2$. There are analogues of this conjecture for cyclic groups which are of relevance to Szemerédi’s theorem and to counting linear patterns in primes, but I will not discuss those here.
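The forward direction can be checked numerically on a small example. The following Python sketch (dimension and test function are illustrative) brute-forces the Gowers norm over $\mathbb{F}_2^n$ and confirms that a quadratic polynomial phase attains the maximal $U^3$ norm of 1, while falling short of the maximum in $U^2$:

```python
from itertools import product

n = 3  # work over F_2^3; small enough for brute force

def add(x, y):
    """Vector addition in F_2^n, i.e. coordinatewise XOR."""
    return tuple(a ^ b for a, b in zip(x, y))

def gowers_norm(f, d, n):
    """Brute-force Gowers U^d norm of f: F_2^n -> {+1,-1}."""
    points = list(product((0, 1), repeat=n))
    total, count = 0.0, 0
    for x in points:
        for hs in product(points, repeat=d):
            term = 1.0
            for omega in product((0, 1), repeat=d):
                y = x
                for eps, h in zip(omega, hs):
                    if eps:
                        y = add(y, h)
                term *= f(y)   # f is real-valued, so conjugation is trivial
            total += term
            count += 1
    return (total / count) ** (1.0 / 2 ** d)

# Quadratic polynomial phase f(x) = (-1)^(x_0 x_1): its third derivatives
# vanish, so every term in the U^3 average equals 1.
f = lambda x: (-1.0) ** (x[0] * x[1])
print(gowers_norm(f, 3, n))   # 1.0, the maximal value
print(gowers_norm(f, 2, n))   # 2^(-1/2) ~ 0.707: not maximal in U^2
```

The gap between the two norms is the expected one: a genuinely quadratic phase is "uniform of degree 1" (its U^2 norm is bounded away from 1) while being perfectly structured at degree 2.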
At the time of the announcement, our paper had not quite been fully written up. This turned out to be a little unfortunate, because soon afterwards we discovered that our arguments at one point had to go through a version of Newton’s interpolation formula, which involves a factor of d! in the denominator and so is only valid when the characteristic p of the field exceeds the degree. So our arguments are in fact only valid in the range $p > d$, and in particular are rather trivial in the important case p=2; my previous announcement should thus be amended accordingly.
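For concreteness, here is a Python sketch of the interpolation formula in question, with an illustrative cubic: a degree-d polynomial is recovered from its iterated differences at 0 via binomial coefficients $\binom{x}{k}$, whose hidden k! denominators are exactly what breaks down once the characteristic is at most the degree:

```python
from math import comb

# Newton's forward-difference interpolation: a polynomial P of degree d
# satisfies P(x) = sum_k binom(x, k) * Delta^k P(0), where Delta is the
# forward difference.  binom(x, k) = x(x-1)...(x-k+1)/k! carries a k! in
# the denominator -- harmless over Z, but meaningless in characteristic p
# once p <= d.  (The cubic below is an illustrative choice.)

def P(x):
    return 3 * x**3 - x + 7   # degree d = 3

# Iterated forward differences Delta^k P(0) for k = 0..4.
diffs = []
row = [P(x) for x in range(5)]
for _ in range(5):
    diffs.append(row[0])
    row = [b - a for a, b in zip(row, row[1:])]

def newton_eval(x):
    """Reassemble P from its differences at 0 via the binomial basis."""
    return sum(diffs[k] * comb(x, k) for k in range(len(diffs)))

print(all(newton_eval(x) == P(x) for x in range(20)))   # True
```

Note that diffs[4] comes out to 0, reflecting the fact that the fourth derivative of a cubic vanishes; it is the division by k! inside $\binom{x}{k}$ that has no analogue over $\mathbb{F}_2$ for $k \geq 2$.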
Today I’d like to discuss a beautiful inequality in graph theory, namely the crossing number inequality. This inequality gives a useful bound on how far a given graph is from being planar, and has a number of applications, for instance to sum-product estimates. Its proof is also an excellent example of the amplification trick in action; here the main source of amplification is the freedom to pass to subobjects, which is a freedom which I didn’t touch upon in the previous post on amplification. The crossing number inequality (and its proof) is well known among graph theorists but perhaps not among the wider mathematical community, so I thought I would publicise it here.
In this post, when I talk about a graph, I mean an abstract collection of vertices V, together with some abstract edges E joining pairs of vertices together. We will assume that the graph is undirected (the edges do not have a preferred orientation), loop-free (an edge cannot begin and end at the same vertex), and multiplicity-free (any pair of vertices is joined by at most one edge). More formally, we can model all this by viewing E as a subset of $\binom{V}{2}$, the set of 2-element subsets of V, and we view the graph G as an ordered pair G = (V,E). (The notation is set up so that $|E| \leq \binom{|V|}{2}$.)
Now one of the great features of graphs, as opposed to some other abstract maths concepts, is that they are easy to draw: the abstract vertices become dots on a plane, while the edges become line segments or curves connecting these dots. [To avoid some technicalities we do not allow these curves to pass through the dots, except if the curve is terminating at that dot.] Let us informally refer to such a concrete representation D of a graph G as a drawing of that graph. Clearly, any non-trivial graph is going to have an infinite number of possible drawings. In some of these drawings, a pair of edges might cross each other; in other drawings, all edges might be disjoint (except of course at the vertices, where edges with a common endpoint are obliged to meet). If G has a drawing D of the latter type, we say that the graph G is planar.
Given an abstract graph G, or a drawing thereof, it is not always obvious as to whether that graph is planar; just because the drawing that you currently possess of G contains crossings, does not necessarily mean that all drawings of G do. The wonderful little web game “Planarity” illustrates this point excellently. Nevertheless, there are definitely graphs which are not planar; in particular the complete graph $K_5$ on five vertices, and the complete bipartite graph $K_{3,3}$ on two sets of three vertices, are non-planar.
There is in fact a famous theorem of Kuratowski that says that these two graphs are the only “source” of non-planarity, in the sense that any non-planar graph contains (a subdivision of) one of these graphs as a subgraph. (There is of course the even more famous four-colour theorem that asserts that every planar graph is four-colourable, but this is not the topic of my post today.)
Intuitively, if we fix the number of vertices |V|, and increase the number of edges |E|, then the graph should become “increasingly non-planar”; conversely, if we keep the same number of edges |E| but spread them amongst a greater number of vertices |V|, then the graph should become “increasingly planar”. Is there a quantitative way to measure the “non-planarity” of a graph, and to formalise the above intuition as some sort of inequality?
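For reference, the quantitative answer developed in the rest of the post can be sketched as follows; the constants below are the commonly quoted ones, stated here as assumptions rather than taken from the text:

```python
def crossing_lower_bounds(num_vertices, num_edges):
    """Two standard lower bounds on the crossing number cr(G).

    From Euler's formula, a planar graph with at least 3 vertices has
    |E| <= 3|V| - 6, so deleting one edge per crossing gives the linear
    bound cr(G) >= |E| - 3|V| + 6.  Amplifying this bound by passing to
    random subgraphs yields the crossing number inequality
    cr(G) >= |E|^3 / (64 |V|^2), valid when |E| >= 4|V| (the constant 64
    is the commonly quoted one; sharper constants are known).
    """
    linear = num_edges - 3 * num_vertices + 6
    cubic = None
    if num_edges >= 4 * num_vertices:
        cubic = num_edges ** 3 / (64 * num_vertices ** 2)
    return linear, cubic

# K_5 (5 vertices, 10 edges): the linear bound is already positive,
# certifying non-planarity.
print(crossing_lower_bounds(5, 10))      # (1, None)
# A denser graph, where the amplified cubic bound takes over.
print(crossing_lower_bounds(100, 1000))  # (706, 1562.5)
```

Note how the two bounds capture the intuition above: the linear bound improves as edges are added with vertices fixed, while the cubic bound degrades (like $1/|V|^2$) as the same edges are spread over more vertices.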
It occurred to me recently that the mathematical blog medium may be a good venue not just for expository “short stories” on mathematical concepts or results, but also for more technical discussions of individual mathematical “tricks”, which would otherwise not be significant enough to warrant a publication-length (and publication-quality) article. So I thought today that I would discuss the amplification trick in harmonic analysis and combinatorics (and in particular, in the study of estimates); this trick takes an established estimate involving an arbitrary object (such as a function f), and obtains a stronger (or amplified) estimate by transforming the object in a well-chosen manner (often involving some new parameters) into a new object, applying the estimate to that new object, and seeing what that estimate says about the original object (after optimising the parameters or taking a limit). The amplification trick works particularly well for estimates which enjoy some sort of symmetry on one side of the estimate that is not represented on the other side; indeed, it can be viewed as a way to “arbitrage” differing amounts of symmetry between the left- and right-hand sides of an estimate. It can also be used in the contrapositive, amplifying a weak counterexample to an estimate into a strong counterexample. This trick also sheds some light as to why dimensional analysis works; an estimate which is not dimensionally consistent can often be amplified into a stronger estimate which is dimensionally consistent; in many cases, this new estimate is so strong that it cannot in fact be true, and thus dimensionally inconsistent inequalities tend to be either false or inefficient, which is why we rarely see them. (More generally, any inequality on which a group acts on either the left or right-hand side can often be “decomposed” into the “isotypic components” of the group action, either by the amplification trick or by other related tools, such as Fourier analysis.)
The amplification trick is a deceptively simple one, but it can become particularly powerful when one is arbitraging an unintuitive symmetry, such as symmetry under tensor powers. Indeed, the “tensor power trick”, which can eliminate constants and even logarithms in an almost magical manner, can lead to some interesting proofs of sharp inequalities, which are difficult to establish by more direct means.
The most familiar example of the amplification trick in action is probably the textbook proof of the Cauchy-Schwarz inequality
$|\langle v, w \rangle| \leq \|v\| \|w\| \qquad (1)$
for vectors v, w in a complex Hilbert space. To prove this inequality, one might start by exploiting the obvious inequality
$\langle v - w, v - w \rangle \geq 0 \qquad (2)$
but after expanding everything out, one only gets the weaker inequality
$\mathrm{Re}\, \langle v, w \rangle \leq \frac{1}{2} \|v\|^2 + \frac{1}{2} \|w\|^2. \qquad (3)$
Now (3) is weaker than (1) for two reasons; the left-hand side is smaller, and the right-hand side is larger (thanks to the arithmetic mean-geometric mean inequality). However, we can amplify (3) by arbitraging some symmetry imbalances. Firstly, observe that the phase rotation symmetry $v \mapsto e^{i\theta} v$ preserves the RHS of (3) but not the LHS. We exploit this by replacing v by $e^{i\theta} v$ in (3) for some phase $\theta$ to be chosen later, to obtain
$\mathrm{Re}\, (e^{i\theta} \langle v, w \rangle) \leq \frac{1}{2} \|v\|^2 + \frac{1}{2} \|w\|^2.$
Now we are free to choose $\theta$ at will (as long as it is real, of course), so it is natural to choose $\theta$ to optimise the inequality, which in this case means to make the left-hand side as large as possible. This is achieved by choosing $\theta$ to cancel the phase of $\langle v, w \rangle$, and we obtain
$|\langle v, w \rangle| \leq \frac{1}{2} \|v\|^2 + \frac{1}{2} \|w\|^2. \qquad (4)$
This is closer to (1); we have fixed the left-hand side, but the right-hand side is still too weak. But we can amplify further, by exploiting an imbalance in a different symmetry, namely the homogenisation symmetry $(v, w) \mapsto (\lambda v, \frac{1}{\lambda} w)$ for a scalar $\lambda > 0$, which preserves the left-hand side but not the right. Inserting this transform into (4) we conclude that
$|\langle v, w \rangle| \leq \frac{\lambda^2}{2} \|v\|^2 + \frac{1}{2\lambda^2} \|w\|^2 \qquad (5)$
where $\lambda > 0$ is at our disposal to choose. We can optimise in $\lambda$ by minimising the right-hand side, and indeed one easily sees that the minimum (or infimum, if one of v and w vanishes) is $\|v\| \|w\|$ (which is achieved when $\lambda = (\|w\|/\|v\|)^{1/2}$ when v, w are non-zero, or in an asymptotic limit in the degenerate cases), and so we have amplified our way to the Cauchy-Schwarz inequality (1). [See also this discussion by Tim Gowers on the Cauchy-Schwarz inequality.]
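A quick numerical sanity check of the final optimisation step, with randomly chosen vectors, can be carried out in Python (note that np.vdot conjugates its first argument, so the inner product is assembled accordingly):

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.standard_normal(5) + 1j * rng.standard_normal(5)
w = rng.standard_normal(5) + 1j * rng.standard_normal(5)

inner = np.vdot(w, v)              # <v, w>: sum of v_i * conj(w_i)
nv, nw = np.linalg.norm(v), np.linalg.norm(w)

# The amplified bound |<v,w>| <= (lam^2/2)||v||^2 + (1/(2 lam^2))||w||^2
# holds for every lam > 0; minimising over a fine grid of lam recovers
# the Cauchy-Schwarz bound ||v|| ||w|| up to grid resolution.
lams = np.linspace(0.1, 10, 100001)
rhs = (lams**2 / 2) * nv**2 + (1 / (2 * lams**2)) * nw**2
best = rhs.min()

print(abs(inner) <= best + 1e-12)            # True: bound holds for all lam
print(np.isclose(best, nv * nw, rtol=1e-6))  # True: the optimum is ||v|| ||w||
```

The grid minimum sits, as the calculus predicts, at $\lambda \approx (\|w\|/\|v\|)^{1/2}$; the amplification has converted a one-parameter family of crude bounds into the sharp one.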
Van Vu and I have recently uploaded our joint paper, “Random matrices: the circular law“, submitted to Contributions to Discrete Mathematics. In this paper we come close to fully resolving the circular law conjecture regarding the eigenvalue distribution of random matrices, for arbitrary choices of coefficient distribution.
More precisely, suppose we have an $n \times n$ matrix for some large n, where each coefficient of the matrix is an independent identically distributed copy of a single random variable x (possibly complex-valued). x could be continuous (e.g. a Gaussian) or discrete (e.g. a Bernoulli random variable, taking values +1 and -1 with equal probability). For simplicity, let us normalise x to have mean 0 and variance 1 (in particular, the second moment is finite). This matrix will not be self-adjoint or normal, but we still expect it to be diagonalisable, with n complex eigenvalues. Heuristic arguments suggest that these eigenvalues should mostly have magnitude $O(\sqrt{n})$; for instance, one can see this by observing that the Hilbert-Schmidt norm (a.k.a. the Frobenius norm) of the matrix, which can be shown to dominate the sum of squares of the magnitudes of the eigenvalues, is of size comparable to n on the average. Because of this, it is customary to normalise the matrix by $1/\sqrt{n}$; thus let $\lambda_1, \ldots, \lambda_n$ be the n complex eigenvalues of the normalised matrix, arranged in any order.
Numerical evidence (as seen for instance here) soon reveals that these n eigenvalues appear to distribute themselves uniformly in the unit disk $\{ z \in \mathbb{C}: |z| \leq 1 \}$ in the limit $n \to \infty$. This phenomenon is known as the circular law. It can be made more precise; if we define the empirical spectral distribution $\mu_n$ to be the function
$\mu_n(x,y) := \frac{1}{n} \# \{ 1 \leq i \leq n: \mathrm{Re}(\lambda_i) \leq x, \mathrm{Im}(\lambda_i) \leq y \}$
then with probability 1, $\mu_n$ should converge uniformly to the uniform distribution of the unit disk, defined as
$\mu_\infty(x,y) := \frac{1}{\pi} \mathrm{mes} \{ z: |z| \leq 1, \mathrm{Re}(z) \leq x, \mathrm{Im}(z) \leq y \}.$
This statement is known as the circular law conjecture. In the case when x is a complex Gaussian, this law was verified by Mehta (using an explicit formula of Ginibre for the joint density function of the eigenvalues in this case). A strategy for attacking the general case was then formulated by Girko, although a fully rigorous execution of that strategy was first achieved by Bai (and then improved slightly by Bai and Silverstein). They established the circular law under the assumption that x had slightly better than bounded second moment (i.e. $\mathbb{E} |x|^{2+\eta} < \infty$ for some $\eta > 0$), but more importantly that the probability density function of x in the complex plane was bounded (in particular, this ruled out all discrete random variables, such as the Bernoulli random variable). The reason for this latter restriction was to control the event that the matrix (or more precisely, the matrix shifted by z times the identity, for various complex numbers z) becomes too ill-conditioned by having a very small least singular value.
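The conjecture is easy to test numerically. Here is a minimal Python sketch (matrix size and threshold are illustrative choices) using Bernoulli entries, a discrete distribution falling outside the bounded-density hypothesis just described:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Bernoulli +/-1 entries: mean 0, variance 1, but discrete, so not
# covered by the bounded-density hypotheses of Bai-Silverstein.
M = rng.choice([-1.0, 1.0], size=(n, n))

# Normalise by 1/sqrt(n) and compute the n complex eigenvalues.
eigs = np.linalg.eigvals(M / np.sqrt(n))

# Essentially all eigenvalues should lie in (a slight enlargement of)
# the unit disk, consistent with the circular law.
inside = np.mean(np.abs(eigs) <= 1.1)
print(inside)
```

Plotting the eigenvalues in the complex plane (e.g. with matplotlib) makes the uniform filling of the disk visually striking already at this modest value of n.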
In the last few years, work of Rudelson, of myself with Van Vu, and of Rudelson-Vershynin (building upon earlier work of Kahn, Komlós, and Szemerédi, and of Van and myself) has opened the way to control the condition number of random matrices even when the matrices are discrete, and so there has been a recent string of results using these techniques to extend the circular law to discrete settings. In particular, Götze and Tikhomirov established the circular law for discrete random variables which were sub-Gaussian, an assumption which was then relaxed by Pan and Zhou to one of bounded fourth moment. In our paper, we get very close to the ideal assumption of bounded second moment; we need the slightly stronger condition $\mathbb{E} |x|^2 \log^{16}(2+|x|) < \infty$. (The power of 16 in the logarithm can certainly be improved, though our methods do not allow the logarithm to be removed entirely.)
The main new difficulty that arises when relaxing the moment condition so close to the optimal one is that one begins to lose control on the largest singular value of the matrix, i.e. on its operator norm. Under high moment assumptions (e.g. bounded fourth moment) one can keep this operator norm bounded with reasonable probability (especially after truncating away some exceptionally large entries), but when the moment conditions are loosened, one can only bound this operator norm by a quantity growing polynomially in n, even after truncation. This in turn causes certain metric entropy computations to become significantly more delicate, as one has to reduce the scale of the net below what one would ordinarily like to have.
One of the oldest and most fundamental concepts in mathematics is the line. Depending on exactly what mathematical structures we want to study (algebraic, geometric, topological, order-theoretic, etc.), we model lines nowadays by a variety of standard mathematical objects, such as the real line $\mathbb{R}$, the complex line $\mathbb{C}$, the projective line $\mathbb{P}^1$, the extended real line $[-\infty, +\infty]$, the affine line $\mathbb{A}^1$, the continuum, the long line, etc. We also have discrete versions of the line, such as the natural numbers $\mathbb{N}$, the integers $\mathbb{Z}$, and the ordinal $\omega$, as well as compact versions of the line, such as the unit interval $[0,1]$ or the unit circle $\mathbb{R}/\mathbb{Z}$. Finally we have versions of the line which are both discrete and compact, such as the cyclic groups $\mathbb{Z}/N\mathbb{Z}$ and the discrete intervals $\{1, \ldots, N\}$ and $\{0, 1, \ldots, N-1\}$. By taking Cartesian products we then obtain higher-dimensional objects such as Euclidean space $\mathbb{R}^n$, the standard lattice $\mathbb{Z}^n$, the standard torus $(\mathbb{R}/\mathbb{Z})^n$, and so forth. These objects of course form the background on which a very large fraction of modern mathematics is set.
Broadly speaking, the line has three major families of structures on it:
- Geometric structures, such as a metric or a measure, completeness, scales (coarse and fine), rigid motions (translations and reflection), similarities (dilation, affine maps), and differential structures (tangent bundle, etc.);
- Algebraic structures, such as group, ring, or field structures, and everything else that comes from those categories (e.g. subgroups, homomorphisms, involutions, etc.); and
- One-dimensional structures, such as order, a length space structure (in particular, path-connectedness structure), a singleton generator, the Archimedean property, the ability to use mathematical induction (i.e. well-ordering), convexity, or the ability to disconnect the line by removing a single point.
Of course, these structures are inter-related, and it is an important phenomenon that a mathematical concept which appears to be native to one structure can often be equivalently defined in terms of other structures. For instance, the absolute value |n| of an integer n can be defined geometrically as the distance from 0 to n, algebraically as the index of the subgroup $n\mathbb{Z}$ of the integers generated by n, or one-dimensionally as the number of integers between 0 and n (including 0, but excluding n). This equivalence of definitions becomes important when one wants to work in more general contexts in which one or more of the above structures is missing or otherwise weakened.
What I want to talk about today is an important toy model for the line (in any of its incarnations), in which the geometric and algebraic structures are enhanced (and become neatly nested and recursive), at the expense of the one-dimensional structure (which is largely destroyed). This model has many different names, depending on what field of mathematics one is working in and which structures one is interested in. In harmonic analysis it is called the dyadic model, the Walsh model, or the Cantor group model; in number theory and arithmetic geometry it is known as the function field model; in topology it is the Cantor space model; in probability it is the martingale model; in metric geometry it is the ultrametric, tree, or non-Archimedean model; in algebraic geometry it is the Puiseux series model; in additive combinatorics it is the bounded torsion or finite field model; in computer science and information theory it is the Hamming cube model; in representation theory it is the Kashiwara crystal model. Let me arbitrarily select one of these terms, and refer to all of these models as dyadic models for the line (or of objects derived from the line). While there is often no direct link between a dyadic model and a non-dyadic model, dyadic models serve as incredibly useful laboratories in which to gain insight and intuition for the “real-world” non-dyadic model, since one has much more powerful and elegant algebraic and geometric structure to play with in this setting (though the loss of one-dimensional structure can be a significant concern). Perhaps the most striking example of this is the three-line proof of the Riemann hypothesis in the function field model of the integers, which I will discuss a little later.
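As a small taste of the enhanced algebraic structure, here is a Python sketch of arithmetic in the dyadic model: binary digits become the coefficients of a polynomial over $\mathbb{F}_2$, addition loses its carries, and the freshman's dream becomes a theorem:

```python
# In the function-field / dyadic model, an integer written in binary is
# replaced by a polynomial over F_2, with the binary digits as the
# coefficients.  Addition then has no carries (it is bitwise XOR), and
# multiplication is carry-free long multiplication.  Representing the
# polynomials as Python ints makes both operations tiny.

def f2_add(a, b):
    """Add two F_2[t] polynomials (binary-coefficient integers): no carries."""
    return a ^ b

def f2_mul(a, b):
    """Multiply two F_2[t] polynomials via shift-and-XOR long multiplication."""
    result, shift = 0, 0
    while b:
        if b & 1:
            result ^= a << shift
        b >>= 1
        shift += 1
    return result

# (t + 1) + (t + 1) = 0: every element is its own additive inverse.
print(f2_add(0b11, 0b11))   # 0
# (t + 1)^2 = t^2 + 1: the Frobenius "freshman's dream", false in Z.
print(f2_mul(0b11, 0b11))   # 5, i.e. 0b101
```

The absence of carries is exactly the "neatly nested and recursive" structure alluded to above: the high digits of a sum or product never interfere with the low digits, which is what makes dyadic toy problems so much cleaner than their integer counterparts.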
This is a well-known problem in multilinear harmonic analysis; it is fascinating to me because it lies barely beyond the reach of the best technology we have for these problems (namely, multiscale time-frequency analysis), and because the most recent developments in quadratic Fourier analysis seem likely to shed some light on this problem.
Recall that the Hilbert transform is defined on test functions (up to irrelevant constants) as
$Hf(x) := \mathrm{p.v.} \int_{\mathbb{R}} f(x-t) \, \frac{dt}{t},$
where the integral is evaluated in the principal value sense (removing the region $|t| \leq \epsilon$ to ensure integrability, and then taking the limit as $\epsilon \to 0$).
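Numerically, one can sidestep the singular integral entirely by working on the Fourier side, where (with the convention above, up to constants) the Hilbert transform acts by the multiplier $-i\,\mathrm{sgn}(\xi)$. The following Python sketch (grid size illustrative) checks the classical identity H(cos) = sin on a periodic grid:

```python
import numpy as np

# A periodic, discrete sketch of the Hilbert transform: on the Fourier
# side the principal-value convolution becomes the bounded multiplier
# -i*sgn(k), so no singular integral ever needs to be evaluated.
N = 256
x = 2 * np.pi * np.arange(N) / N
f = np.cos(x)

k = np.fft.fftfreq(N)                      # signed frequencies
multiplier = -1j * np.sign(k)              # the Hilbert transform symbol
Hf = np.real(np.fft.ifft(multiplier * np.fft.fft(f)))

# With this sign convention H(cos) = sin, the classical identity.
print(np.max(np.abs(Hf - np.sin(x))) < 1e-10)   # True
```

This multiplier viewpoint is also the cleanest way to see the $L^2$ boundedness of H: the symbol has modulus 1 away from the origin, so H is (up to constants) an isometry on $L^2$.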