Combinatorial incidence geometry is the study of the possible combinatorial configurations between geometric objects such as lines and circles. One of the basic open problems in the subject has been the Erdős distance problem, posed in 1946:
Problem 1 (Erdős distance problem) Let be a large natural number. What is the least number of distances that are determined by points in the plane?
Erdős called this least number . For instance, one can check that and , although the precise computation of rapidly becomes more difficult after this. By considing points in arithmetic progression, we see that . By considering the slightly more sophisticated example of a lattice grid (assuming that is a square number for simplicity), and using some analytic number theory, one can obtain the slightly better asymptotic bound .
On the other hand, lower bounds are more difficult to obtain. As observed by Erdős, an easy argument, ultimately based on the incidence geometry fact that any two circles intersect in at most two points, gives the lower bound . The exponent has been slowly increasing over the years by a series of increasingly intricate arguments combining incidence geometry facts with other known results in combinatorial incidence geometry (most notably the Szemerédi-Trotter theorem) and also some tools from additive combinatorics; however, these methods seemed to fall quite short of getting to the optimal exponent of . (Indeed, previously to last week, the best lower bound known was approximately , due to Katz and Tardos.)
Very recently, though, Guth and Katz have obtained a near-optimal result:
The proof neatly combines together several powerful and modern tools in a new way: a recent geometric reformulation of the problem due to Elekes and Sharir; the polynomial method as used recently by Dvir, Guth, and Guth-Katz on related incidence geometry problems (and discussed previously on this blog); and the somewhat older method of cell decomposition (also discussed on this blog). A key new insight is that the polynomial method (and more specifically, the polynomial Ham Sandwich theorem, also discussed previously on this blog) can be used to efficiently create cells.
In this post, I thought I would sketch some of the key ideas used in the proof, though I will not give the full argument here (the paper itself is largely self-contained, well motivated, and of only moderate length). In particular I will not go through all the various cases of configuration types that one has to deal with in the full argument, but only some illustrative special cases.
To simplify the exposition, I will repeatedly rely on “pigeonholing cheats”. A typical such cheat: if I have objects (e.g. points or lines), each of which could be of one of two types, I will assume that either all of the objects are of the first type, or all of the objects are of the second type. (In truth, I can only assume that at least of the objects are of the first type, or at least of the objects are of the second type; but in practice, having instead of only ends up costing an unimportant multiplicative constant in the type of estimates used here.) A related such cheat: if one has objects (again, think of points or circles), and to each object one can associate some natural number (e.g. some sort of “multiplicity” for ) that is of “polynomial size” (of size ), then I will assume in fact that all the are in a fixed dyadic range for some . (In practice, the dyadic pigeonhole principle can only achieve this after throwing away all but about of the original objects; it is this type of logarithmic loss that eventually leads to the logarithmic factor in the main theorem.) Using the notation to denote the assertion that for an absolute constant , we thus have for all , thus is morally constant.
I will also use asymptotic notation rather loosely, to avoid cluttering the exposition with a certain amount of routine but tedious bookkeeping of constants. In particular, I will use the informal notation or to denote the statement that is “much less than” or is “much larger than” , by some large constant factor.
— 1. Reduction to a linear problem —
Traditionally, the Erdős distance problem has been attacked by first casting it as a question about incidences between circles, by starting with the trivial observation that if two points are equidistant from a third point , then lie on a circle centred at .
The incidence geometry of circles, however, is not quite as well understood as the incidence geometry of lines, and so one often then converted the circle incidence problem to a line incidence problem, for instance by using the elementary Euclidean geometry fact that if two circles intersected at a pair of points, then the centres of these circles would lie on the perpendicular bisector of that pair of points. Indeed, by combining this elementary observation with the celebrated Szemerédi-Trotter theorem that gives a sharp incidence bound for a collection of lines and points, Chung, Szemerédi, and Trotter were already able to obtain the respectable lower bound of .
The first innovation of Guth and Katz (which builds upon earlier work in this direction by Elekes and Sharir) is to use elementary Euclidean geometry to recast the distance problem as a linear problem in a slightly different fashion, namely as an incidence problem of lines in three dimensions rather than two.
Let’s see how this works. To prove the main theorem, we will argue by contradiction, assuming that for some large . Thus, we can find points in the plane which only subtend distances. We think of these distances as the lengths of line segments connecting two of the points . There are different line segments that have these lengths. A standard application of the Cauchy-Schwarz inequality then shows that there must be many pairs of distinct but congruent line segments , (i.e. line segments of equal length ); in fact, their number must be
So, we have a lot () of pairs of congruent line segments. Now we use a trivial but fundamentally important observation of Elekes and Sharir to recast the problem in terms of rigid motions:
Proposition 3 Two line segments in the plane are congruent if and only if there is an orientation-preserving rigid motion that maps to and to . Furthermore, this rigid motion is unique.
Proof: Translate to and then rotate appropriately.
Let us call the space of orientation-preserving rigid motions (the two-dimensional special Euclidean group); it is a three-dimensional Lie group. From the above proposition we thus see that we can find quintuples , where is such that and .
We now dualise the problem; rather than think about rigid motions acting on pairs of points, we think about pairs of points describing a set of rigid motions. For any pair of points , let of all rigid motions that map to . This is a one-dimensional subset of ; indeed, consists of the translation from to , together with rotations around an origin that lies on the perpendicular bisector of , with a rotation angle obeying the relation
where is the midpoint of and . If we discard the translation (which can easily be dealt with by a separate argument) as well as the point reflection (corresponding to the case ), and focus only on the rotations by angles less than , we in fact see that the origin of the rotation depends in a linear fashion on the quantity . Thus, if we parameterise (most of) by the coordinates , we thus see that we can view each as a line in , and we will adopt this perspective for the rest of this post.
[I do not know if there is any higher explanation (coming, say, from Lie group theory) as to why the set of rigid motions from to in happens to form a line in when viewed in the appropriate coordinate system. Perhaps this is just a miraculous coincidence arising from the low-dimensional (and low-degree) nature of the problem.]
Let be the collection of all the lines generated by pairs of points from our collection, thus there are such lines. (It is easy to see that different pairs of points lead to different lines.) The quintuples described earlier give rise to intersecting pairs of lines in . So the question now comes down to a simple question about incidences of lines: is it possible for lines in three-dimensional space to generate pairs of intersecting lines? If the answer to this question is “no”, then we are done.
Unfortunately, the answer to the question is “yes”. One quickly comes up with two basic counterexamples to that question:
- (Concurrency) If one has lines that all go through the same point , then we will have pairs of intersecting lines.
- (Coplanarity) If one has lines that all lie in the same plane , with no two lines being parallel, then we will have pairs of intersecting lines.
Slightly less obviously, there is a third counterexample that comes from a regulus (or hyperbolic paraboloid) – a doubly ruled surface in . A typical example of a regulus is the set . On the one hand, this surface is ruled by the lines for ; on the other hand, it is also ruled by the lines for . If we then pick lines from the first family of lines and from the second family, then we obtain another family of lines that lead to pairs of intersecting lines.
However, all is not lost here, because we are not dealing with an arbitrary family of lines in , but rather with a special family that was generated ultimately by points in . As such, the special structure of this family can be used to rule out the concurrency, coplanarity, and regulus counterexmaples. Indeed, observe that if a rigid motion maps to , then it cannot also map to for some . This implies that and cannot intersect. This limits the amount of concurrency in ; letting run from to , we indeed conclude that at most lines in can meet at a point, which eliminates the concurrency counterexample. (Now, the best one can do is get groups of concurrent lines, but this only yields pairs of intersecting lines, which is acceptable.)
In a similar spirit, observe that for a given plane , the requirement that a line in lie in that plane is a codimension two constraint on that line (the space of lines in a plane is two-dimensional, while the space of lines in is four-dimensional). Thus, for fixed , the requirement that lie in is a codimension two constraint on , and thus by Bezout’s theorem, it should only be satisfiable by values of . (One has to check that the constraints are non-degenerate, but this is routine; actually, in this particular case one can also argue directly using elementary Euclidean geometry.) Letting run from to , we conclude that any plane can contain at most lines from , thus neutralising the coplanarity counterexample. The same argument also shows that any regulus also can contain at most lines from (because a regulus is an algebraic surface of degree , and so the Bezout argument still applies).
Now, it turns out that the elimination of the concurrency, coplanarity, and regulus obstructions allows us to now give the right answer to the previous question. Indeed, Guth and Katz show
Theorem 4 Let be a collection of lines in , such that at most lines in meet at a point, and such that any plane or regulus contains at most lines in . Then there are at most pairs of intersecting lines in .
The above discussion has shown (modulo a few details) that Theorem 4 implies Theorem 2, so it suffices now to show Theorem 4. This is a significant simplification, because it is an assertion about incidences of lines, rather than congruence of line segments or incidences of circles, and we know a lot more about how configurations of lines intersect than we do about configurations of circles or of congruences of line segments.
Of course, it remains to prove Theorem 4. It is convenient to pigeonhole based on the concurrency of the lines in . Let , and suppose that has a set of points , such that for each point in there are at least lines in passing through . From the concurrency bound we know that . Then each point in contributes pairs of intersecting lines, so we need to show that does not get much larger than . And indeed this is the case:
A simple dyadic summation argument allows one to deduce Theorem 4 from Theorem 5. (Note that the case is not relevant, as such points do not contribute any pairs of intersecting lines.) The hypothesis that each plane or regulus contains at most lines, incidentally, is very reminiscent of the Wolff axiom in the work on the Kakeya conjecture.
It is worth noting that the bound in Theorem 5 is completely sharp. Indeed, consider two parallel grids
and let be the set of lines connecting a point in the first grid to a point in the second grid. Then one can verify that obeys the required hypotheses for Theorem 5, and a number-theoretic calculation shows that for any , the number of points in that are incident to at least lines in is . (To see this, first look at the cases and , and then interpolate the arguments; details are given in the Guth-Katz paper.)
It is also worth noting that if one replaces the real line by a finite field , then the claim fails for large . Indeed, if one sets and to be the set of all lines in , then the hypotheses of Theorem 5 hold, but each of the points in is incident to lines in , which soon leads to a significant violation of the conclusion of that theorem. Thus, any proof of Theorem 5 in the large case cannot be purely algebraic in nature must somehow use a property of the real line that is not shared by finite fields. This property will be the ordered nature of the real line, which manifests itself in the Szemerédi-Trotter theorem and in the ham sandwich theorem, both of which are used in the proof. However, this example does not prevent the small case from being purely algebraic.
So, the only thing left to do is to prove Theorem 5. It turns out that the multiplicity case and the higher multiplicity case have to be treated separately, basically because concurrent lines will be trapped in a plane when , but will usually not be so trapped when . Also, note that the regulus obstruction only generates intersections of multiplicity ; when , the hypothesis that a regulus contains points is not used and so can be dropped.
— 2. The case —
We first verify Theorem 5 in the case, i.e. we show that the lines in that theorem can intersect in at most points. As hinted in the previous discussion, the argument here is purely algebraic in nature, using just the polynomial method (similar to, say, how Dvir proved the finite fields Kakeya conjecture, as discussed in this previous blog post). As such, the result in fact holds over an arbitrary field, but we will stick to for sake of notational consistency.
The polynomial method rests on two basic facts:
- Given a set of points in , there is a non-trivial polynomial of degree that vanishes at all of these points simultaneously.
- If a polynomial of degree at most vanishes on more than points of a line , then it must in fact vanish on all of .
Both facts are easy to prove (see e.g. this previous blog post). But, as we will see, they have to be applied carefully in order to obtain a good estimate.
Suppose for sake of contradiction that we can find a set of points in of cardinality such that each point in is incident to at least two lines from . The strategy is then to use Fact 1 find a low-degree polynomial that vanishes on , so that is contained in the low-degree surface . This surface is not necessarily irreducible; it may be the union of several irreducible subsurfaces. By the nature of , there will be lots of lines in that intersect this surface quite frequently; by Fact 2, this should force the lines to be trapped inside . Thus we have a low-degree surface that contains a lot of lines. Hopefully, this forces many of the irreducible components of to be singly ruled surfaces, doubly ruled surfaces, or planes. The latter two cases cannot contain too many lines in by hypothesis, so we expect the dominant family of lines in to be those coming from the singly ruled surfaces. But one can use Bezout-type theorems to control the number of times lines from one singly ruled surface of controlled degree can intersect lines from another such surface, and hopefully this gives the desired bound for .
Let’s now execute this strategy. Fact 1 gives a non-trivial polynomial of degree that vanishes at all the points in . This turns out to be too large of a degree bound to close the argument. The problem is that the set of points that we want to apply this fact to is not completely arbitrary; in fact, by its nature, many points in are going to be collinear (because each point in is incident to at least two lines from ), and so heuristically many of these points in are in fact “redundant” for the purposes of finding a polynomial that vanishes on all of .
We can handle this by the “random refinement” trick. Let us write for some . Each of these points is incident to two lines in ; as there are lines in , we thus expect each line in to be incident to points in . (Actually, what could happen more generally is that only a fraction of the lines in , say lines for some , are contributing the bulk of the incidences, with each line being incident to about points in ; but let us consider the case here for simplicity, as it is the worst case, and the other cases are similar but require a bit more bookkeeping.)
Now, let us destroy some of this collinearity by taking a random subset of of density about . Then will have size about , and each line in is now expected to be incident to elements of . We apply Fact 1 to get a non-trivial polynomial of degree that vanishes at every point in . Applying Fact 2 (and assuming that certain constants were chosen properly), this implies that also vanishes at every line in , and thus also at every point in . This should be contrasted with what would have happened if we applied Fact 1 to directly, giving a degree bound of rather than .
So now we have a surface of degree that contains all the lines in . We split this surface into irreducible components. These components may be planes, reguli, singly ruled surfaces, or not ruled at all. To simplify things let us just consider four extreme cases:
- consists of the union of planes.
- consists of the union of reguli.
- consists of the union of singly ruled surfaces.
- consists of the union of non-ruled surfaces.
(For the full proof, one also has to consider hybrid cases in which more than one of the above four types appear, and so there are some cross-terms to deal with, but these are relatively easy to control and will not be discussed here.)
In the first case, degree considerations tell us that there are at most planes, and by hypothesis each of them contains lines in . Within each plane, there are at most points of intersection, and between any two planes , all points of intersection must lie in the common line , which is incident to the lines of in in at most points. Adding this all together we get at most points of intersection, which is acceptable.
A similar argument (which we omit) handles the second case, so we move on to the third case. Let’s assume for simplicity that each singly ruled surface has the same degree ; since the total degree of is , we must then have at most such surfaces. In a singly ruled surface , all but of the lines in will come from the single ruling of (i.e. all but of the lines will be generators); the contribution of these exceptional lines is negligible (as there at most such lines in all, leading to points of intersection) and we shall simply ignore them. There are two remaining types of intersection to consider; the intersections between two lines from the same ruling of a single ruled surface, and the intersections from lines that do not lie in the same ruled surface.
For the first type of intersection, one can show that in a ruled surface of degree , any line in the ruling can intersect the other lines in the ruling in at most points. So the total number of intersections arising in this way is ; since , this is acceptable.
To handle the second type of intersection, observe that any line not incident to a ruled surface can intersect at most times; summing over the surfaces and the lines in we obtain at most incidences, as desired.
Finally, we consider the fourth case, where there are no ruled surfaces anywhere in . Here, one uses an algebraic observation of Cayley that classifies ruled surfaces by a bounded number of algebraic conditions:
Proposition 6 A surface is ruled if and only if, for every point in , there exists a line through that is tangent to to order three (thus, if is a defining function for , there exists a non-zero vector such that ).
The “only if” direction is obvious, but what we need here is the more difficult “if” operation. The basic idea is to show (e.g. by using the Picard uniqueness theorem for ODE) that the foliation induced by the tangent lines consist entirely of straight lines, thus giving the desired ruling of . The precise order of vanishing is not required for this argument; any bounded number instead of three would suffice here. (I would be interested to know that if there was a slick, computation-free proof of the above proposition, perhaps using a larger order of vanishing than three.)
The condition that there exists a non-zero vector such that can be combined by elementary elimination theory (or high school algebra, for that matter) into a single algebraic condition on , where is a polynomial of degree known as the flecnode polynomial of . If contains no ruled surfaces, then the above proposition tells us that the flecnode polynomial shares no common factors with .
On the other hand, if is a line in oriented in the direction , and is a point in , then lies in , and thus . Thus we see that each line in is contained in the zero locus of both and its flecnode polynomial . However, by applying Bezout’s theorem to a generic two-dimensional slice of , we see that the zero locus of and of can only intersect in lines at most. But if we chose the constants correctly, this number will be less than the number of lines in , leading to the required contradiction.
— 3. The case —
Now we turn to the case. Our starting point here will be the Szemerédi-Trotter theorem, which asserts that given a finite set of points and a finite set of lines, the number of incidences is bounded by
The theorem is usually stated in the plane , but it automatically extends to higher dimensions, and in particular to , by a random projection argument. This theorem can be proven by crossing number methods (as discussed in this previous blog post) or by cell decomposition (as discussed in this previous blog post). In both cases, the order structure of (and in particular, the fact that a line divides a plane into two half-planes) is crucial. But we will not need to know the proof of this theorem, instead using it as a black box.
If one applies this bound directly to our situation, we obtain the bound , which is inferior to the desired bound . Actually, it is not surprising that we get an inferior bound, because we have not yet exploited the crucial non-coplanarity hypothesis.
However, it is possible to amplify the Szemerédi-Trotter bound by the method of cell decomposition. As discussed in this previous blog post, the idea is to carefully carve up the ambient space (which, in this case, is ) into smaller regions or “cells”, each of which only contains a small fraction of the points and which encounters only a small fraction of the lines . One then applies an existing bound (in this case, (1)) to each cell, and sums up over all cells to get what should presumably be a better bound (where the key point being that the cells should be constructed in such a way that each line only encounters a small fraction of the cells).
There is however a catch to this method: if one creates too many cells, then one starts running into the problem that too many points in and too many lines in will now lie on the boundary of the cell, rather than in the interior. So one has to carefully optimise the complexity of the cell decomposition.
In previous literature, the most popular way to create cells was by a random construction, for instance using randomly chosen points and lines from and to create planes, which one then uses to slice up space into polytope cells, possibly with an additional non-random “cleanup” stage to remove some degeneracies or other potential difficulties in the cell structure. One of the key innovations in the Guth-Katz paper is to instead create cells via the polynomial method, and specifically by the polynomial Ham Sandwich theorem. This allows for a very efficient and even cell decomposition; the walls of the cell will no longer be flat planes, but will still be algebraic sets of controlled degree, and this turns out to be good enough for the application at hand. This is inspired by previous applications of the polynomial ham sandwich theorem to incidence geometry problems, as discussed in this previous blog post.
Let us first recall the polynomial ham sandwich theorem (for three dimensions), which one can think of as a continuous version of Fact 1 from the previous section:
Theorem 7 (Polynomial ham sandwich theorem) Let be bounded open sets in . Then there exists a non-trivial polynomial of degree such that the algebraic set bisects each of the , thus and have volume equal to half that of .
By taking the to be the -neighbourhood of finite sets , and sending to zero (using some basic compactness properties of the (projective) space of polynomials to extract a limit), one can conclude a useful combinatorial corollary:
Corollary 8 (Discretised polynomial ham sandwich theorem) Let be finite sets of points in . Then there exists a non-trivial polynomial of degree such that the algebraic set bisects each of the , in the sense that and each have cardinality at most .
Note that the algebraic set may capture some of the points of ; indeed, this is necessarily the case if is odd. This possibility will need to be addressed later in the argument.
We can iterate this corollary to obtain a nice cell decomposition:
Corollary 9 (Cell decomposition) Let be a finite set of points in , and let . Then there exists a non-trivial polynomial of degree and a decomposition of into at most cells , each of which is an open set with boundary in , and each of which contains at most points of . (We allow to be disconnected.)
Proof: Without loss of generality we may take to be a power of two. The proposition is trivial for , and by using Corollary 8 it is easy to see (with the right choice of implied constants) that the claim for implies the claim for (with the same implied constants), by bisecting each of the cells obtained for by an additional bisecting polynomial that one then multiplies with the existing polynomial.
Note that Fact 1 from the previous section can be viewed as the case of the above decomposition, in which the cell walls have now absorbed all the points in . But for us, the optimal value of will be significantly less than , to balance the contribution of the points on the cell walls with the points in the interior.
We now apply this situation to the situation in Theorem 5. Fix , and suppose for contradiction that we can find a set of points with , with each point in incident to at least lines in .
We will apply the cell decomposition for a certain parameter ; it turns out that the optimal value here is . Thus we obtain a non-trivial polynomial of degree , and a collection of cells, each of which contains points of in the interior.
By throwing away at most half of the points in , we can end up in one of two extreme cases:
- (Cellular case) All points in are in the interior of a cell.
- (Algebraic case) All points in are in the boundary of a cell.
The cellular case becomes easier for large, while the algebraic case becomes easier for small; the choice is used to balance the two cases.
Let us first consider the cellular case. Then we cells will contain points of . Each such point in such a cell is incident to at least lines from . Applying the Szemerédi-Trotter bound (1) inside this cell , we conclude that
where is the set of lines in that intersect . Since , we conclude that
On the other hand, as has degree , we see from Bezout’s theorem that each line in intersects cells, and so
Since the number of cells here is , we obtain
which is a contradiction.
Now consider the algebraic case. We have points, each incident to lines from , which has cardinality . We thus expect each line to be incident to points in . For simplicity we will assume that every line is of this type (in reality, we would have to throw out some lines that do not intersect enough points in ).
By construction, all the points in lie in the algebraic set , which has degree . Applying Fact 2, we conclude that every line in is in fact trapped inside .
This is now a situation fairly similar to that of the previous section. However, we now use the hypothesis that to obtain a better bound. Every point in has at least three lines of passing through it, each of which lies in . This leads to two possibilities for :
- (Singular case) is a singular point of (i.e. ).
- (Flat case) is a flat point of (i.e. the is non-singular, but the second fundamental form of vanishes at ).
Indeed, if was non-singular, then has a unique tangent plane at along which at least three lines of are tangent; as they all lie in , this forces the second fundamental form of to vanish.
By throwing out at most half of the points in , we may now reduce to two subcases:
- (Singular subcase) All points of are singular points of .
- (Flat subcase) All points of are flat points of .
Let us consider first the singular subcase. The points are now contained in the zero locus of and of ; the latter is an intersection of three algebraic surfaces of degree ; by reducing to be square-free, we can assume that and have no common factor. We already saw from Fact 2 that all the lines in were trapped in the zero locus of ; the same argument shows that they are also trapped in the zero locus of . But by Bezout’s theorem applied to a generic two-dimensional slice of (as was done in the previous section, using instead of ), we see that the zero locus of and can intersect in at most lines, which will contradict the bound if the constants are chosen properly.
Now we turn to the flat subcase. Just as the three polynomial components of detects singular points of , there are nine polynomials that detect flat points of , namely the components of the three-dimensional vector fields
for . All nine polynomials vanish at a flat point. (This observation was also used in a previous paper of Guth and Katz in which they solve the joints problem.)
The argument for the singular subcase almost carries over without difficulty to the flat subcase, but there is one problem: it is not necessarily the case that does not share a common factor with the , so that we cannot quite apply the Bezout argument immediately. Indeed, could contain a plane, which of course consists entirely of flat points. However, if has no planes, then the set of flat points has positive codimension, and the argument proceeds as before. At the other extreme, if consists entirely of planes, then by degree considerations there are at most planes present. But by hypothesis, each plane contains at most lines from . Since , this leads to a contradiction (if the implied constants are chosen correctly). The general case (in which has some planes, but does not consist entirely of planes) can be established by combining the previous two arguments properly.