The celebrated Szemerédi-Trotter theorem gives a bound for the set of incidences ${I(P,L) := \{ (p,\ell) \in P \times L: p \in \ell \}}$ between a finite set of points ${P}$ and a finite set of lines ${L}$ in the Euclidean plane ${{\bf R}^2}$. Specifically, the bound is

$\displaystyle |I(P,L)| \ll |P|^{2/3} |L|^{2/3} + |P| + |L| \ \ \ \ \ (1)$

where we use the asymptotic notation ${X \ll Y}$ or ${X=O(Y)}$ to denote the statement that ${X \leq CY}$ for some absolute constant ${C}$. In particular, the number of incidences between ${n}$ points and ${n}$ lines is ${O(n^{4/3})}$. This bound is sharp; consider for instance the discrete box ${P := \{ (a,b) \in {\bf Z}^2: 1 \leq a \leq N; 1 \leq b \leq 2N^2 \}}$ with ${L}$ being the collection of lines ${\{ (x,mx+b): m, b \in {\bf Z}, 1 \leq m \leq N, 1 \leq b \leq N^2 \}}$. One easily verifies that ${|P|=2N^3}$, ${|L| = N^3}$, and ${|I(P,L)| = N^4}$, showing that (1) is essentially sharp in the case ${|P| \sim |L|}$; one can concoct similar examples for other regimes of ${|P|}$ and ${|L|}$.
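Incidentally, this example is easy to check numerically for small ${N}$. The following Python sketch (the helper name `check_example` is mine) brute-forces the three counts:

```python
# Brute-force check of the sharpness example for a small N:
# P is the discrete box, L the family of lines y = m*x + b.
def check_example(N):
    P = {(a, b) for a in range(1, N + 1) for b in range(1, 2 * N**2 + 1)}
    L = [(m, b) for m in range(1, N + 1) for b in range(1, N**2 + 1)]
    # each line (m, b) meets P precisely at x = 1, ..., N, since
    # m*x + b <= N^2 + N^2 = 2N^2 there
    incidences = sum(1 for (m, b) in L for x in range(1, N + 1)
                     if (x, m * x + b) in P)
    return len(P), len(L), incidences

# |P| = 2N^3, |L| = N^3, |I(P,L)| = N^4
print(check_example(3))  # → (54, 27, 81)
```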

On the other hand, if one replaces the Euclidean plane ${{\bf R}^2}$ by a finite field geometry ${F^2}$, where ${F}$ is a finite field, then the estimate (1) is false. For instance, if ${P}$ is the entire plane ${F^2}$, and ${L}$ is the set of all lines in ${F^2}$, then ${|P|, |L|}$ are both comparable to ${|F|^2}$, but ${|I(P,L)|}$ is comparable to ${|F|^3}$, thus violating (1) when ${|F|}$ is large. Thus any proof of the Szemerédi-Trotter theorem must use a special property of the Euclidean plane which is not enjoyed by finite field geometries. In particular, this strongly suggests that one cannot rely purely on algebra and combinatorics to prove (1); one must also use some Euclidean geometry or topology as well.
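One can see this obstruction concretely by counting incidences in a small prime field by brute force (a sketch; the function name is mine):

```python
# In F_p^2, take P to be all p^2 points and L to be all p^2 + p lines.
# Every line contains exactly p points, so |I(P,L)| = p^3 + p^2 ~ |F|^3,
# violating the Szemeredi-Trotter bound for large p.
def ff_incidences(p):
    points = [(x, y) for x in range(p) for y in range(p)]
    count = 0
    for m in range(p):            # non-vertical lines y = m*x + c
        for c in range(p):
            count += sum(1 for (x, y) in points if y == (m * x + c) % p)
    for c in range(p):            # vertical lines x = c
        count += sum(1 for (x, y) in points if x == c)
    return len(points), p * p + p, count

print(ff_incidences(5))  # → (25, 30, 150), i.e. 5^3 + 5^2 incidences
```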

Nowadays, the slickest proof of the Szemerédi-Trotter theorem is via the crossing number inequality (as discussed in this previous post), which ultimately relies on Euler’s famous formula ${|V|-|E|+|F|=2}$; thus in this argument it is topology which is the feature of Euclidean space which one is exploiting, and which is not present in the finite field setting. Today, though, I would like to mention a different proof (closer in spirit to the original proof of Szemerédi-Trotter, and also a later argument of Clarkson et al.), based on the method of cell decomposition, which has proven to be a very flexible method in combinatorial incidence geometry. Here, the distinctive feature of Euclidean geometry one is exploiting is convexity, which again has no finite field analogue.

Roughly speaking, the idea is this. Using nothing more than the axiom that two points determine at most one line, one can obtain the bound

$\displaystyle |I(P,L)| \ll |P| |L|^{1/2} + |L|, \ \ \ \ \ (2)$

which is inferior to (1). (On the other hand, this estimate works in both Euclidean and finite field geometries, and is sharp in the latter case, as shown by the example given earlier.) Dually, the axiom that two lines determine at most one point gives the bound

$\displaystyle |I(P,L)| \ll |L| |P|^{1/2} + |P| \ \ \ \ \ (3)$

(or alternatively, one can use projective duality to interchange points and lines and deduce (3) from (2)).

An inspection of the proof of (2) shows that it is only expected to be sharp when the bushes ${L_p := \{ \ell \in L: \ell \ni p \}}$ associated to each point ${p \in P}$ behave like “independent” subsets of ${L}$, so that there is no significant correlation between the bush ${L_p}$ of one point and the bush of another point ${L_q}$.

However, in Euclidean space, we have the phenomenon that the bush of a point ${L_p}$ is influenced by the region of space that ${p}$ lies in. Clearly, if ${p}$ lies in a set ${\Omega}$ (e.g. a convex polygon), then the only lines ${\ell \in L}$ that can contribute to ${L_p}$ are those lines which pass through ${\Omega}$. If ${\Omega}$ is a small convex region of space, one expects only a fraction of the lines in ${L}$ to actually pass through ${\Omega}$. As such, if ${p}$ and ${q}$ both lie in ${\Omega}$, then ${L_p}$ and ${L_q}$ are compressed inside a smaller subset of ${L}$, namely the set of lines passing through ${\Omega}$, and so should be more likely to intersect than if they were independent. This should lead to an improvement to (2) (and indeed, as we shall see below, ultimately leads to (1)).

More formally, the argument proceeds by applying the following lemma:

Lemma 1 (Cell decomposition) Let ${L}$ be a finite collection of lines in ${{\bf R}^2}$, let ${P}$ be a finite set of points, and let ${r \geq 1}$. Then it is possible to find a set ${R}$ of ${O(r)}$ lines in ${L}$, plus some additional open line segments not containing any point in ${P}$, which subdivide ${{\bf R}^2}$ into ${O(r^2)}$ convex regions (or cells), such that the interior of each such cell is incident to at most ${O(|L|/r)}$ lines.

The deduction of (1) from (2), (3) and Lemma 1 is very quick. Firstly we may assume we are in the range

$\displaystyle |L|^{1/2} \ll |P| \ll |L|^2 \ \ \ \ \ (4)$

otherwise the bound (1) follows already from either (2) or (3) and some high-school algebra: for instance, if ${|P| \ll |L|^{1/2}}$ then (2) gives ${|I(P,L)| \ll |L|}$, while if ${|P| \gg |L|^2}$ then (3) gives ${|I(P,L)| \ll |P|}$.

Let ${r \geq 1}$ be a parameter to be optimised later. We apply the cell decomposition to subdivide ${{\bf R}^2}$ into ${O(r^2)}$ open convex regions, plus a family ${R}$ of ${O(r)}$ lines. Each of the ${O(r^2)}$ convex regions ${\Omega}$ has only ${O(|L|/r)}$ lines through it, and so by (2) contributes ${O( |P \cap \Omega| |L|^{1/2}/r^{1/2} + |L| / r )}$ incidences. Meanwhile, on each of the lines ${\ell}$ in ${R}$ used to perform this decomposition, there are at most ${|L|}$ transverse incidences (because each line in ${L}$ distinct from ${\ell}$ can intersect ${\ell}$ at most once), plus all the incidences along ${\ell}$ itself. Putting all this together, one obtains

$\displaystyle |I(P,L)| \leq |I(P,L \cap R)| + O( |P| |L|^{1/2}/r^{1/2} + |L| r).$

We optimise this by selecting ${r \sim |P|^{2/3} / |L|^{1/3}}$; from (4) we can ensure that ${r \leq |L|/2}$, so that ${|L \cap R| \leq |L|/2}$. One then obtains

$\displaystyle |I(P,L)| \leq |I(P,L \cap R)| + O( |P|^{2/3} |L|^{2/3} ).$

We can iterate away the ${L \cap R}$ error (halving the number of lines each time) and sum the resulting geometric series to obtain (1).
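The effect of this iteration can be sketched numerically: halving the number of lines at each step makes the error terms a convergent geometric series, so only a constant factor is lost. In the sketch below, the helper name and the choice ${C=1}$ are mine, for illustration only:

```python
# Sum the error terms C * |P|^(2/3) * (|L|/2^j)^(2/3) over the iteration;
# since |L| is halved each time, this is a geometric series with ratio 2^(-2/3).
def iterated_bound(p_size, l_size, C=1.0):
    total = 0.0
    while l_size >= 1:
        total += C * p_size ** (2 / 3) * l_size ** (2 / 3)
        l_size //= 2
    return total

P, L = 10**6, 10**6
direct = P ** (2 / 3) * L ** (2 / 3)
ratio = iterated_bound(P, L) / direct
# the iteration costs only the constant factor 1/(1 - 2^(-2/3)) ≈ 2.7
print(ratio)
```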

It remains to prove Lemma 1. If one subdivides ${{\bf R}^2}$ using ${r}$ arbitrary lines, one creates at most ${O(r^2)}$ cells (because each new line intersects each of the existing lines in at most one point, and so can create at most ${O(r)}$ new cells), and for a similar reason, every line in ${L}$ visits at most ${r+1}$ of these cells, and so by double counting one expects ${O(|L|/r)}$ lines per cell "on the average". The key difficulty is then to get ${O(|L|/r)}$ lines through every cell, not just on the average. It turns out that a probabilistic argument will almost work, but with a logarithmic loss (thus having ${O( |L| \log |L| / r )}$ lines per cell rather than ${O(|L|/r)}$); but with a little more work one can then iterate away this loss as well. The arguments here are loosely based on those of Clarkson et al.; a related (deterministic) decomposition also appears in the original paper of Szemerédi and Trotter. But I wish to focus here on the probabilistic approach.

It is also worth noting that the original (somewhat complicated) argument of Szemerédi and Trotter has been adapted by Toth to establish the analogue of (1) in the complex plane ${{\bf C}^2}$, while the other known proofs of the Szemerédi-Trotter theorem have so far resisted extension to this setting (the Euler characteristic argument clearly breaks down, as does any proof based on using lines to divide the plane into half-planes). So all three proofs have their advantages and disadvantages.

— 1. The trivial incidence estimate —

We first give a quick proof of the trivial incidence bound (2). We have

$\displaystyle |I(P,L)| = \sum_{\ell \in L} |P \cap \ell|$

and thus by Cauchy-Schwarz

$\displaystyle \sum_{\ell \in L} |P \cap \ell|^2 \geq \frac{|I(P,L)|^2}{|L|}.$

On the other hand, observe that

$\displaystyle \sum_{\ell \in L} \left( |P \cap \ell|^2 - |P \cap \ell| \right) = | \{ (p,q,\ell) \in P \times P \times L: p \neq q; p,q \in \ell \} |.$

Because two distinct points ${p,q}$ are incident to at most one line, the right-hand side is at most ${|P|^2}$, thus

$\displaystyle \sum_{\ell \in L} |P \cap \ell|^2 \leq |I(P,L)| + |P|^2.$

Comparing this with the Cauchy-Schwarz bound and using a little high-school algebra we obtain (2). A dual argument (swapping the roles of lines and points) gives (3).
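The double-counting inequality ${\sum_{\ell} |P \cap \ell|^2 \leq |I(P,L)| + |P|^2}$ is easy to sanity-check on random configurations. In the sketch below (function and variable names are mine), lines are represented as slope-intercept pairs, so distinct pairs are distinct lines and two points indeed determine at most one line:

```python
import random

# Check sum_l |P ∩ l|^2 <= |I(P,L)| + |P|^2 on a configuration of points
# and (distinct, non-vertical) lines; the inequality only uses the fact
# that two distinct points lie on at most one common line.
def check_cs_bound(points, lines):
    counts = [sum(1 for (x, y) in points if y == m * x + b) for (m, b) in lines]
    I = sum(counts)
    return sum(c * c for c in counts) <= I + len(points) ** 2

rng = random.Random(0)
points = list({(rng.randint(0, 10), rng.randint(0, 10)) for _ in range(25)})
lines = list({(rng.randint(-3, 3), rng.randint(-8, 8)) for _ in range(15)})
print(check_cs_bound(points, lines))  # → True
```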

A more informal proof of (2) can be given as follows. Suppose for contradiction that ${|I(P,L)|}$ were much larger than ${|P| |L|^{1/2} + |L|}$. Since ${|I(P,L)| = \sum_{p \in P} |L_p|}$, this implies that the ${|L_p|}$ are much larger than ${|L|^{1/2}}$ on the average. By the birthday paradox, one then expects two randomly chosen bushes ${L_p, L_q}$ to intersect in at least two lines ${\ell, \ell'}$; but this would mean that two lines intersect in two points, a contradiction. The use of Cauchy-Schwarz in the rigorous argument given above can thus be viewed as an assertion that the average intersection of ${L_p}$ and ${L_q}$ is at least as large as what random chance predicts.

As mentioned in the introduction, we now see (intuitively, at least) that if nearby ${p, q}$ are such that ${L_p, L_q}$ are drawn from a smaller pool of lines than ${L}$, then their intersection is likely to be higher, and so one should be able to improve upon (2).

— 2. The probabilistic bound —

Now we start proving Lemma 1. We can assume that ${r < |L|}$, since the claim is trivial otherwise (we just use all the lines in ${L}$ to subdivide the plane, and there are no lines left in ${L}$ to intersect any of the cells). Similarly we may assume that ${r > 1}$, and that ${|L|}$ is large. We can also perturb all the lines slightly and assume that they are in general position (no three concurrent), as the general claim then follows from a limiting argument (note that this limit may cause some of the cells to become empty). (Of course, the Szemerédi-Trotter theorem is quite easy under the assumption of general position, but that theorem is not our current objective.)

We use the probabilistic method: we construct ${R}$ by some random recipe and aim to show that the conclusion of the lemma holds with positive probability.

The most obvious approach would be to choose the ${r}$ lines ${R}$ at random from ${L}$, thus each line ${\ell \in L}$ has a probability of ${r/|L|}$ of lying in ${R}$. Actually, for technical reasons it is slightly better to use a Bernoulli process to select ${R}$, thus each line ${\ell \in L}$ lies in ${R}$ with an independent probability of ${r/|L|}$. This can cause ${R}$ to occasionally have size much larger than ${r}$, but this probability can be easily controlled (e.g. using the Chernoff inequality). So with high probability, ${R}$ consists of ${O(r)}$ lines, which therefore carve out ${O(r^2)}$ cells. The remaining task is to show that each cell is incident to at most ${O(|L|/r)}$ lines from ${L}$.

Observe that each cell is a (possibly unbounded) convex polygon, whose edges come from lines in ${R}$. Note that (except in the degenerate case when ${R}$ consists of at most one line, which we can ignore) any line ${\ell}$ which meets the interior of a cell must intersect at least one of the edges of that cell. If we pretend for the moment that all cells have a bounded number of edges, it would then suffice to show that each edge of each cell was incident to ${O(|L|/r)}$ lines.

Let’s see how this would go. Suppose that one line ${\ell \in L}$ was picked for the set ${R}$, and consider all the other lines in ${L}$ that intersect ${\ell}$; there are ${O(|L|)}$ of these lines ${\ell'}$, which (by the general position hypothesis) intersect ${\ell}$ at distinct points ${\ell \cap \ell'}$ on the line. If one of these lines ${\ell'}$ intersecting ${\ell}$ is also selected for ${R}$, then the corresponding point ${\ell \cap \ell'}$ will become a vertex of one of the cells (indeed, it will be the vertex of four cells). Thus each of these points on ${\ell}$ has an independent probability of ${r/|L|}$ of becoming a vertex for a cell.

Now consider ${m}$ consecutive such points on ${\ell}$. The probability that they all fail to be chosen as cell vertices is ${(1-r/|L|)^m}$; if ${m = k |L|/r}$, then this probability is ${O( \exp( - k ) )}$. Thus runs of much more than ${|L|/r}$ points without vertices are unlikely. In particular, setting ${k = 100 \log |L|}$, we see that the probability that any given run of ${100 |L| \log |L|/r}$ consecutive points on any given line ${\ell}$ is skipped is ${O( |L|^{-100} )}$. By the union bound, we thus see that with probability ${1-O(|L|^{-98})}$, every line ${\ell}$ has at most ${O( |L| \log |L|/r )}$ points between any two adjacent vertices. In other words, every edge of every cell is incident to at most ${O( |L| \log |L| / r )}$ lines from ${L}$. This yields Lemma 1 except for two things: firstly, the logarithmic loss of ${O( \log |L| )}$, and secondly, the assumption that each cell had only a bounded number of edges.
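The run-length estimate above can be illustrated by a quick Monte Carlo simulation (a sketch; the parameters and names are mine): among ${n}$ candidate vertices on a line, each chosen independently with probability ${q = r/|L|}$, the longest run of unchosen points is typically only on the order of ${\log n / q}$, far below ${n}$:

```python
import random

def longest_gap(n, q, rng):
    # longest run of consecutive candidate points that all fail
    # to become cell vertices
    gap = best = 0
    for _ in range(n):
        if rng.random() < q:
            best = max(best, gap)
            gap = 0
        else:
            gap += 1
    return max(best, gap)

rng = random.Random(0)
n, q = 10_000, 0.05          # here |L|/r = 1/q = 20
worst = max(longest_gap(n, q, rng) for _ in range(200))
# typically around (1/q) * log(n) ≈ 180: logarithmically larger than
# 1/q, but nowhere near n
print(worst)
```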

To fix the latter problem, we will have to add some line segments to ${R}$. First, we randomly rotate the plane so that none of the lines in ${L}$ are vertical. Then we do the following modified construction: we select ${O(r)}$ lines from ${L}$ as before, creating ${O(r^2)}$ cells, some of which may have a very large number of edges. But then for each cell, and each vertex in that cell, we draw a vertical line segment from that vertex (in either the up or down direction) to bisect the cell into two pieces. (If the vertex is on the far left or far right of the cell, we do nothing.) Note that almost surely this construction will avoid hitting any point in ${P}$. Doing this once for each vertex, we see that we have subdivided each of the old cells into a number of new cells, each of which have at most four sides (two vertical sides, and two non-vertical sides). So we have now achieved a bounded number of sides per cell. But what about the number of such cells? Well, each vertex of each cell is responsible for at most two subdivisions of one cell into two, and the number of such vertices is at most ${O(r^2)}$ (as they are formed by intersecting two lines from the original selection of ${O(r)}$ lines together), so the total number of cells is still ${O(r^2)}$.

Is it still true that each edge of each cell is incident to ${O( |L| \log |L| / r )}$ lines in ${L}$? We have already proven this (with high probability) for all the old edges – the ones that were formed from lines in ${L}$. But there are now some new edges, caused by dropping a vertical line segment from the intersection of two lines in ${R}$. However, it is not hard to see that much the same argument as before shows that with high probability, each of these line segments is also incident to at most ${O( |L| \log |L| / r)}$ lines in ${L}$, as desired.

Finally, we have to get rid of the logarithm. An inspection of the above arguments (and a use of the first moment method) reveals the following refinement: for any ${k \geq 1}$, the expected number of cells which are incident to more than ${C k |L| / r}$ lines is at most ${O( \exp(-k) r^2 )}$, where ${C}$ is an absolute constant. This is already enough to improve the ${O( |L| \log |L| / r )}$ bound slightly, to ${O( |L| \log r / r )}$. But one can do even better by using Lemma 1 as an induction hypothesis, i.e. assume that for any smaller set ${L'}$ of lines with ${|L'| < |L|}$, and any ${r' \geq 1}$, one can subdivide the plane into at most ${C_1 (r')^2}$ cells using at most ${C_0 r'}$ lines from ${L'}$, such that each cell is incident to at most ${C_2 |L'|/r'}$ lines, where ${C_0, C_1, C_2}$ are absolute constants. (When using induction, asymptotic notation becomes quite dangerous to use, and it is much safer to write out the constants explicitly. To close the induction, one has to end up with the same constants ${C_0,C_1,C_2}$ as one started with.) For each ${k}$ between ${C_2/C}$ and ${O(\log r)}$ which is a power of two, one can apply the induction hypothesis to each of the cells which are incident to between ${C k |L|/r}$ and ${2Ck |L|/r}$ lines (with ${L'}$ set equal to the set of lines in ${L}$ incident to that cell, and ${r'}$ set comparable to ${2Ck}$), and sum up (using the fact that ${\sum_k k^2 \exp(-k)}$ converges, especially when ${k}$ is restricted to powers of two) to close the induction if the constants ${C_0, C_1, C_2}$ are chosen properly; we leave the details as an exercise.
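As a quick numerical sanity check (a sketch) of the convergence fact invoked at the end, the series ${\sum_k k^2 \exp(-k)}$ over powers of two ${k = 1, 2, 4, 8, \dots}$ is dominated by its first few terms:

```python
import math

# sum of k^2 * exp(-k) over powers of two k = 1, 2, 4, ..., 512;
# later terms are negligible, so the truncation is harmless
s = sum((2**j) ** 2 * math.exp(-(2**j)) for j in range(10))
print(s)  # ≈ 1.2238
```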