You are currently browsing the tag archive for the ‘probabilistic method’ tag.

The celebrated Szemerédi-Trotter theorem gives a bound for the set of incidences {I(P,L) := \{ (p,\ell) \in P \times L: p \in \ell \}} between a finite set of points {P} and a finite set of lines {L} in the Euclidean plane {{\bf R}^2}. Specifically, the bound is

\displaystyle  |I(P,L)| \ll |P|^{2/3} |L|^{2/3} + |P| + |L| \ \ \ \ \ (1)

where we use the asymptotic notation {X \ll Y} or {X=O(Y)} to denote the statement that {X \leq CY} for some absolute constant {C}. In particular, the number of incidences between {n} points and {n} lines is {O(n^{4/3})}. This bound is sharp; consider for instance the discrete box {P := \{ (a,b) \in {\bf Z}^2: 1 \leq a \leq N; 1 \leq b \leq 2N^2 \}} with {L} being the collection of lines {\{ (x,mx+b): m, b \in {\bf Z}, 1 \leq m \leq N, 1 \leq b \leq N^2 \}}. One easily verifies that {|P|=2N^3}, {|L| = N^3}, and {|I(P,L)| = N^4}, showing that (1) is essentially sharp in the case {|P| \sim |L|}; one can concoct similar examples for other regimes of {|P|} and {|L|}.

On the other hand, if one replaces the Euclidean plane {{\bf R}^2} by a finite field geometry {F^2}, where {F} is a finite field, then the estimate (1) is false. For instance, if {P} is the entire plane {F^2}, and {L} is the set of all lines in {F^2}, then {|P|, |L|} are both comparable to {|F|^2}, but {|I(P,L)|} is comparable to {|F|^3}, thus violating (1) when {|F|} is large. Thus any proof of the Szemerédi-Trotter theorem must use a special property of the Euclidean plane which is not enjoyed by finite field geometries. In particular, this strongly suggests that one cannot rely purely on algebra and combinatorics to prove (1); one must also use some Euclidean geometry or topology as well.

Nowadays, the slickest proof of the Szemerédi-Trotter theorem is via the crossing number inequality (as discussed in this previous post), which ultimately relies on Euler’s famous formula {|V|-|E|+|F|=2}; thus in this argument it is topology which is the feature of Euclidean space which one is exploiting, and which is not present in the finite field setting. Today, though, I would like to mention a different proof (closer in spirit to the original proof of Szemerédi-Trotter, and also a later argument of Clarkson et al.), based on the method of cell decomposition, which has proven to be a very flexible method in combinatorial incidence geometry. Here, the distinctive feature of Euclidean geometry one is exploiting is convexity, which again has no finite field analogue.

Roughly speaking, the idea is this. Using nothing more than the axiom that two points determine at most one line, one can obtain the bound

\displaystyle  |I(P,L)| \ll |P| |L|^{1/2} + |L|, \ \ \ \ \ (2)

which is inferior to (1). (On the other hand, this estimate works in both Euclidean and finite field geometries, and is sharp in the latter case, as shown by the example given earlier.) Dually, the axiom that two lines determine at most one point gives the bound

\displaystyle  |I(P,L)| \ll |L| |P|^{1/2} + |P| \ \ \ \ \ (3)

(or alternatively, one can use projective duality to interchange points and lines and deduce (3) from (2)).

An inspection of the proof of (2) shows that it is only expected to be sharp when the bushes {L_p := \{ \ell \in L: \ell \ni p \}} associated to each point {p \in P} behave like “independent” subsets of {L}, so that there is no significant correlation between the bush {L_p} of one point and the bush of another point {L_q}.

However, in Euclidean space, we have the phenomenon that the bush of a point {L_p} is influenced by the region of space that {p} lies in. Clearly, if {p} lies in a set {\Omega} (e.g. a convex polygon), then the only lines {\ell \in L} that can contribute to {L_p} are those lines which pass through {\Omega}. If {\Omega} is a small convex region of space, one expects only a fraction of the lines in {L} to actually pass through {\Omega}. As such, if {p} and {q} both lie in {\Omega}, then {L_p} and {L_q} are compressed inside a smaller subset of {L}, namely the set of lines passing through {\Omega}, and so should be more likely to intersect than if they were independent. This should lead to an improvement to (2) (and indeed, as we shall see below, ultimately leads to (1)).

More formally, the argument proceeds by applying the following lemma:

Lemma 1 (Cell decomposition) Let {L} be a finite collection of lines in {{\bf R}^2}, let {P} be a finite set of points, and let {r \geq 1}. Then it is possible to find a set {R} of {O(r)} lines in {L}, plus some additional open line segments not containing any point in {P}, which subdivide {{\bf R}^2} into {O(r^2)} convex regions (or cells), such that the interior of each such cell is incident to at most {O(|L|/r)} lines.

The deduction of (1) from (2), (3) and Lemma 1 is very quick. Firstly we may assume we are in the range

\displaystyle  |L|^{1/2} \ll |P| \ll |L|^2 \ \ \ \ \ (4)

otherwise the bound (1) follows already from either (2) or (3) and some high-school algebra.

Let {r \geq 1} be a parameter to be optimised later. We apply the cell decomposition to subdivide {{\bf R}^2} into {O(r^2)} open convex regions, plus a family {R} of {O(r)} lines. Each of the {O(r^2)} convex regions {\Omega} has only {O(|L|/r)} lines through it, and so by (2) contributes {O( |P \cap \Omega| |L|^{1/2}/r^{1/2} + |L| / r )} incidences. Meanwhile, on each of the lines {\ell} in {R} used to perform this decomposition, there are at most {|L|} transverse incidences (because each line in {L} distinct from {\ell} can intersect {\ell} at most once), plus all the incidences along {\ell} itself. Putting all this together, one obtains

\displaystyle  |I(P,L)| \leq |I(P,L \cap R)| + O( |P| |L|^{1/2}/r^{1/2} + |L| r).

We optimise this by selecting {r \sim |P|^{2/3} / |L|^{1/3}}; from (4) we can ensure that {r \leq |L|/2}, so that {|L \cap R| \leq |L|/2}. One then obtains

\displaystyle  |I(P,L)| \leq |I(P,L \cap R)| + O( |P|^{2/3} |L|^{2/3} ).

We can iterate away the {L \cap R} error (halving the number of lines each time) and sum the resulting geometric series to obtain (1).

It remains to prove (1). If one subdivides {{\bf R}^2} using {r} arbitrary lines, one creates at most {O(r^2)} cells (because each new line intersects the existing lines at most once, and so can create at most {O(r)} distinct cells), and for a similar reason, every line in {L} visits at most {r} of these regions, and so by double counting one expects {O(|L|/r)} lines per cell “on the average”. The key difficulty is then to get {O(|L|/r)} lines through every cell, not just on the average. It turns out that a probabilistic argument will almost work, but with a logarithmic loss (thus having {O( |L| \log |L| / r )} lines per cell rather than {O(|L|/r)}); but with a little more work one can then iterate away this loss also. The arguments here are loosely based on those of Clarkson et al.; a related (deterministic) decomposition also appears in the original paper of Szemerédi and Trotter. But I wish to focus here on the probabilistic approach.)

It is also worth noting that the original (somewhat complicated) argument of Szemerédi-Trotter has been adapted to establish the analogue of (1) in the complex plane {{\bf C}^2} by Toth, while the other known proofs of Szemerédi-Trotter, so far, have not been able to be extended to this setting (the Euler characteristic argument clearly breaks down, as does any proof based on using lines to divide planes into half-spaces). So all three proofs have their advantages and disadvantages.

Read the rest of this entry »