In the theory of dense graphs on {n} vertices, where {n} is large, a fundamental role is played by the Szemerédi regularity lemma:

Lemma 1 (Regularity lemma, standard version) Let {G = (V,E)} be a graph on {n} vertices, and let {\epsilon > 0} and {k_0 \geq 0}. Then there exists a partition of the vertices {V = V_1 \cup \ldots \cup V_k}, with {k_0 \leq k \leq C(k_0,\epsilon)} bounded below by {k_0} and above by a quantity {C(k_0,\epsilon)} depending only on {k_0, \epsilon}, obeying the following properties:

  • (Equitable partition) For any {1 \leq i,j \leq k}, the cardinalities {|V_i|, |V_j|} of {V_i} and {V_j} differ by at most {1}.
  • (Regularity) For all but at most {\epsilon k^2} pairs {1 \leq i < j \leq k}, the portion of the graph {G} between {V_i} and {V_j} is {\epsilon}-regular in the sense that one has

    \displaystyle  |d( A, B ) - d( V_i, V_j )| \leq \epsilon

    for any {A \subset V_i} and {B \subset V_j} with {|A| \geq \epsilon |V_i|, |B| \geq \epsilon |V_j|}, where {d(A,B) := |E \cap (A \times B)|/|A| |B|} is the density of edges between {A} and {B}.

This lemma becomes useful in the regime when {n} is very large compared to {k_0} or {1/\epsilon}, because all the conclusions of the lemma are uniform in {n}. Very roughly speaking, it says that “up to errors of size {\epsilon}“, a large graph can be more or less described completely by a bounded number of quantities {d(V_i, V_j)}. This can be interpreted as saying that the space of all graphs is totally bounded (and hence precompact) in a suitable metric space, thus allowing one to take formal limits of sequences (or subsequences) of graphs; see for instance this paper of Lovasz and Szegedy for a discussion.

For various technical reasons it is easier to work with a slightly weaker version of the lemma, which allows for the cells {V_1,\ldots,V_k} to have unequal sizes:

Lemma 2 (Regularity lemma, weighted version) Let {G = (V,E)} be a graph on {n} vertices, and let {\epsilon > 0}. Then there exists a partition of the vertices {V = V_1 \cup \ldots \cup V_k}, with {1 \leq k \leq C(\epsilon)} bounded above by a quantity {C(\epsilon)} depending only on {\epsilon}, obeying the following properties:

  • (Regularity) One has

    \displaystyle  \sum_{(V_i,V_j) \hbox{ not } \epsilon-\hbox{regular}} |V_i| |V_j| = O(\epsilon |V|^2) \ \ \ \ \ (1)

    where the sum is over all pairs {1 \leq i \leq j \leq k} for which {G} is not {\epsilon}-regular between {V_i} and {V_j}.

While Lemma 2 is, strictly speaking, weaker than Lemma 1 in that it does not enforce the equitable size property between the atoms, in practice it seems that the two lemmas are roughly of equal utility; most of the combinatorial consequences of Lemma 1 can also be proven using Lemma 2. The point is that one always has to remember to weight each cell {V_i} by its density {|V_i|/|V|}, rather than by giving each cell an equal weight as in Lemma 1. Lemma 2 also has the advantage that one can easily generalise the result from finite vertex sets {V} to other probability spaces (for instance, one could weight {V} with something other than the uniform distribution). For applications to hypergraph regularity, it turns out to be slightly more convenient to have two partitions (coarse and fine) rather than just one; see for instance my own paper on this topic. In any event the arguments below that we give to prove Lemma 2 can be modified to give a proof of Lemma 1 also. The proof of the regularity lemma is usually conducted by a greedy algorithm. Very roughly speaking, one starts with the trivial partition of {V}. If this partition already regularises the graph, we are done; if not, this means that there are some sets {A} and {B} in which there is a significant density fluctuation beyond what has already been detected by the original partition. One then adds these sets to the partition and iterates the argument. Every time a new density fluctuation is incorporated into the partition that models the original graph, this increases a certain “index” or “energy” of the partition. On the other hand, this energy remains bounded no matter how complex the partition, so eventually one must reach a long “energy plateau” in which no further refinement is possible, at which point one can find the regular partition.

One disadvantage of the greedy algorithm is that it is not efficient in the limit {n \rightarrow \infty}, as it requires one to search over all pairs of subsets {A, B} of a given pair {V_i, V_j} of cells, which is an exponentially long search. There are more algorithmically efficient ways to regularise, for instance a polynomial time algorithm was given by Alon, Duke, Lefmann, Rödl, and Yuster. However, one can do even better, if one is willing to (a) allow cells of unequal size, (b) allow a small probability of failure, (c) have the ability to sample vertices from {G} at random, and (d) allow for the cells to be defined “implicitly” (via their relationships with a fixed set of reference vertices) rather than “explicitly” (as a list of vertices). In that case, one can regularise a graph in a number of operations bounded in {n}. Indeed, one has

Lemma 3 (Regularity lemma via random neighbourhoods) Let {\epsilon > 0}. Then there exists integers {M_1,\ldots,M_m} with the following property: whenever {G = (V,E)} be a graph on finitely many vertices, if one selects one of the integers {M_r} at random from {M_1,\ldots,M_m}, then selects {M_r} vertices {v_1,\ldots,v_{M_r} \in V} uniformly from {V} at random, then the {2^{M_r}} vertex cells {V^{M_r}_1,\ldots,V^{M_r}_{2^{M_r}}} (some of which can be empty) generated by the vertex neighbourhoods {A_t := \{ v \in V: (v,v_t) \in E \}} for {1 \leq t \leq M_r}, will obey the conclusions of Lemma 2 with probability at least {1-O(\epsilon)}.

Thus, roughly speaking, one can regularise a graph simply by taking a large number of random vertex neighbourhoods, and using the partition (or Venn diagram) generated by these neighbourhoods as the partition. The intuition is that if there is any non-uniformity in the graph (e.g. if the graph exhibits bipartite behaviour), this will bias the random neighbourhoods to seek out the partitions that would regularise that non-uniformity (e.g. vertex neighbourhoods would begin to fill out the two vertex cells associated to the bipartite property); if one takes sufficiently many such random neighbourhoods, the probability that all detectable non-uniformity is captured by the partition should converge to {1}. (It is more complicated than this, because the finer one makes the partition, the finer the types of non-uniformity one can begin to detect, but this is the basic idea.)

This fact seems to be reasonably well-known folklore, discovered independently by many authors; it is for instance quite close to the graph property testing results of Alon and Shapira, and also appears implicitly in a paper of Ishigami, as well as a paper of Austin (and perhaps even more implicitly in a paper of myself). However, in none of these papers is the above lemma stated explicitly. I was asked about this lemma recently, so I decided to provide a proof here.

— 1. Warmup: a weak regularity lemma —

To motivate the idea, let’s first prove a weaker but simpler (and more quantitatively effective) regularity lemma, analogous to that established by Frieze and Kannan:

Lemma 4 (Weak regularity lemma via random neighbourhoods) Let {\epsilon > 0}. Then there exists an integer {M} with the following property: whenever {G = (V,E)} be a graph on finitely many vertices, if one selects {1 \leq t \leq M} at random, then selects {t} vertices {v_1,\ldots,v_t \in V} uniformly from {V} at random, then the {2^{t}} vertex cells {V^t_1,\ldots,V^t_{2^t}} (some of which can be empty) generated by the vertex neighbourhoods {A_{t'} := \{ v \in V: (v,v_{t'}) \in E \}} for {1 \leq t' \leq t}, obey the following property with probability at least {1-O(\epsilon)}: for any vertex sets {A, B \subset V}, the number of edges {|E \cap (A \times B)|} connecting {A} and {B} can be approximated by the formula

\displaystyle  |E \cap (A \times B)| = \sum_{i=1}^{2^t} \sum_{j=1}^{2^t} d(V^t_i,V^t_j) |A \cap V^t_i| |B \cap V^t_j| + O( \epsilon |V|^2 ). \ \ \ \ \ (2)

This weaker lemma only lets us count “macroscopic” edge densities {d(A,B)}, when {A, B} are dense subsets of {V}, whereas the full regularity lemma is stronger in that it also controls “microscopic” edge densities {d(A,B)} where {A, B} are now dense subsets of the cells {V^{M_r}_i, V^{M_r}_j}. Nevertheless this weaker lemma is easier to prove and already illustrates many of the ideas.

Let’s now prove this lemma. Fix {\epsilon > 0}, let {M} be chosen later, let {G = (V,E)} be a graph, and select {v_1,\ldots,v_M} at random. (There can of course be many vertices selected more than once; this will not bother us.) Let {A_t} and {V^t_1,\ldots,V^t_{2^t}} be as in the above lemma. For notational purposes it is more convenient to work with the (random) {\sigma}-algebra {{\mathcal B}_t} generated by the {A_1,\ldots,A_t} (i.e. the collection of all sets that can be formed from {A_1,\ldots,A_t} by boolean operations); this is an atomic {\sigma}-algebra whose atoms are precisely the (non-empty) cells {V^t_1,\ldots,V^t_{2^t}} in the partition. Observe that these {\sigma}-algebras are nested: {{\mathcal B}_t \subset {\mathcal B}_{t+1}}.

We will use the trick of turning sets into functions, and view the graph as a function {1_E: V \times V \rightarrow {\mathbb R}}. One can then form the conditional expectation {{\Bbb E}(1_E | {\mathcal B}_t \times {\mathcal B}_t ): V \times V \rightarrow {\mathbb R}} of this function to the product {\sigma}-algebra {{\mathcal B}_t \times {\mathcal B}_t}, whose value on {V^t_i \times V^t_j} is simply the average value of {1_E} on the product set {V^t_i \times V^t_j}. (When {i} and {j} are different, this is simply the edge density {d(V^t_i, V^t_j)}). One can view {{\Bbb E}(1_E | {\mathcal B}_t \times {\mathcal B}_t )} more combinatorially, as a weighted graph on {V} such that all edges between two distinct cells {V^t_i}, {V^t_j} have the same constant weight of {d(V^t_i,V^t_j)}.

We give {V} (and {V \times V}) the uniform probability measure, and define the energy {e_t} at time {t} to be the (random) quantity

\displaystyle  e_t := \|{\Bbb E}(1_E | {\mathcal B}_t \times {\mathcal B}_t ) \|_{L^2(V \times V)}^2 = \frac{1}{|V|^2} \sum_{v,w \in V} {\Bbb E}(1_E | {\mathcal B}_t \times {\mathcal B}_t )^2.

one can interpret this as the mean square of the edge densities {d(V^t_i, V^t_j)}, weighted by the size of the cells {V^t_i, V^t_j}. From Pythagoras’ theorem we have the identity

\displaystyle  e_{t'} = e_t + \| {\Bbb E}(1_E | {\mathcal B}_{t'} \times {\mathcal B}_{t'} ) - {\Bbb E}(1_E | {\mathcal B}_t \times {\mathcal B}_t ) \|_{L^2(V \times V)}^2

for all {t'>t}; in particular, the {e_t} are increasing in {t}. This implies that the expectations {{\Bbb E} e_t} are also increasing in {t}. On the other hand, these expectations are bounded between {0} and {1}. Thus, if we select {1 \leq t \leq M} at random, expectation of

\displaystyle  {\Bbb E} (e_{t+2} - e_{t})

telescopes to be {O(1/M)}. Thus, by Markov’s inequality, with probability {1-O(\epsilon)} we can freeze {v_1,\ldots,v_t} such that we have the conditional expectation bound

\displaystyle  {\Bbb E}(e_{t+2} - e_t| v_1,\ldots,v_t) = O( \frac{1}{M\epsilon} ). \ \ \ \ \ (3)

Suppose {v_1,\ldots,v_t} have this property. We split

\displaystyle  1_E = f_{U^\perp} + f_U


\displaystyle  f_{U^\perp} := {\Bbb E}(1_E | {\mathcal B}_t \times {\mathcal B}_t )


\displaystyle  f_U := 1_E - {\Bbb E}(1_E | {\mathcal B}_{t} \times {\mathcal B}_{t} ).

We now assert that the partition {V^{t}_1,\ldots,V^{t}_{2^t}} induced by {{\mathcal B}_{t}} obeys the conclusions of Lemma 3. For this, we observe various properties on the two components of {1_E}:

Lemma 5 ({f_{U^\perp}} is structured) {f_{U^\perp}} is constant on each product set {V^{t}_i \times V^{t}_j}.

Proof: This is clear from construction. \Box

Lemma 6 ({f_U} is pseudorandom) The expression

\displaystyle  \frac{1}{|V|^4} \sum_{v,w,v',w' \in V} f_U(v,w) f_U(v,w') f_U(v',w) f_U(v',w')

is of size {O( \frac{1}{\sqrt{M \epsilon}} )}.

Proof: The left-hand side can be rewritten as

\displaystyle  {\Bbb E} \frac{1}{|V|^2} \sum_{v,w \in V} f_U(v,w) f_U(v,v_{t+2}) f_U(v_{t+1},w) f_U(v_{t+1},v_{t+2}).

Observe that the function {(v,w) \mapsto f_U(v,v_{t+2}) f_U(v_{t+1},w) f_U(v_{t+1},v_{t+2})} is measurable with respect to {{\mathcal B}_{t+2} \times {\mathcal B}_{t+2}}, so we can rewrite this expression as

\displaystyle  {\Bbb E} \frac{1}{|V|^2} \sum_{v,w \in V} {\Bbb E}(f_U|{\mathcal B}_{t+2} \times {\mathcal B}_{t+2})(v,w) f_U(v,v_{t+2}) f_U(v_{t+1},w) f_U(v_{t+1},v_{t+2}).

Applying Cauchy-Schwarz, one can bound this by

\displaystyle  {\Bbb E} \|{\Bbb E}(f_U|{\mathcal B}_{t+2} \times {\mathcal B}_{t+2})\|_{L^2(V \times V)}.

But from Pythagoras we have

\displaystyle  {\Bbb E}(f_U|{\mathcal B}_{t+2} \times {\mathcal B}_{t+2})^2 = e_{t+2} - e_t

and so the claim follows from (3) and another application of Cauchy-Schwarz. \Box

Now we can prove Lemma 4. Observe that

\displaystyle  |E \cap (A \times B)| - \sum_{i=1}^{2^t} \sum_{j=1}^{2^t} d(V^t_i,V^t_j) |A \cap V^t_i| |B \cap V^t_j|

\displaystyle = \sum_{v,w \in V} 1_A(v) 1_B(w) f_U(v,w).

Applying Cauchy-Schwarz twice in {v, w} and using Lemma 6, we see that the RHS is {O( (M\epsilon)^{-1/8} )}; choosing {M \gg \epsilon^{-9}} we obtain the claim.

— 2. Strong regularity via random neighbourhoods —

We now prove Lemma 3, which of course implies Lemma 2.

Fix {\epsilon > 0} and a graph {G = (V,E)} on {n} vertices. We randomly select an infinite sequence {v_1, v_2, \ldots \in V} of vertices in {V}, drawn uniformly and independently at random. We define {A_t, V^t_i, {\mathcal B}_t}, {e_t}, as before.

Now let {m} be a large number depending on {\varepsilon > 0} to be chosen later, let {F: {\mathbb Z}^+ \rightarrow {\mathbb Z}^+} be a rapidly growing function (also to be chosen later), and set {M_1 := F(1)} and {M_r := 2(M_{r-1} + F(M_{r-1}))} for all {1 \leq r \leq m}, thus {M_1 < M_2 < \ldots < M_{m+1}} grows rapidly to infinity. The expected energies {{\Bbb E} e_{M_r}} are increasing from {0} to {1}, thus if we pick {1 \leq r \leq m} uniformly at random, the expectation of

\displaystyle  {\Bbb E} e_{M_{r+1}} - e_{M_r}

telescopes to be {O(1/m)}. Thus, by Markov’s inequality, with probability {1-O(\epsilon)} we will have

\displaystyle  {\Bbb E} e_{M_{r+1}} - e_{M_r} = O( \frac{1}{m\epsilon} ).

Assume that {r} is chosen to obey this. Then, by another application of the pigeonhole principle, we can find {M_{r+1}/2 \leq t < M_{r+1}} such that

\displaystyle  {\Bbb E} (e_{t+2} - e_{t}) = O( \frac{1}{m\epsilon M_{r+1}} ) = O( \frac{1}{m\epsilon F(M_r)} ).

Fix this {t}. We have

\displaystyle  {\Bbb E} (e_{t} - e_{M_r}) = O( \frac{1}{m\epsilon} ),

so by Markov’s inequality, with probability {1-O(\epsilon)}, {v_1,\ldots,v_t} are such that

\displaystyle  e_{t} - e_{M_r} = O( \frac{1}{m\epsilon^2} ) \ \ \ \ \ (4)

and also obey the conditional expectation bound

\displaystyle  {\Bbb E}(e_{t+2} - e_t | v_1,\ldots,v_t) = O( \frac{1}{m\epsilon F(M_r)} ). \ \ \ \ \ (5)

Assume that this is the case. We split

\displaystyle  1_E = f_{U^\perp} + f_{err} + f_U


\displaystyle  f_{U^\perp} := {\Bbb E}(1_E | {\mathcal B}_{M_r} \times {\mathcal B}_{M_r} )

\displaystyle  f_{err} := {\Bbb E}(1_E | {\mathcal B}_{t} \times {\mathcal B}_{t} ) - {\Bbb E}(1_E | {\mathcal B}_{M_r} \times {\mathcal B}_{M_r} )

\displaystyle  f_U := 1_E - {\Bbb E}(1_E | {\mathcal B}_{t} \times {\mathcal B}_{t} ).

We now assert that the partition {V^{M_r}_1,\ldots,V^{M_r}_{2^{M_r}}} induced by {{\mathcal B}_{M_r}} obeys the conclusions of Lemma 2. For this, we observe various properties on the three components of {1_E}:

Lemma 7 ({f_{U^\perp}} locally constant) {f_{U^\perp}} is constant on each product set {V^{M_r}_i \times V^{M_r}_j}.

Proof: This is clear from construction. \Box

Lemma 8 ({f_{err}} small) We have {\| f_{err} \|_{L^2(V \times V)}^2 = O( \frac{1}{m\epsilon^2} )}.

Proof: This follows from (4) and Pythagoras’ theorem. \Box

Lemma 9 ({f_U} uniform) The expression

\displaystyle  \frac{1}{|V|^4} \sum_{v,w,v',w' \in V} f_U(v,w) f_U(v,w') f_U(v',w) f_U(v',w')

is of size {O( \frac{1}{\sqrt{m \epsilon F(M_r)}} )}.

Proof: This follows by repeating the proof of Lemma 6, but using (5) instead of (3). \Box

Now we verify the regularity.

First, we eliminate small atoms: the pairs {(V_i,V_j)} for which {|V^{M_r}_i| \leq \epsilon |V| / 2^{M_r}} clearly give a net contribution of at most {O( \epsilon |V|^2 )} and are acceptable; similarly for those pairs for which {|V^{M_r}_j| \leq \epsilon |V| / 2^{M_r}}. So we may henceforth assume that

\displaystyle  |V^{M_r}_i|, |V^{M_r}_j| \leq \epsilon |V| / 2^{M_r}. \ \ \ \ \ (6)

Now, let {A \subset V^{M_r}_i}, {B \subset V^{M_r}_i} have densities

\displaystyle  \alpha := |A|/|V^{M_r}_i| \ge \epsilon; \beta := |B|/|V^{M_r}_j| \geq \epsilon,


\displaystyle  \alpha \beta d(A,B) = \frac{1}{|V^{M_r}_i| |V^{M_r}_j|} \sum_{v \in V^{M_r}_i} \sum_{w \in V^{M_r}_i} 1_A(v) 1_B(w) 1_E(v,w).

We divide {1_E} into the three pieces {f_{U^\perp}}, {f_{err}}, {f_U}.

The contribution of {f_{U^\perp}} is exactly {\alpha \beta d(V^{M_r}_i, V^{M_r}_j)}.

The contribution of {f_{err}} can be bounded using Cauchy-Schwarz as

\displaystyle  O( \frac{1}{|V^{M_r}_i| |V^{M_r}_j|} \sum_{v \in V^{M_r}_i} \sum_{w \in V^{M_r}_i} |f_{err}(v,w)|^2)^{1/2}.

Using Lemma 8 and Chebyshev’s inequality, we see that the pairs {(V_i,V_j)} for which this quantity exceeds {\epsilon^3} will contribute at most {\epsilon^{-8}/m} to (1), which is acceptable if we choose {m} so that {m \gg \epsilon^{-9}}. Let us now discard these bad pairs.

Finally, the contribution of {f_U} can be bounded by two applications of Cauchy-Schwarz and (9) as

\displaystyle  O( \frac{|V|^2}{|V^{M_r}_i| |V^{M_r}_j|} \frac{1}{(m \epsilon F(M_r))^{1/8}} )

which by (6) is bounded by

\displaystyle  O( 2^{2M_r} \epsilon^{-2} / (m \epsilon F(M_r) )^{1/8} ).

This can be made {O(\epsilon^3)} by selecting {F} sufficiently rapidly growing depending on {\epsilon}. Putting this all together we see that

\displaystyle  \alpha \beta d(A,B) = \alpha \beta d(V^{M_r}_i, V^{M_r}_j) + O(\epsilon^3)

which (since {\alpha,\beta \geq \epsilon}) gives the desired regularity.

Remark 1 Of course, this argument gives tower-exponential bounds (as {F} is exponential and needs to be iterated {m} times), which will be familiar to any reader already acquainted with the regularity lemma.

Remark 2 One can take the partition induced by random neighbourhoods here and carve it up further to be both equitable and (mostly) regular, thus recovering a proof of Lemma 1, by following the arguments in this paper of mine. Of course, when one does so, one no longer has a partition created purely from random neighbourhoods, but it is pretty clear that one is not going to be able to make an equitable partition just from boolean operations applied to a few random neighbourhoods.