Let ${F}$ be a field. A definable set over ${F}$ is a set of the form

$\displaystyle \{ x \in F^n | \phi(x) \hbox{ is true} \} \ \ \ \ \ (1)$

where ${n}$ is a natural number, and ${\phi(x)}$ is a predicate involving the ring operations ${+,\times}$ of ${F}$, the equality symbol ${=}$, an arbitrary number of constants and free variables in ${F}$, the quantifiers ${\forall, \exists}$, boolean operators such as ${\vee,\wedge,\neg}$, and parentheses and colons, where the quantifiers are always understood to be over the field ${F}$. Thus, for instance, the set of quadratic residues

$\displaystyle \{ x \in F | \exists y: x = y \times y \}$

is definable over ${F}$, and any algebraic variety over ${F}$ is also a definable set over ${F}$. Henceforth we will abbreviate “definable over ${F}$” simply as “definable”.

If ${F}$ is a finite field, then every subset of ${F^n}$ is definable, since finite sets are automatically definable. However, we can obtain a more interesting notion in this case by restricting the complexity of a definable set. We say that ${E \subset F^n}$ is a definable set of complexity at most ${M}$ if ${n \leq M}$, and ${E}$ can be written in the form (1) for some predicate ${\phi}$ of length at most ${M}$ (where all operators, quantifiers, relations, variables, constants, and punctuation symbols are considered to have unit length). Thus, for instance, a hypersurface in ${n}$ dimensions of degree ${d}$ would be a definable set of complexity ${O_{n,d}(1)}$. We will then be interested in the regime where the complexity remains bounded, but the field size (or field characteristic) becomes large.

In a recent paper, I established (in the large characteristic case) the following regularity lemma for dense definable graphs, which significantly strengthens the Szemerédi regularity lemma in this context, by eliminating “bad” pairs, giving a polynomially strong regularity, and also giving definability of the cells:

Lemma 1 (Algebraic regularity lemma) Let ${F}$ be a finite field, let ${V,W}$ be definable non-empty sets of complexity at most ${M}$, and let ${E \subset V \times W}$ also be definable with complexity at most ${M}$. Assume that the characteristic of ${F}$ is sufficiently large depending on ${M}$. Then we may partition ${V = V_1 \cup \ldots \cup V_m}$ and ${W = W_1 \cup \ldots \cup W_n}$ with ${m,n = O_M(1)}$, with the following properties:

• (Definability) Each of the ${V_1,\ldots,V_m,W_1,\ldots,W_n}$ are definable of complexity ${O_M(1)}$.
• (Size) We have ${|V_i| \gg_M |V|}$ and ${|W_j| \gg_M |W|}$ for all ${i=1,\ldots,m}$ and ${j=1,\ldots,n}$.
• (Regularity) We have

$\displaystyle |E \cap (A \times B)| = d_{ij} |A| |B| + O_M( |F|^{-1/4} |V| |W| ) \ \ \ \ \ (2)$

for all ${i=1,\ldots,m}$, ${j=1,\ldots,n}$, ${A \subset V_i}$, and ${B\subset W_j}$, where ${d_{ij}}$ is a rational number in ${[0,1]}$ with numerator and denominator ${O_M(1)}$.

My original proof of this lemma was quite complicated, based on an explicit calculation of the “square”

$\displaystyle \mu(w,w') := \{ v \in V: (v,w), (v,w') \in E \}$

of ${E}$ using the Lang-Weil bound and some facts about the étale fundamental group. It was the reliance on the latter which was the main reason why the result was restricted to the large characteristic setting. (I then applied this lemma to classify expanding polynomials over finite fields of large characteristic, but I will not discuss these applications here; see this previous blog post for more discussion.)

Recently, Anand Pillay and Sergei Starchenko (and independently, Udi Hrushovski) have observed that the theory of the étale fundamental group is not necessary in the argument, and the lemma can in fact be deduced from quite general model theoretic techniques, in particular using (a local version of) the concept of stability. One of the consequences of this new proof of the lemma is that the hypothesis of large characteristic can be omitted; the lemma is now known to be valid for arbitrary finite fields ${F}$ (although its content is trivial if the field is not sufficiently large depending on the complexity at most ${M}$).

Inspired by this, I decided to see if I could find yet another proof of the algebraic regularity lemma, again avoiding the theory of the étale fundamental group. It turns out that the spectral proof of the Szemerédi regularity lemma (discussed in this previous blog post) adapts very nicely to this setting. The key fact needed about definable sets over finite fields is that their cardinality takes on an essentially discrete set of values. More precisely, we have the following fundamental result of Chatzidakis, van den Dries, and Macintyre:

Proposition 2 Let ${F}$ be a finite field, and let ${M > 0}$.

• (Discretised cardinality) If ${E}$ is a non-empty definable set of complexity at most ${M}$, then one has

$\displaystyle |E| = c |F|^d + O_M( |F|^{d-1/2} ) \ \ \ \ \ (3)$

where ${d = O_M(1)}$ is a natural number, and ${c}$ is a positive rational number with numerator and denominator ${O_M(1)}$. In particular, we have ${|F|^d \ll_M |E| \ll_M |F|^d}$.

• (Definable cardinality) Assume ${|F|}$ is sufficiently large depending on ${M}$. If ${V, W}$, and ${E \subset V \times W}$ are definable sets of complexity at most ${M}$, so that ${E_w := \{ v \in V: (v,w) \in W \}}$ can be viewed as a definable subset of ${V}$ that is definably parameterised by ${w \in W}$, then for each natural number ${d = O_M(1)}$ and each positive rational ${c}$ with numerator and denominator ${O_M(1)}$, the set

$\displaystyle \{ w \in W: |E_w| = c |F|^d + O_M( |F|^{d-1/2} ) \} \ \ \ \ \ (4)$

is definable with complexity ${O_M(1)}$, where the implied constants in the asymptotic notation used to define (4) are the same as those that appearing in (3). (Informally: the “dimension” ${d}$ and “measure” ${c}$ of ${E_w}$ depends definably on ${w}$.)

We will take this proposition as a black box; a proof can be obtained by combining the description of definable sets over pseudofinite fields (discussed in this previous post) with the Lang-Weil bound (discussed in this previous post). (The former fact is phrased using nonstandard analysis, but one can use standard compactness-and-contradiction arguments to convert such statements to statements in standard analysis, as discussed in this post.)

The above proposition places severe restrictions on the cardinality of definable sets; for instance, it shows that one cannot have a definable set of complexity at most ${M}$ and cardinality ${|F|^{1/2}}$, if ${|F|}$ is sufficiently large depending on ${M}$. If ${E \subset V}$ are definable sets of complexity at most ${M}$, it shows that ${|E| = (c+ O_M(|F|^{-1/2})) |V|}$ for some rational ${0\leq c \leq 1}$ with numerator and denominator ${O_M(1)}$; furthermore, if ${c=0}$, we may improve this bound to ${|E| = O_M( |F|^{-1} |V|)}$. In particular, we obtain the following “self-improving” properties:

• If ${E \subset V}$ are definable of complexity at most ${M}$ and ${|E| \leq \epsilon |V|}$ for some ${\epsilon>0}$, then (if ${\epsilon}$ is sufficiently small depending on ${M}$ and ${F}$ is sufficiently large depending on ${M}$) this forces ${|E| = O_M( |F|^{-1} |V| )}$.
• If ${E \subset V}$ are definable of complexity at most ${M}$ and ${||E| - c |V|| \leq \epsilon |V|}$ for some ${\epsilon>0}$ and positive rational ${c}$, then (if ${\epsilon}$ is sufficiently small depending on ${M,c}$ and ${F}$ is sufficiently large depending on ${M,c}$) this forces ${|E| = c |V| + O_M( |F|^{-1/2} |V| )}$.

It turns out that these self-improving properties can be applied to the coefficients of various matrices (basically powers of the adjacency matrix associated to ${E}$) that arise in the spectral proof of the regularity lemma to significantly improve the bounds in that lemma; we describe how this is done below the fold. We also make some connections to the stability-based proofs of Pillay-Starchenko and Hrushovski.

— 1. The regularity lemma —

We begin with proving a slightly weaker version of the lemma, in which the bound ${O_M(|F|^{-1/4})}$ is weakened to ${O_M(|F|^{-1/12})}$; we will later use the self-improving properties mentioned previously to upgrade this bound. In other words, we begin by showing

Lemma 3 (Algebraic regularity lemma, weak version) Let ${F}$ be a finite field, let ${V,W}$ be definable non-empty sets of complexity at most ${M}$, and let ${E \subset V \times W}$ also be definable with complexity at most ${M}$. Then we may partition ${V = V_1 \cup \ldots V_m}$ and ${W = W_1 \cup \ldots \cup W_n}$ with ${m,n = O_M(1)}$, with the following properties:

• (Definability) Each of the ${V_1,\ldots,V_m,W_1,\ldots,W_n}$ are definable of complexity ${O_M(1)}$.
• (Size) We have ${|V_i| \gg_M |V|}$ and ${|W_j| \gg_M |W|}$ for all ${i=1,\ldots,m}$ and ${j=1,\ldots,n}$.
• (Regularity) We have

$\displaystyle |E \cap (A \times B)| = d_{ij} |A| |B| + O_M( |F|^{-1/12} |V| |W| )$

for all ${i=1,\ldots,m}$, ${j=1,\ldots,n}$, ${A \subset V_i}$, and ${B\subset W_j}$, where ${d_{ij}}$ is a rational number in ${[0,1]}$ with numerator and denominator ${O_M(1)}$.

We now prove this lemma, by combining spectral theory with an (iterated) ${TT^*}$ method. We may assume that ${|F|}$ is sufficiently large depending on ${M}$, as the claim is trivial otherwise (just take ${m=n=1}$). We consider the probability spaces ${(V,m_V)}$, ${(W, m_W)}$ with the uniform probability measures ${m_V, m_W}$ on ${V,W}$; we abbreviate ${L^p(V,m_V)}$ as ${L^p(V)}$ for ${1 \leq p \leq \infty}$, and similarly define ${L^p(W)}$ (we work exclusively with real-valued functions here). The set ${E}$ then defines a linear operator ${T: L^2(V) \rightarrow L^2(W)}$ by the formula

$\displaystyle T f(w) := \int_V 1_E(v,w) f(v)\ dm_V(v);$

thus ${T}$ is the operator with integral kernel ${1_E}$. We can then compute the adjoint operator ${T^*: L^2(W) \rightarrow L^2(V)}$ as

$\displaystyle T^* g(v) = \int_W 1_E(v,w) g(w)\ dm_W(w).$

From the Cauchy-Schwarz inequality we observe the bounds

$\displaystyle \|Tf \|_{L^2(W)} \leq \|Tf\|_{L^\infty(W)} \leq \|f\|_{L^2(V)} \ \ \ \ \ (5)$

for all ${f \in L^2(V)}$, and similarly

$\displaystyle \|T^*g \|_{L^2(V)} \leq \|T^* g\|_{L^\infty(V)} \leq \|g\|_{L^2(W)} \ \ \ \ \ (6)$

for all ${g \in L^2(W)}$.

Now we apply the singular value decomposition (which is easily justified as ${L^2(V)}$, ${L^2(W)}$ are finite dimensional) to write

$\displaystyle T f = \sum_i \sigma_i \langle f, u_i \rangle_{L^2(V)} y_i$

and

$\displaystyle T^* g = \sum_i \sigma_i\langle g, y_i \rangle_{L^2(W)} u_i$

for some finite sequence ${\sigma_1 \geq \sigma_2 \geq \ldots > 0}$ of singular values, the ${u_i}$ are an orthonormal system in ${L^2(V)}$, and the ${y_i}$ are an orthonormal system in ${L^2(W)}$. Observe that operator ${TT^*: L^2(W) \rightarrow L^2(W)}$ can be diagonalised as

$\displaystyle TT^* g = \sum_i \sigma_i^2 \langle g,y_i \rangle_{L^2(W)} y_i$

and hence

$\displaystyle \frac{1}{|W|} \hbox{tr}(TT^*) = \sum_i \sigma_i^2.$

On the other hand, direct calculation shows that

$\displaystyle \frac{1}{|W|} \hbox{tr}(TT^*) = \frac{|E|}{|V||W|} \leq 1$

and so we have the Frobenius norm (or Hilbert-Schmidt norm) bound

$\displaystyle \sum_i \sigma_i^2 \leq 1. \ \ \ \ \ (7)$

Next, from the identities

$\displaystyle y_i = \frac{1}{\sigma_i} T u_i; \quad u_i = \frac{1}{\sigma_i} T^* y_i$

and (5), (6) we obtain sup-norm bounds on the eigenfunctions:

$\displaystyle \| y_i \|_{L^\infty(W)}, \|u_i\|_{L^\infty(V)} \leq \frac{1}{\sigma_i}. \ \ \ \ \ (8)$

We now use these bounds to obtain a low rank approximation to the sixth power ${TT^* TT^* TT^*: L^2(W) \rightarrow L^2(W)}$. This operator may be diagonalised as

$\displaystyle TT^* TT^* TT^* g = \sum_i \sigma_i^6 \langle g,y_i \rangle_{L^2(W)} y_i. \ \ \ \ \ (9)$

We let ${\epsilon>0}$ be a small quantity (depending only on ${M}$) to be chosen later, and split

$\displaystyle TT^* TT^* TT^* = A + B$

where ${A}$ is a low rank operator

$\displaystyle Ag := \sum_{i: \sigma_i \geq \epsilon} \sigma_i^6 \langle g,y_i \rangle_{L^2(W)} y_i$

and ${B}$ is the error term

$\displaystyle Bg := \sum_{i: \sigma_i < \epsilon} \sigma_i^6 \langle g,y_i \rangle_{L^2(W)} y_i.$

Observe from the triangle inequality, Hölder’s inequality and (8), (7) that

$\displaystyle \|Bg \|_{L^\infty(W)} \leq \sum_{i: \sigma_i < \epsilon} \sigma_i^6 |\langle g,y_i \rangle_{L^2(W)}| \frac{1}{\sigma_i}$

$\displaystyle \leq \sum_{i: \sigma_i < \epsilon} \sigma_i^6 \frac{1}{\sigma_i} \|g\|_{L^1(W)} \frac{1}{\sigma_i}$

$\displaystyle \leq \epsilon^2 \sum_i \sigma_i^2 \|g\|_{L^1(W)}$

$\displaystyle \leq \epsilon^2 \|g\|_{L^1(W)}$

for any ${g \in L^1(W)}$. In other words, the integral kernel of ${B}$ is bounded pointwise by ${\epsilon^2}$. As for ${A}$, we use (8) to discretise ${y_i = y'_i + e_i}$, where ${y'_i}$ takes on at most ${\epsilon^{-100}}$ values (say), and ${e_i}$ is bounded in magnitude by ${O(\epsilon^{99})}$. We can then split ${A = A' + E}$, where

$\displaystyle A' g := \sum_{i: \sigma_i \geq \epsilon} \sigma_i^6 \langle g,y'_i \rangle_{L^2(W)} y'_i$

and the error term ${E}$ can easily be shown (by arguments similar to before) to have a bound of the form

$\displaystyle \|E g \|_{L^\infty(W)} \ll \epsilon^2 \|g\|_{L^1(W)}$

(with many powers of ${\epsilon}$ to spare). We have thus written ${TT^*TT^*TT^* = A' + E'}$, where ${E'}$ has integral kernel bounded pointwise by ${O(\epsilon^2)}$.

Now from (7), the number of summands ${i}$ in the definition of ${A}$ is at most ${1/\epsilon^2}$. If we then partition ${W = W_1 \cup \ldots \cup W_n}$ using the level sets of the corresponding ${y'_i}$ (removing any empty cells to ensure the ${W_i}$ are non-empty), then we have

$\displaystyle n \leq (\epsilon^{-100})^{1/\epsilon^2},$

and the integral kernel of ${A'}$ is constant on each ${W_i \times W_j}$. Thus, the integral kernel of ${TT^*TT^*TT^*}$ fluctuates by at most ${O(\epsilon^2)}$ on each ${W_i \times W_j}$.

Thus far, we have not used the definability of ${E}$. Now we will do so in order to take advantage of self-improving properties. The integral kernel ${K_3(w, w')}$ of ${TT^*TT^*TT^*}$ can be evaluated explicitly as

$\displaystyle K_3(w,w') = \frac{|G_{w,w'}|}{|V|^3 |W|^2}$

where ${G_{w,w'} \subset V \times W \times V \times W \times V}$ is the set

$\displaystyle G_{w,w'} :=\{ (v_1, w_2, v_2, w_3, v_3) \in V \times W \times V \times W \times V:$

$\displaystyle (v_1,w), (v_1,w_2), (v_2,w_2), (v_2,w_3), (v_3,w_3), (v_3,w') \in E \}.$

Clearly, ${G_{w,w'}}$ is a definable set of complexity ${O_M(1)}$. From Proposition 2 we thus have the discretisation bounds

$\displaystyle K_3(w,w') = c_{w,w'} + O_M( |F|^{-1/2} )$

where ${c_{w,w'}}$ is a non-negative rational number with numerator and denominator ${O_M(1)}$. The set of such rationals is finite (basically a portion of the Farey sequence), and thus are separated from each other by some separation ${\gg_M 1}$. If we combine this fact with the local near-constancy of ${K_A}$ on ${W_i \times W_j}$, we can thus (if ${\epsilon}$ is small enough) self-improve this near-constancy to the bound

$\displaystyle K_3(w,w') = c_{i,j} + O_M( |F|^{-1/2} ) \ \ \ \ \ (10)$

for all ${1 \leq i,j \leq n}$, all ${w \in W_i}$, ${w' \in W_j}$, and some non-negative rational ${c_{i,j}}$ depending only on ${i,j}$ with numerator and denominator ${O_M(1)}$.

Note that as ${K_3}$ is symmetric, ${c_{i,j}}$ will be symmetric too. If two ${i,i' \in \{1,\ldots,n\}}$ are indistinguishable in the sense that ${c_{i,j} = c_{i',j}}$ for all ${1 \leq j \leq n}$, then we may concatenate ${W_i}$ and ${W_{i'}}$ (reducing ${n}$ by one), so we may assume that elements of ${\{1,\ldots,n\}}$ are pairwise distinguishable. From Proposition 2 we see that for each ${1 \leq j \leq n}$ and rational ${c}$, the set ${\bigcup_{i: c_{i,j} = c} W_i}$ is definable with complexity ${O_M(1)}$. Taking boolean combinations and using distinguishability, we conclude that each ${W_i}$ is also definable with complexity ${O_M(1)}$. In particular, we either have ${|W_i| \gg_M |W|}$ or ${|W_i| \ll_M |F|^{-1} |W|}$.

Now let ${g \in L^2(W)}$ be supported on ${W_j}$ with mean zero for some ${1 \leq j \leq m}$. From (10) we see that

$\displaystyle |\langle TT^* TT^* TT^* g, g \rangle_{L^2(W)}|$

$\displaystyle = |\int_W \int_W K_3(w,w') g(w) g(w')\ dm_W(w) dm_W(w')| \ll_M |F|^{-1/2}\|g\|_{L^2(W)}^2.$

On the other hand, from (9) we have

$\displaystyle |\langle TT^* TT^* TT^* g, g \rangle_{L^2(W)}| = \sum_i \sigma_i^6 |\langle g, y_i \rangle_{L^2(W)}|^2$

while from Bessel’s inequality we have

$\displaystyle \sum_i |\langle g, y_i \rangle_{L^2(W)}|^2 \leq \|g\|_{L^2(W)}^2$

and thus by Hölder’s inequality

$\displaystyle \sum_i \sigma_i^2 |\langle g, y_i \rangle_{L^2(W)}|^2 \ll_M |F|^{-1/6} \|g\|_{L^2(W)}^2.$

The left-hand side is just ${\|T^* g\|_{L^2(V)}^2}$, so we conclude that

$\displaystyle \|T^* g \|_{L^2(V)} \ll_M |F|^{-1/12} \|g\|_{L^2(W)}$

whenever ${g}$ is supported on one of the ${W_j}$ with mean zero.

In a similar vein, we may find a partition ${V = V_1 \cup \ldots \cup V_m}$ into definable sets of complexity ${O_M(1)}$ with

$\displaystyle m \leq (\epsilon^{-100})^{1/\epsilon^2},$

with the property that

$\displaystyle \|T f \|_{L^2(W)} \ll_M |F|^{-1/12} \|f\|_{L^2(V)}$

whenever ${f}$ is supported on one of the ${V_i}$ with mean zero.

Let ${1 \leq i \leq n}$ and ${1 \leq j \leq m}$ with ${|V_i| \gg_M |V|}$ and ${|W_j| \gg_M |W|}$. From the above estimates and Cauchy-Schwarz, we see that

$\displaystyle |\langle Tf, g \rangle_{L^2(W)}| \ll_M |F|^{-1/12} \|f\|_{L^2(V)} \|g\|_{L^2(W)}$

whenever ${f, g}$ are supported on ${V_i, W_j}$ respectively, with at least one of ${f,g}$ having mean zero. If ${A \subset V_i}$, ${B \subset W_j}$, then by decomposing ${1_A = (1_A - \frac{|A|}{|V_i|} 1_{V_i}) + \frac{|A|}{|V_i|} 1_{V_i}}$ and ${1_B = (1_B - \frac{|B|}{|W_j|} 1_{W_j}) + \frac{|B|}{|W_j|} 1_{W_j}}$ into mean-zero and constant components on ${V_i,W_j}$ respectively, we conclude that

$\displaystyle \langle T 1_A, 1_B \rangle_{L^2(W)} = \frac{|A|}{|V_i|} \frac{|B|}{|W_j|} \langle T 1_{V_i}, 1_{W_j} \rangle_{L^2(W)} + O_M( |F|^{-1/12} ).$

On the other hand, we have

$\displaystyle \langle T 1_{V_i}, 1_{W_j} \rangle_{L^2(W)} = \frac{|E \cap (V_i \times W_j)|}{|V| |W|}$

so by Proposition 2, ${\langle T 1_{V_i}, 1_{W_j} \rangle_{L^2(W)} = c_{ij} + O_M(|F|^{-1/2})}$ for some rational ${c_{ij}}$ with numerator and denominator ${O_M(1)}$. Also ${|V_i|/|V| = a_i + O_M(|F|^{-1/2})}$ and ${|W_j|/|W| = b_j + O_M(|F|^{-1/2})}$ for some further rational ${a_i,b_j}$ with numerator and denominator ${O_M(1)}$. This proves Lemma 3, except that we have some “small” ${V_i}$ and ${W_j}$ with ${|V_i| \ll_M |F|^{-1} |V|}$ and ${|W_j| \ll_M |F|^{-1} |W|}$ that we have not yet dealt with. However we may simply concatenate these small cells onto one of the larger cells, as this does not significantly affect the definability or regularity properties of the large cells, and we are done.

Now we improve the ${O_M(|F|^{-1/12})}$ bound to ${O_M(|F|^{-1/4})}$. Let ${V_i, W_j}$ be large cells as in the preceding argument. Let ${T_{ij}: L^2(V_i) \rightarrow L^2(W_j)}$ be the restriction of ${T}$ to ${V_i}$ and ${W_j}$ (associated to the set ${E \cap (V_i \times W_j)}$), giving ${V_i}$, ${W_j}$ uniform probability measure, then we have

$\displaystyle |\langle T_{ij} f, g \rangle_{L^2(W_j)}| \ll_M |F|^{-1/12} \|f\|_{L^2(V_i)} \|g\|_{L^2(W_j)}$

whenever at least one of ${f\in L^2(V_i)}$, ${g \in L^2(W_j)}$ have mean zero. Thus, we may split ${T_{ij} = d_{ij} J + E_{ij}}$ where ${d_{ij}}$ is a constant, ${J: L^2(V_i) \rightarrow L^2(W_j)}$ is the rank one operator

$\displaystyle J f =\langle f, 1_{V_i} \rangle_{L^2(V_i)} 1_{W_j}$

and ${E_{ij}}$ has operator norm ${O_M(|F|^{-1/12})}$. By computing ${\langle T_{ij} 1_{V_i}, 1_{W_j} \rangle_{L^2(W_j})}$ as before, we see that we may take ${d_{ij}}$ to be a rational with numerator and denominator ${O_M(1)}$.

Recall that ${T_{ij}}$ has a Hilbert-Schmidt norm of ${O(1)}$, and so by the triangle inequality ${E_{ij}}$ does also. If we square, we conclude that ${T_{ij} T_{ij}^* =d_{ij}^2 JJ^* + E'_{ij}}$, where ${E'_{ij}:L^2(V_i) \rightarrow L^2(V_i)}$ has Hilbert-Schmidt norm ${O_M( |F|^{-1/12} )}$. Thus, if ${K_2: W_j \times W_j \rightarrow {\bf R}}$ is the integral kernel of ${T_{ij} T_{ij}^*}$, then

$\displaystyle \int_{W_j}\int_{W_j} |K_2(w,w') - d_{ij}^2|^2\ dm_{W_j}(w) dm_{W_j}(w') \ll_M |F|^{-1/6}. \ \ \ \ \ (11)$

On the other hand, we have

$\displaystyle K_2(w,w') = \frac{1}{|V_i|} |\{ v\in V_i: (v,w), (v,w') \in E \}|$

so by Proposition 2, we have either

$\displaystyle K_2(w,w') = d_{ij}^2+ O(|F|^{-1/2})$

or

$\displaystyle |K_2(w,w') - d_{ij}^2| \gg_M 1$

for each ${w,w'\in W_j}$. Furthermore, the set of pairs ${(w,w')}$ for which the latter occurs is a definable subset of ${W_j \times W_j}$ with complexity ${O_M(1)}$. On the other hand, from (11) and Chebyshev’s inequality, this set has cardinality ${O_M( |F|^{-1/6} |W_j|^2 )}$. By Proposition 2, we may self-improve this to ${O_M( |F|^{-1} |W_j|^2 )}$. Integrating, we self-improve (11) to

$\displaystyle \int_{W_j}\int_{W_j} |K_2(w,w') - d_{ij}^2|^2\ dm_{W_j}(w) dm_{W_j}(w') \ll_M |F|^{-1},$

so by Cauchy-Schwarz we have

$\displaystyle |\int_{W_j} \int_{W_j} K_2(w,w') g(w) g(w')\ dm_{W_j}(w) dm_{W_j}(w')| \ll_M |F|^{-1/2} \|g\|_{L^2(W_j)}^2$

whenever ${g}$ has mean zero, thus

$\displaystyle \|T_{ij}^* g\|_{L^2(V_j)} \ll_M |F|^{-1/4} \|g\|_{L^2(W_j)}$

whenever ${g}$ has mean zero. By repeating the final steps of the proof of Lemma 3 we conclude (2) as required.

— 2. Stability —

The following stability result was established by Hrushovski (see Proposition 2.25 of this paper)

Proposition 4 (Stability) Let ${0 \leq p < q \leq 1}$ be real numbers, and suppose that there are measurable subsets ${E_1,\ldots,E_n,F_1,\ldots,F_n}$ of a probability space ${(X,\mu)}$ with the property that

$\displaystyle \mu( E_i \cap F_j ) \geq q$

and

$\displaystyle \mu( E_j \cap F_i ) \leq p$

for ${1 \leq i < j \leq n}$. Then we have ${n = O_{p,q}(1)}$.

In the language of stability theory, this proposition asserts that the relation ${\mu(E_i \cap F_j) = \alpha}$ is stable. This proposition was the key input used by Pillay-Starchenko and by Hrushovski (together with the discretisation results of Proposition 2) to give an alternate proof of the algebraic regularity lemma.

Let us first sketch Hrushovski’s proof of this lemma. If the claim failed, then by a compactness argument one can construct infinite sequences ${(E_i)_{i=1}^\infty, (F_i)_{i=1}^\infty}$ of measurable subsets of a probability space ${(X,\mu)}$ with the property that

$\displaystyle \mu( E_i \cap F_j ) \geq q \ \ \ \ \ (12)$

and

$\displaystyle \mu( E_j \cap F_i ) \leq p \ \ \ \ \ (13)$

for all ${i < j}$. Next, by Ramsey theory (for directed graphs), for any given ${\epsilon>0}$, we may pass to an infinite subsequence such that

$\displaystyle \mu( E_{i_1} \cap F_{i_2} ) = \mu( E_{j_1} \cap F_{j_2} ) + O(\epsilon)$

for any ${i_1 < i_2}$ and ${j_1< j_2}$, and by further compactness we may eliminate the ${\epsilon}$ error, so that

$\displaystyle \mu( E_{i_1} \cap F_{i_2} ) = \mu( E_{j_1} \cap F_{j_2} )$

for any ${i_1< i_2}$ and ${j_1 < j_2}$. More generally, (using Ramsey’s theorem for directed hypergraphs and more compactness) we may assume the order-indiscernibility property

$\displaystyle \mu( G^1_{i_1} \cap \ldots \cap G^k_{i_k} ) = \mu( G^1_{j_1} \cap \ldots \cap G^k_{j_k} )$

for any ${i_1 < \ldots < i_k}$ and ${j_1 < \ldots < j_k}$, and any ${G^1,\ldots,G^k \in \{E,F\}}$. This implies (by a variant of de Finetti’s theorem, proven in Hrushovski’s paper via the Krein-Milman theorem) that ${\mu}$ is exchangeable, which among other things implies that

$\displaystyle \mu( E_i \cap F_j ) = \mu(E_j \cap F_i)$

for any ${i,j}$; but this contradicts (12), (13).

Now we give a spectral proof of the above proposition. Let ${A = (a_{ij})_{1\leq i,j \leq n}}$ be a matrix to be chosen later, and consider the expression

$\displaystyle |\sum_{1 \leq i,j \leq n} a_{ij} \mu( E_i \cap F_j)|.$

We can rewrite this expression as

$\displaystyle |\int_X \sum_{1 \leq i,j \leq n} a_{ij} 1_{E_i}(x) 1_{F_j}(x)\ d\mu(x)|.$

For any ${x}$, the vectors ${(1_{E_i}(x))_{i=1}^n}$ and ${(1_{F_j}(x))_{i=1}^n}$ have an ${\ell^2}$ norm of at most ${\sqrt{n}}$. Thus, we can bound the preceding expression by

$\displaystyle \int_X \| A \|_{op} \sqrt{n} \sqrt{n} \ d\mu = n \|A\|_{op}$

where ${\|A\|_{op}}$ is the operator norm of ${A}$; thus we have

$\displaystyle |\sum_{1 \leq i,j \leq n} a_{ij} \mu( E_i \cap F_j)| \leq n \|A\|_{op}.$

We apply this inequality to the discrete Hilbert transform matrix, in which ${a_{ij} = \frac{1}{i-j}}$ for ${i \neq j}$ and ${a_{ii} = 0}$. This matrix has an operator norm of at most ${\pi}$ (this is Hilbert’s inequality). Thus

$\displaystyle |\sum_{1 \leq i < j \leq n} \frac{1}{i-j} (\mu( E_i \cap F_j)-\mu( E_j \cap F_i))| \leq n \pi.$

But by (12), (13), the left-hand side is at least ${(q-p) \sum_{1 \leq i < j \leq n} \frac{1}{i-j} \gg (q-p) n \log n}$, and the claim follows (with a bound of the form ${n \ll \exp( O( \frac{1}{q-p} ) )}$).