Mathematicians study a variety of different mathematical structures, but perhaps the structures that are most commonly associated with mathematics are the number systems, such as the integers ${{\bf Z}}$ or the real numbers ${{\bf R}}$. Indeed, the use of number systems is so closely identified with the practice of mathematics that one sometimes forgets that it is possible to do mathematics without explicit reference to any concept of number. For instance, the ancient Greeks were able to prove many theorems in Euclidean geometry, well before the development of Cartesian coordinates and analytic geometry in the seventeenth century, or the formal constructions or axiomatisations of the real number system that emerged in the nineteenth century (not to mention precursor concepts such as zero or negative numbers, whose very existence was highly controversial, if entertained at all, to the ancient Greeks). To do this, the Greeks used geometric operations as substitutes for the arithmetic operations that would be more familiar to modern mathematicians. For instance, concatenation of line segments or planar regions serves as a substitute for addition; the operation of forming a rectangle out of two line segments would serve as a substitute for multiplication; the concept of similarity can be used as a substitute for ratios or division; and so forth.

A similar situation exists in modern physics. Physical quantities such as length, mass, momentum, charge, and so forth are routinely measured and manipulated using the real number system ${{\bf R}}$ (or related systems, such as ${{\bf R}^3}$ if one wishes to measure a vector-valued physical quantity such as velocity). Much as analytic geometry allows one to use the laws of algebra and trigonometry to calculate and prove theorems in geometry, the identification of physical quantities with numbers allows one to express physical laws and relationships (such as Einstein’s famous mass-energy equivalence ${E=mc^2}$) as algebraic (or differential) equations, which can then be solved and otherwise manipulated through the extensive mathematical toolbox that has been developed over the centuries to deal with such equations.

However, as any student of physics is aware, most physical quantities are not represented purely by one or more numbers, but instead by a combination of a number and some sort of unit. For instance, it would be a category error to assert that the length of some object was a number such as ${10}$; instead, one has to say something like “the length of this object is ${10}$ yards”, combining both a number ${10}$ and a unit (in this case, the yard). Changing the unit leads to a change in the numerical value assigned to this physical quantity, even though no physical change to the object being measured has occurred. For instance, if one decides to use feet as the unit of length instead of yards, then the length of the object is now ${30}$ feet; if one instead uses metres, the length is now ${9.144}$ metres; and so forth. But nothing physical has changed when performing this change of units, and these lengths are considered all equal to each other:

$\displaystyle 10 \hbox{ yards } = 30 \hbox{ feet } = 9.144 \hbox{ metres}.$

It is then common to declare that while physical quantities and units are not, strictly speaking, numbers, they should be manipulated using the laws of algebra as if they were numerical quantities. For instance, if an object travels ${10}$ metres in ${5}$ seconds, then its speed should be

$\displaystyle (10 m) / (5 s) = 2 ms^{-1}$

where we use the usual abbreviations of ${m}$ and ${s}$ for metres and seconds respectively. Similarly, if the speed of light ${c}$ is ${c=299 792 458 ms^{-1}}$ and an object has mass ${10 kg}$, then Einstein’s mass-energy equivalence ${E=mc^2}$ tells us that the energy-content of this object is

$\displaystyle (10 kg) (299 792 458 ms^{-1})^2 \approx 8.99 \times 10^{17} kg m^2 s^{-2}.$

Note that the symbols ${kg, m, s}$ are being manipulated algebraically as if they were mathematical variables such as ${x}$ and ${y}$. By collecting all these units together, we see that every physical quantity gets assigned a unit of a certain dimension: for instance, we see here that the energy ${E}$ of an object can be given the unit of ${kg m^2 s^{-2}}$ (more commonly known as a Joule), which has the dimension of ${M L^2 T^{-2}}$ where ${M, L, T}$ are the dimensions of mass, length, and time respectively.

There is however one important limitation to the ability to manipulate “dimensionful” quantities as if they were numbers: one is not supposed to add, subtract, or compare two physical quantities if they have different dimensions, although it is acceptable to multiply or divide two such quantities. For instance, if ${m}$ is a mass (having the units ${M}$) and ${v}$ is a speed (having the units ${LT^{-1}}$), then it is physically “legitimate” to form an expression such as ${\frac{1}{2} mv^2}$, but not an expression such as ${m+v}$ or ${m-v}$; in a similar spirit, statements such as ${m=v}$ or ${m\geq v}$ are physically meaningless. This combines well with the mathematical distinction between vector, scalar, and matrix quantities, which among other things prohibits one from adding together two such quantities if their vector or matrix type are different (e.g. one cannot add a scalar to a vector, or a vector to a matrix), and also places limitations on when two such quantities can be multiplied together. A related limitation, which is not always made explicit in physics texts, is that transcendental mathematical functions such as ${\sin}$ or ${\exp}$ should only be applied to arguments that are dimensionless; thus, for instance, if ${v}$ is a speed, then ${\hbox{arctanh}(v)}$ is not physically meaningful, but ${\hbox{arctanh}(v/c)}$ is (this particular quantity is known as the rapidity associated to this speed).
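These consistency rules are mechanical enough to be enforced by a computer. As an illustration (my own sketch, not part of the original discussion), here is a minimal Python quantity type that tracks the exponents of ${M, L, T}$ and rejects dimensionally inconsistent operations, including the application of a transcendental function to a dimensionful argument:

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Quantity:
    """A number together with exponents of the base dimensions (M, L, T)."""
    value: float
    dims: tuple  # (mass, length, time) exponents

    def _require_same_dims(self, other):
        if self.dims != other.dims:
            raise ValueError(f"dimension mismatch: {self.dims} vs {other.dims}")

    def __add__(self, other):            # addition only for matching dimensions
        self._require_same_dims(other)
        return Quantity(self.value + other.value, self.dims)

    def __sub__(self, other):            # likewise for subtraction
        self._require_same_dims(other)
        return Quantity(self.value - other.value, self.dims)

    def __mul__(self, other):            # multiplication adds dimension exponents
        return Quantity(self.value * other.value,
                        tuple(a + b for a, b in zip(self.dims, other.dims)))

    def __truediv__(self, other):        # division subtracts them
        return Quantity(self.value / other.value,
                        tuple(a - b for a, b in zip(self.dims, other.dims)))

    def __pow__(self, n):
        return Quantity(self.value ** n, tuple(n * d for d in self.dims))

    def arctanh(self):                   # transcendental functions need dimensionless input
        if any(self.dims):
            raise ValueError("arctanh of a dimensionful quantity")
        return Quantity(math.atanh(self.value), (0, 0, 0))

c = Quantity(299792458.0, (0, 1, -1))    # speed of light, dimension L T^-1
mass = Quantity(10.0, (1, 0, 0))         # 10 kg, dimension M
E = mass * c**2                          # acquires dimension M L^2 T^-2 automatically
v = Quantity(1.0e7, (0, 1, -1))          # some speed
rapidity = (v / c).arctanh()             # legitimate: v/c is dimensionless
```

Here `E = mass * c**2` is accepted and automatically carries the dimension ${ML^2T^{-2}}$ of energy, while an expression such as `mass + v` raises an error, as does `v.arctanh()` without the division by `c`.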

These limitations may seem like a weakness in the mathematical modeling of physical quantities; one may think that one could get a more “powerful” mathematical framework if one were allowed to perform dimensionally inconsistent operations, such as add together a mass and a velocity, add together a vector and a scalar, exponentiate a length, etc. Certainly there is some precedent for this in mathematics; for instance, the formalism of Clifford algebras does in fact allow one to (among other things) add vectors with scalars, and in differential geometry it is quite common to formally apply transcendental functions (such as the exponential function) to a differential form (for instance, the Liouville measure ${\frac{1}{n!} \omega^n}$ of a symplectic manifold can be usefully thought of as a component of the exponential ${\exp(\omega)}$ of the symplectic form ${\omega}$).

However, there are several reasons why it is advantageous to retain the limitation to only perform dimensionally consistent operations. One is that of error correction: one can often catch (and correct for) errors in one’s calculations by discovering a dimensional inconsistency, and tracing it back to the first step where it occurs. Also, by performing dimensional analysis, one can often identify the form of a physical law before one has fully derived it. For instance, if one postulates the existence of a mass-energy relationship involving only the mass of an object ${m}$, the energy content ${E}$, and the speed of light ${c}$, dimensional analysis is already sufficient to deduce that the relationship must be of the form ${E = \alpha mc^2}$ for some dimensionless absolute constant ${\alpha}$; the only remaining task is then to work out the constant of proportionality ${\alpha}$, which requires physical arguments beyond that provided by dimensional analysis. (This is a simple instance of a more general application of dimensional analysis known as the Buckingham ${\pi}$ theorem.)
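To make the arithmetic behind this deduction concrete (a toy computation of my own, not from the text): writing a candidate law as ${E = \alpha m^a c^b}$ and matching the exponents of ${M}$, ${L}$, and ${T}$ on both sides gives three linear equations in ${a, b}$, whose unique solution is ${a=1, b=2}$:

```python
# Dimension vectors over (M, L, T):
E_dims = (1, 2, -2)   # energy: M L^2 T^-2
m_dims = (1, 0, 0)    # mass:   M
c_dims = (0, 1, -1)   # speed:  L T^-1

# Search for exponents (a, b) with m^a c^b having the dimensions of E;
# matching each base dimension gives one linear equation.
solutions = [(a, b)
             for a in range(-3, 4) for b in range(-3, 4)
             if all(a * m_dims[k] + b * c_dims[k] == E_dims[k] for k in range(3))]
print(solutions)   # the unique solution in this range is (1, 2), i.e. E = alpha*m*c^2
```

The search range here is of course arbitrary; the point is that the exponent-matching system is already overdetermined enough to pin down the form of the law, leaving only the dimensionless constant ${\alpha}$ undetermined.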

The use of units and dimensional analysis has certainly proven to be a very effective tool in physics. But one can pose the question of whether it has a properly grounded mathematical foundation, in order to settle any lingering unease about using such tools in physics, and also in order to rigorously develop such tools for purely mathematical purposes (such as analysing identities and inequalities in such fields of mathematics as harmonic analysis or partial differential equations).

The example of Euclidean geometry mentioned previously offers one possible approach to formalising the use of dimensions. For instance, one could model the length of a line segment not by a number, but rather by the equivalence class of all line segments congruent to the original line segment (cf. the Frege-Russell definition of a number). Similarly, the area of a planar region can be modeled not by a number, but by the equivalence class of all regions that are equidecomposable with the original region (one can, if one wishes, restrict attention here to measurable sets in order to avoid Banach-Tarski-type paradoxes, though that particular paradox actually only arises in three and higher dimensions). As mentioned before, it is then geometrically natural to multiply two lengths to form an area, by taking a rectangle whose line segments have the stated lengths, and using the area of that rectangle as a product. This geometric picture works well for units such as length and volume that have a spatial geometric interpretation, but it is less clear how to apply it for more general units. For instance, it does not seem geometrically natural (or, for that matter, conceptually helpful) to envision the equation ${E=mc^2}$ as the assertion that the energy ${E}$ is the volume of a rectangular box whose height is the mass ${m}$ and whose length and width are given by the speed of light ${c}$.

But there are at least two other ways to formalise dimensionful quantities in mathematics, which I will discuss below the fold. The first is a “parametric” model in which dimensionful objects are modeled as numbers (or vectors, matrices, etc.) depending on some base dimensional parameters (such as units of length, mass, and time, or perhaps a coordinate system for space or spacetime), and transforming according to some representation of a structure group that encodes the range of these parameters; this type of “coordinate-heavy” model is often used (either implicitly or explicitly) by physicists in order to efficiently perform calculations, particularly when manipulating vector or tensor-valued quantities. The second is an “abstract” model in which dimensionful objects now live in an abstract mathematical space (e.g. an abstract vector space), in which only a subset of the operations available to general-purpose number systems such as ${{\bf R}}$ or ${{\bf R}^3}$ are available, namely those operations which are “dimensionally consistent” or invariant (or more precisely, equivariant) with respect to the action of the underlying structure group. This sort of “coordinate-free” approach tends to be the one which is preferred by pure mathematicians, particularly in the various branches of modern geometry, in part because it can lead to greater conceptual clarity, as well as results of great generality; it is also close to the more informal practice of treating mathematical manipulations that do not preserve dimensional consistency as being physically meaningless.

Things are pretty quiet here during the holiday season, but one small thing I have been working on recently is a set of notes on special relativity that I will be working through in a few weeks with some bright high school students here at our local math circle.  I have only two hours to spend with this group, and it is unlikely that we will reach the end of the notes (in which I derive the famous mass-energy equivalence relation E=mc^2, largely following Einstein’s original derivation as discussed in this previous blog post); instead we will probably spend a fair chunk of time on related topics which do not actually require special relativity per se, such as spacetime diagrams, the Doppler shift effect, and an analysis of my airport puzzle.  This will be my first time doing something of this sort (in which I will be spending as much time interacting directly with the students as I would lecturing);  I’m not sure exactly how it will play out, being a little outside of my usual comfort zone of undergraduate and graduate teaching, but am looking forward to finding out how it goes.   (In particular, it may end up that the discussion deviates somewhat from my prepared notes.)

The material covered in my notes is certainly not new, but I ultimately decided that it was worth putting up here in case some readers here had any corrections or other feedback to contribute (which, as always, would be greatly appreciated).

[Dec 24 and then Jan 21: notes updated, in response to comments.]

I’ve just uploaded to the arXiv my paper “Mixing for progressions in non-abelian groups“, submitted to Forum of Mathematics, Sigma (which, along with sister publication Forum of Mathematics, Pi, has just opened up its online submission system). This paper is loosely related in subject matter to my two previous papers on polynomial expansion and on recurrence in quasirandom groups (with Vitaly Bergelson), although the methods here are rather different from those of the two previous papers. The starting motivation for this paper was a question posed in this foundational paper of Tim Gowers on quasirandom groups. In that paper, Gowers showed (among other things) that if ${G}$ was a quasirandom group, patterns such as ${(x,xg,xh, xgh)}$ were mixing in the sense that, for any four sets ${A,B,C,D \subset G}$, the number of such quadruples ${(x,xg,xh, xgh)}$ in ${A \times B \times C \times D}$ was equal to ${(\mu(A) \mu(B) \mu(C) \mu(D) + o(1)) |G|^3}$, where ${\mu(A) := |A| / |G|}$, and ${o(1)}$ denotes a quantity that goes to zero as the quasirandomness of the group goes to infinity. In my recent paper with Vitaly, we also considered mixing properties of some other patterns, namely ${(x,xg,gx)}$ and ${(g,x,xg,gx)}$. This paper is concerned instead with the pattern ${(x,xg,xg^2)}$, that is to say a geometric progression of length three. As observed by Gowers, by applying (a suitably quantitative version of) Roth’s theorem in (cosets of) a cyclic group, one can obtain a recurrence theorem for this pattern without much effort: if ${G}$ is an arbitrary finite group, and ${A}$ is a subset of ${G}$ with ${\mu(A) \geq \delta}$, then there are at least ${c(\delta) |G|^2}$ pairs ${(x,g) \in G^2}$ such that ${x, xg, xg^2 \in A}$, where ${c(\delta)>0}$ is a quantity depending only on ${\delta}$.
However, this argument does not settle the question of whether there is a stronger mixing property, in that the number of pairs ${(x,g) \in G^2}$ such that ${(x,xg,xg^2) \in A \times B \times C}$ should be ${(\mu(A)\mu(B)\mu(C)+o(1)) |G|^2}$ for any ${A,B,C \subset G}$. Informally, this would assert that for ${x, g}$ chosen uniformly at random from ${G}$, the triplet ${(x, xg, xg^2)}$ should resemble a uniformly selected element of ${G^3}$ in some weak sense.

For non-quasirandom groups, such mixing properties can certainly fail. For instance, if ${G}$ is the cyclic group ${G = ({\bf Z}/N{\bf Z},+)}$ (which is abelian and thus highly non-quasirandom) with the additive group operation, and ${A = \{1,\ldots,\lfloor \delta N\rfloor\}}$ for some small but fixed ${\delta > 0}$, then ${\mu(A) = \delta + o(1)}$ in the limit ${N \rightarrow \infty}$, but the number of pairs ${(x,g) \in G^2}$ with ${x, x+g, x+2g \in A}$ is ${(\delta^2/2 + o(1)) |G|^2}$ rather than ${(\delta^3+o(1)) |G|^2}$. The problem here is that the identity ${(x+2g) = 2(x+g) - x}$ ensures that if ${x}$ and ${x+g}$ both lie in ${A}$, then ${x+2g}$ has a highly elevated likelihood of also falling in ${A}$. One can view ${A}$ as the preimage of a small ball under the one-dimensional representation ${\rho: G \rightarrow U(1)}$ defined by ${\rho(n) := e^{2\pi i n/N}}$; similar obstructions to mixing can also be constructed from other low-dimensional representations.
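This ${\delta^2/2}$ count is easy to confirm by brute force; the following short Python experiment (my own check, with arbitrarily chosen parameters) compares the empirical density of such pairs against the ${\delta^2/2}$ value above and the ${\delta^3}$ density that full mixing would predict:

```python
# Count pairs (x, g) in (Z/N)^2 with x, x+g, x+2g all in A = {1, ..., floor(delta*N)}.
N, delta = 601, 0.2                       # small odd modulus so the double loop runs quickly
A = set(range(1, int(delta * N) + 1))
count = sum(1 for x in range(N) for g in range(N)
            if x in A and (x + g) % N in A and (x + 2 * g) % N in A)
density = count / N**2
print(density, delta**2 / 2, delta**3)    # density is close to delta^2/2, well above delta^3
```

(For odd ${N}$ the count can also be done in closed form: each pair ${(a,c) \in A \times A}$ with ${a+c}$ even determines a valid pair ${(x,g)}$, and the pairs with ${a+c}$ odd have midpoint near ${N/2}$, outside ${A}$.)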

However, by definition, quasirandom groups do not have low-dimensional representations, and Gowers asked whether mixing for ${(x,xg,xg^2)}$ could hold for quasirandom groups. I do not know if this is the case for arbitrary quasirandom groups, but I was able to settle the question for a specific class of quasirandom groups, namely the special linear groups ${G := SL_d(F)}$ over a finite field ${F}$ in the regime where the dimension ${d}$ is bounded (but is at least two) and ${F}$ is large. Indeed, for such groups I can obtain a count of ${(\mu(A) \mu(B) \mu(C) + O( |F|^{-\min(d-1,2)/8} )) |G|^2}$ for the number of pairs ${(x,g) \in G^2}$ with ${(x, xg, xg^2) \in A \times B \times C}$. In fact, I have the somewhat stronger statement that there are ${(\mu(A) \mu(B) \mu(C) \mu(D) + O( |F|^{-\min(d-1,2)/8} )) |G|^2}$ pairs ${(x,g) \in G^2}$ with ${(x,xg,xg^2,g) \in A \times B \times C \times D}$ for any ${A,B,C,D \subset G}$.

I was also able to obtain a partial result for the length four progression ${(x,xg,xg^2, xg^3)}$ in the simpler two-dimensional case ${G = SL_2(F)}$, but I had to make the unusual restriction that the group element ${g \in G}$ was hyperbolic in the sense that it was diagonalisable over the finite field ${F}$ (as opposed to only being diagonalisable over the algebraic closure ${\overline{F}}$ of that field); this amounts to the discriminant of the matrix being a quadratic residue, and this holds for approximately half of the elements of ${G}$. The result is then that for any ${A,B,C,D \subset G}$, one has ${(\frac{1}{2} \mu(A) \mu(B) \mu(C) \mu(D) + o(1)) |G|^2}$ pairs ${(x,g) \in G^2}$ with ${g}$ hyperbolic and ${(x,xg,xg^2,xg^3) \in A \times B \times C \times D}$. (Again, I actually show a slightly stronger statement in which ${g}$ is restricted to an arbitrary subset ${E}$ of hyperbolic elements.)

For the length three argument, the main tools used are the Cauchy-Schwarz inequality, the quasirandomness of ${G}$, and some algebraic geometry to ensure that a certain family of probability measures on ${G}$ that are defined algebraically are approximately uniformly distributed. The length four argument is significantly more difficult and relies on a rather ad hoc argument involving, among other things, expander properties related to the work of Bourgain and Gamburd, and also a “twisted” version of an argument of Gowers that is used (among other things) to establish an inverse theorem for the ${U^3}$ norm.

I give some details of these arguments below the fold.

Perhaps the most important structural result about general large dense graphs is the Szemerédi regularity lemma. Here is a standard formulation of that lemma:

Lemma 1 (Szemerédi regularity lemma) Let ${G = (V,E)}$ be a graph on ${n}$ vertices, and let ${\epsilon > 0}$. Then there exists a partition ${V = V_1 \cup \ldots \cup V_M}$ for some ${M \leq M(\epsilon)}$ with the property that for all but at most ${\epsilon M^2}$ of the pairs ${1 \leq i \leq j \leq M}$, the pair ${V_i, V_j}$ is ${\epsilon}$-regular in the sense that

$\displaystyle | d( A, B ) - d( V_i, V_j ) | \leq \epsilon$

whenever ${A \subset V_i, B \subset V_j}$ are such that ${|A| \geq \epsilon |V_i|}$ and ${|B| \geq \epsilon |V_j|}$, and ${d(A,B) := |\{ (a,b) \in A \times B: \{a,b\} \in E \}|/(|A| |B|)}$ is the edge density between ${A}$ and ${B}$. Furthermore, the partition is equitable in the sense that ${||V_i| - |V_j|| \leq 1}$ for all ${1 \leq i \leq j \leq M}$.

There are many proofs of this lemma, which is actually not that difficult to establish; see for instance these previous blog posts for some examples. In this post I would like to record one further proof, based on the spectral decomposition of the adjacency matrix of ${G}$, which is essentially due to Frieze and Kannan. (Strictly speaking, Frieze and Kannan used a variant of this argument to establish a weaker form of the regularity lemma, but it is not difficult to modify the Frieze-Kannan argument to obtain the usual form of the regularity lemma instead. Some closely related spectral regularity lemmas were also developed by Szegedy.) I found recently (while speaking at the Abel conference in honour of this year’s laureate, Endre Szemerédi) that this particular argument is not as widely known among graph theory experts as I had thought, so I thought I would record it here.

For reasons of exposition, it is convenient to first establish a slightly weaker form of the lemma, in which one drops the hypothesis of equitability (but then has to weight the cells ${V_i}$ by their magnitude when counting bad pairs):

Lemma 2 (Szemerédi regularity lemma, weakened variant) Let ${G = (V,E)}$ be a graph on ${n}$ vertices, and let ${\epsilon > 0}$. Then there exists a partition ${V = V_1 \cup \ldots \cup V_M}$ for some ${M \leq M(\epsilon)}$ with the property that for all pairs ${(i,j) \in \{1,\ldots,M\}^2}$ outside of an exceptional set ${\Sigma}$, one has

$\displaystyle | E(A,B) - d_{ij} |A| |B| | \ll \epsilon |V_i| |V_j| \ \ \ \ \ (1)$

whenever ${A \subset V_i, B \subset V_j}$, for some real number ${d_{ij}}$, where ${E(A,B) := |\{ (a,b) \in A \times B: \{a,b\} \in E \}|}$ is the number of edges between ${A}$ and ${B}$. Furthermore, we have

$\displaystyle \sum_{(i,j) \in \Sigma} |V_i| |V_j| \ll \epsilon |V|^2. \ \ \ \ \ (2)$

Let us now prove Lemma 2. We enumerate ${V}$ (after relabeling) as ${V = \{1,\ldots,n\}}$. The adjacency matrix ${T}$ of the graph ${G}$ is then a self-adjoint ${n \times n}$ matrix, and thus admits an eigenvalue decomposition

$\displaystyle T = \sum_{i=1}^n \lambda_i u_i u_i^*$

for some orthonormal basis ${u_1,\ldots,u_n}$ of ${{\bf C}^n}$ and some eigenvalues ${\lambda_1,\ldots,\lambda_n \in {\bf R}}$, which we arrange in decreasing order of magnitude:

$\displaystyle |\lambda_1| \geq \ldots \geq |\lambda_n|.$

We can compute the trace of ${T^2}$ as

$\displaystyle \hbox{tr}(T^2) = \sum_{i=1}^n |\lambda_i|^2.$

But we also have ${\hbox{tr}(T^2) = 2|E| \leq n^2}$, so

$\displaystyle \sum_{i=1}^n |\lambda_i|^2 \leq n^2. \ \ \ \ \ (3)$

Among other things, this implies that

$\displaystyle |\lambda_i| \leq \frac{n}{\sqrt{i}} \ \ \ \ \ (4)$

for all ${i \geq 1}$.
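Both the trace identity ${\hbox{tr}(T^2) = 2|E|}$ and the resulting bound (4) are easy to sanity-check numerically. Here is a small numpy experiment (my own illustration, on a randomly generated graph) doing exactly that:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
upper = np.triu(rng.random((n, n)) < 0.3, k=1)   # Erdos-Renyi-style random graph
T = (upper | upper.T).astype(float)              # symmetric 0/1 adjacency matrix

lam = np.linalg.eigvalsh(T)
lam = lam[np.argsort(-np.abs(lam))]              # arrange in decreasing magnitude

num_edges = int(T.sum()) // 2
assert np.isclose((lam ** 2).sum(), 2 * num_edges)               # tr(T^2) = 2|E| <= n^2
assert all(abs(lam[i]) <= n / np.sqrt(i + 1) for i in range(n))  # the bound (4), 1-indexed
```

The second assertion is of course a theorem rather than an experiment, since ${i |\lambda_i|^2 \leq \sum_{j \leq i} |\lambda_j|^2 \leq n^2}$; the code merely confirms the bookkeeping.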

Let ${F: {\bf N} \rightarrow {\bf N}}$ be a function (depending on ${\epsilon}$) to be chosen later, with ${F(i) \geq i}$ for all ${i}$. Applying (3) and the pigeonhole principle (or the finite convergence principle, see this blog post), we can find ${J \leq C(F,\epsilon)}$ such that

$\displaystyle \sum_{J \leq i < F(J)} |\lambda_i|^2 \leq \epsilon^3 n^2.$
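This pigeonhole step is essentially an algorithm: keep replacing ${J}$ by ${F(J)}$ until the window ${[J, F(J))}$ carries little spectral mass. A small Python sketch of it (my own illustration, with hypothetical inputs):

```python
def find_window(lam_sq, F, eps, n):
    """Find J with the sum of lam_sq over the window [J, F(J)) at most eps^3 * n^2.

    The failing windows [J, F(J)) are disjoint and each carries more than
    eps^3 * n^2 of the total sum, which is at most n^2; hence the loop runs
    at most 1/eps^3 times, so J is bounded by F iterated 1/eps^3 times.
    """
    J = 1
    while sum(lam_sq[J - 1:F(J) - 1]) > eps ** 3 * n ** 2:  # 1-indexed window [J, F(J))
        J = F(J)
    return J

# Hypothetical example: one huge squared eigenvalue, then nothing.
lam_sq = [100.0] + [0.0] * 99
J = find_window(lam_sq, F=lambda j: 2 * j, eps=0.5, n=10)
print(J)   # J = 2: the window [2, 4) already carries no mass
```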

(Indeed, the bound on ${J}$ is basically ${F}$ iterated ${1/\epsilon^3}$ times.) We can now split

$\displaystyle T = T_1 + T_2 + T_3, \ \ \ \ \ (5)$

where ${T_1}$ is the “structured” component

$\displaystyle T_1 := \sum_{i < J} \lambda_i u_i u_i^*, \ \ \ \ \ (6)$

${T_2}$ is the “small” component

$\displaystyle T_2 := \sum_{J \leq i < F(J)} \lambda_i u_i u_i^*, \ \ \ \ \ (7)$

and ${T_3}$ is the “pseudorandom” component

$\displaystyle T_3 := \sum_{i \geq F(J)} \lambda_i u_i u_i^*. \ \ \ \ \ (8)$

We now design a vertex partition to make ${T_1}$ approximately constant on most cells. For each ${i < J}$, we partition ${V}$ into ${O_{J,\epsilon}(1)}$ cells on which ${u_i}$ (viewed as a function from ${V}$ to ${{\bf C}}$) only fluctuates by ${O(\epsilon n^{-1/2} /J)}$, plus an exceptional cell of size ${O( \frac{\epsilon}{J} |V|)}$ coming from the values where ${|u_i|}$ is excessively large (larger than ${\sqrt{\frac{J}{\epsilon}} n^{-1/2}}$). Combining all these partitions together, we can write ${V = V_1 \cup \ldots \cup V_{M-1} \cup V_M}$ for some ${M = O_{J,\epsilon}(1)}$, where ${V_M}$ has cardinality at most ${\epsilon |V|}$, and for all ${1 \leq i \leq M-1}$, the eigenfunctions ${u_1,\ldots,u_{J-1}}$ all fluctuate by at most ${O(\epsilon n^{-1/2}/J)}$ on each cell. In particular, if ${1 \leq i,j \leq M-1}$, then (by (4) and (6)) the entries of ${T_1}$ fluctuate by at most ${O(\epsilon)}$ on each block ${V_i \times V_j}$. If we let ${d_{ij}}$ be the mean value of these entries on ${V_i \times V_j}$, we thus have

$\displaystyle 1_B^* T_1 1_A = d_{ij} |A| |B| + O( \epsilon |V_i| |V_j| ) \ \ \ \ \ (9)$

for any ${1 \leq i,j \leq M-1}$ and ${A \subset V_i, B \subset V_j}$, where we view the indicator functions ${1_A, 1_B}$ as column vectors of dimension ${n}$.

Next, we observe from (3) and (7) that ${\hbox{tr} T_2^2 \leq \epsilon^3 n^2}$. If we let ${x_{ab}}$ be the coefficients of ${T_2}$, we thus have

$\displaystyle \sum_{a,b \in V} |x_{ab}|^2 \leq \epsilon^3 n^2$

and hence by Markov’s inequality we have

$\displaystyle \sum_{a \in V_i} \sum_{b \in V_j} |x_{ab}|^2 \leq \epsilon^2 |V_i| |V_j| \ \ \ \ \ (10)$

for all pairs ${(i,j) \in \{1,\ldots,M-1\}^2}$ outside of an exceptional set ${\Sigma_1}$ with

$\displaystyle \sum_{(i,j) \in \Sigma_1} |V_i| |V_j| \leq \epsilon |V|^2.$

If ${(i,j) \in \{1,\ldots,M-1\}^2}$ avoids ${\Sigma_1}$, we thus have

$\displaystyle 1_B^* T_2 1_A = O( \epsilon |V_i| |V_j| ) \ \ \ \ \ (11)$

for any ${A \subset V_i, B \subset V_j}$, by (10) and the Cauchy-Schwarz inequality.

Finally, to control ${T_3}$ we see from (4) and (8) that ${T_3}$ has an operator norm of at most ${n/\sqrt{F(J)}}$. In particular, we have from the Cauchy-Schwarz inequality that

$\displaystyle 1_B^* T_3 1_A = O( n^2 / \sqrt{F(J)} ) \ \ \ \ \ (12)$

for any ${A, B \subset V}$.

Let ${\Sigma}$ be the set of all pairs ${(i,j) \in \{1,\ldots,M\}^2}$ where either ${(i,j) \in \Sigma_1}$, ${i = M}$, ${j=M}$, or

$\displaystyle \min(|V_i|, |V_j|) \leq \frac{\epsilon}{M} n.$

One easily verifies that (2) holds. If ${(i,j) \in \{1,\ldots,M\}^2}$ is not in ${\Sigma}$, then by summing (9), (11), (12) and using (5), we see that

$\displaystyle 1_B^* T 1_A = d_{ij} |A| |B| + O( \epsilon |V_i| |V_j| ) + O( n^2 / \sqrt{F(J)} ) \ \ \ \ \ (13)$

for all ${A \subset V_i, B \subset V_j}$. The left-hand side is just ${E(A,B)}$. As ${(i,j) \not \in \Sigma}$, we have

$\displaystyle |V_i|, |V_j| > \frac{\epsilon}{M} n$

and so (since ${M = O_{J,\epsilon}(1)}$)

$\displaystyle n^2 / \sqrt{F(J)} \ll_{J,\epsilon} |V_i| |V_j| / \sqrt{F(J)}.$

If we let ${F}$ be a sufficiently rapidly growing function of ${J}$ that depends on ${\epsilon}$, the second error term in (13) can be absorbed in the first, and (1) follows. This concludes the proof of Lemma 2.

To prove Lemma 1, one argues similarly (after modifying ${\epsilon}$ as necessary), except that the initial partition ${V_1,\ldots,V_M}$ of ${V}$ constructed above needs to be subdivided further into equitable components (of size ${\epsilon |V|/M+O(1)}$), plus some remainder sets which can be aggregated into an exceptional component of size ${O( \epsilon |V| )}$ (and which can then be redistributed amongst the other components to arrive at a truly equitable partition). We omit the details.

Remark 1 It is easy to verify that ${F}$ needs to be growing exponentially in ${J}$ in order for the above argument to work, which leads to tower-exponential bounds in the number of cells ${M}$ in the partition. It was shown by Gowers that a tower-exponential bound is actually necessary here. By varying ${F}$, one basically obtains the strong regularity lemma first established by Alon, Fischer, Krivelevich, and Szegedy; in the opposite direction, setting ${F(J) := J}$ essentially gives the weak regularity lemma of Frieze and Kannan.

Remark 2 If we specialise to a Cayley graph, in which ${V = (V,+)}$ is a finite abelian group and ${E = \{ (a,b): a-b \in A \}}$ for some (symmetric) subset ${A}$ of ${V}$, then the eigenvectors are characters, and one essentially recovers the arithmetic regularity lemma of Green, in which the vertex partition classes ${V_i}$ are given by Bohr sets (and one can then place additional regularity properties on these Bohr sets with some additional arguments). The components ${T_1, T_2, T_3}$ of ${T}$, representing high, medium, and low eigenvalues of ${T}$, then become a decomposition associated to high, medium, and low Fourier coefficients of ${A}$.

Remark 3 The use of spectral theory here is parallel to the use of Fourier analysis to establish results such as Roth’s theorem on arithmetic progressions of length three. In analogy with this, one could view hypergraph regularity as being a sort of “higher order spectral theory”, although this spectral perspective is not as convenient as it is in the graph case.