On Thursday, Avi Wigderson continued his Distinguished Lecture Series here at UCLA on computational complexity with his second lecture “Expander Graphs – Constructions and Applications“. As in the previous lecture, he spent some additional time after the talk on an “encore”, which in this case was how lossless expanders could be used to obtain rapidly decodable error-correcting codes.

The talk was largely based on these slides. Avi also has a recent monograph with Hoory and Linial on these topics. (For a brief introduction to expanders, I can also recommend Peter Sarnak’s Notices article. I also mention expanders to some extent in my third Milliman lecture.)

— What is an expander? —

Expander graph are sparse graphs with an additional special property. This property can be described in several equivalent ways (combinatorial, probabilistic, or algebraic):

1. Connectivity. Expander graphs are sparse but highly connected; any two disjoint sets of vertices cannot be disconnected from each other without removing a lot of edges (i.e. the sparsest cut between the sets is quite large). Equivalently, any proper subset A of vertices is going to have a large boundary $\partial A$, defined as the vertices which are not in A but are adjacent to a vertex in A.
2. Mixing. A random walk on an expander graph converges very rapidly to the uniform distribution.
3. Spectral gap. The second eigenvalue of the adjacency matrix is significantly smaller than the (typical) degree of the graph.

For sake of concreteness, Avi used the spectral gap formulation of expansion:

Definition. A ${}[n,d]$-graph is a graph G on n vertices which is d-regular, i.e. every vertex has d edges connected to it. A ${}[n,d,\alpha]$-graph is an ${}[n,d]$-graph which has the additional property $\| A(G) v \| \leq d \alpha \|v\|$

for all non-zero vectors $v \in {\Bbb R}^n$ whose coordinates sum to zero, where A(G) is the adjacency matrix of G and $\|v\|$ is the Euclidean norm of v.

A family of graphs $G_1, G_2, G_3, \ldots$ of d-regular graphs (thus each $G_i$ is an ${}[n_i,d]$-graph for some $n_i$) forms an expander family if each $G_i$ is a ${}[n_i,d,\alpha]$ graph for some $0 < \alpha < 1$.

The above definition was a bit technical, but it can be clarified a bit by considering an example of a graph which is definitely not an expander. Consider an ${}[n,d]$-graph G which is disconnected, with the n vertices divided into two disjoint sets A and B of n/2 vertices each, with no edge between A and B. Then if one lets v be the vector which is +1 on A and -1 on B (so that the total sum of coefficients is zero), one can check that $\|Av\| = d \|v\|$, so G is not an ${}[n,d,\alpha]$-graph for any $\alpha < 1$. More generally, it is an instructive exercise to show that an ${}[n,d]$-graph is always a ${}[n,d,1]$-graph, but is only an ${}[n,d,\alpha]$-graph for some $\alpha < 1$ if and only if it is connected. Thus the notion of an expander family can be viewed as a quantitative variant of connectivity.

[A technical point: strictly speaking, the adjective “expander” applies in a meaningful manner to families of graphs, rather than to individual graphs, much as the adjective “bounded” is meaningful when applied to sequences of numbers, but not to individual numbers. Applying the expander notion to a single graph is a bit silly since a single graph forms an expander family if and only if it is connected. Nevertheless, it is common when discussing expanders to refer to an “expander graph” in the singular, especially when considering a graph whose number of vertices n is quite large and depends implicitly on some other parameter (e.g. a large prime p). Amusingly, one could avoid any abuse of notation by resorting to non-standard analysis, redefining an expander graph to be a graph with a non-standard number of vertices n, but which is an ${}[n,d,\alpha]$-graph for some standard d and $\alpha < 1$, but this does not seem to significantly simplify or clarify the theory.]

Expanders have various important combinatorial and probabilistic properties. For instance, if G is an ${}[n,d,\alpha]$-graph, and the n vertices of G are partitioned into two sets A and B, then the number of edges e(A,B) connecting A to B can be easily seen to be bounded below by the inequality $e(A,B) \geq \frac{1-\alpha}{2} d \min( |A|, |B| )$.

Conversely, the discrete Cheeger inequality of Jerrum and Sinclair asserts that if a ${}[n,d]$-graph is such that $e(A,B) \geq \sqrt{2(1-\alpha)} d \min(|A|, |B| )$

for all partitions A, B, then the graph is an ${}[n,d,\alpha]$-graph. (Another closely related result is the expander mixing lemma.)

One can also show that a random walk on k steps in a ${}[n,d,\alpha]$-graph (starting at an arbitrary base point) converges exponentially fast in k to the uniform distribution (with constants depending on d and $\alpha$ but not n); informally, the random walk becomes very close to uniform after about $O(\log n)$ steps; a more precise statement can be found here. [You may have heard of the famous result of Bayer and Diaconis that a deck of cards can be perfectly shuffled after just seven riffle shuffles; to oversimplify a bit, that claim is equivalent to asserting an expander property of the Cayley graph whose vertices are deck configurations, with two vertices connected if one is a riffle shuffle of another.]

Expanders have many applications, most obviously in graph theory, but also in computer science, number theory, group theory, and even to such apparently unrelated topics as the Baum-Connes conjecture in K-theory, which also has connections to topology, though Avi focused primarily on the computer science applications in this talk. Just to give one simple example of such an application, he discussed deterministic amplification of probabilistic algorithms. Suppose one had an algorithm $A(i, r)$ which took some input i and some n-bit stream (or “coin flips”) $r \in \{0,1\}^n$ and returned an output A(i,r), which is a single bit (yes or no). Suppose that for a given input i, this algorithm was correct for (say) 2/3 of the possible bit streams r; thus if we select the n bits of r randomly the algorithm has at most a 1/3 probability of failure.

As discussed in the previous lecture, one can reduce the failure probability by iteration. If one selects k independent random bit streams $r_1,\ldots,r_k \in \{0,1\}^n$, and runs the algorithm A once for each such stream, and then takes the majority vote of the k different outcomes, then a simple application of the Chernoff bound shows that the resulting algorithm has a failure probability which is exponentially small ( $O( \exp(-ck) )$ for some absolute constant c > 0). But this iterated algorithm requires one to generate kn independent random bits. In some applications, high-quality random bits are an expensive resource, and so it is natural to ask whether one can obtain an algorithm with similar performance but requiring many fewer random bits. It turns out that one can, simply by selecting $r_1,\ldots,r_k$ not independently, but by selecting $r_1,r_2, r_3, \ldots, r_k$ in turn by a random walk along a bounded-degree expander graph on $\{0,1\}^n$ beginning at a random vertex, and then performing a majority vote. The mixing properties of random walk can be used to give the same type of exponentially small failure probability as before, but now only $n + O(k)$ random bits are needed (each step of the random walk requires O(1) random bits, to decide which edge to go down.)

In order for the above algorithm to work properly, one needs to construct expanders explicitly – so explicitly that given any vertex in the expander, one can quickly compute all the neighbours. This was Avi’s next topic.

— Constructions of expanders —

A simple counting argument of Pinsker shows that expanders exist in abundance; indeed, a random sequence of [n,d]-graphs (with $d \geq 3$ fixed and n going to infinity) is almost surely going to be an expander family. But for many applications, one needs more deterministic expanders.

The first explicit expander families arose from group theory (as given by Margulis, Gabber-Galil, Alon-Milman, Lubotsky-Phillips-Sarnak, and others); a typical example of such a family are the Cayley graphs whose vertices are the special linear group $SL_2({\Bbb F}_p)$ over a large field of prime order p, with the edges given by the set of generators $S := \{\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \}$

(thus two matrices in $SL_2({\Bbb F}_p)$ are connected by a graph if their quotient is either in S, or is an inverse of an element in S.) The expansion of these algebraic graph families is deeply connected to some non-trivial results in group theory and analytic number theory; in this case, the key property is Selberg’s famous 3/16 theorem on the least eigenvalue of the Laplacian on modular curves.

These algebraic constructions can be made extremely tight in the sense that the spectral gap parameter $\alpha$ can be made optimally small. More precisely, there is the Alon-Boppana bound, which asserts that an ${}[n,d,\alpha]$-graph can only exist if $d\alpha \geq 2\sqrt{d-1}$. Graphs which attain this bound are known as Ramanujan graphs, and were first constructed by Lubotsky-Phillips-Sarnak and Margulis by algebraic methods. (Random graphs come quite close to being Ramanujan, but are off by a slight amount.)

More recently, starting with the work of Reingold, Vadhan, and Wigderson, more explicitly combinatorial constructions of expanders have been given. A fundamental building block in such constructions is the zigzag product $G \textcircled{z} H$ of a ${}[n,m]$-graph G and a ${}[m,d]$-graph H (thus H is small compared with G – the number of vertices of H equals the degree of G). The zigzag product is to graphs as the semi-direct product is to groups (and in fact, this connection can be made very precise, as the semi-direct product can be viewed as the specialisation of the zigzag product to Cayley graphs). Roughly speaking, the zigzag product is defined by “blowing up” each vertex of G into a copy of H, and then decoupling the edges in G to become disjoint. A bit more formally, if we let V and W be the vertex sets of G and H respectively, then we let $G \textcircled{z} H$ be the ${}[nm, d+1]$-graph with vertex sets V, and whose edge set consists of the “local” edges of form (a,x) and (a,y) where a lies in V and x,y form an edge in W, together with one “global” edge of the form (a,x), (b,y) for each edge (a,b) in G, where x and y depend on a and b and are chosen so that all the global edges are disjoint from each other (this can be done for instance by the greedy algorithm). [Note that there is a little bit of freedom in this construction, so that the zigzag product is not quite unique, but this is not a concern in applications, as all the zigzag products of two graphs have much the same expansion properties.]

The key fact about zigzag products is that they preserve expansion: the zigzag product $G_i \textcircled{z} H_i$ of two expander families $G_i$ and $H_i$ is another expander family. (With a slight variant of the zigzag product, one can give a simple and explicit relation between the various spectral parameters here.) On the other hand, zigzag products also keep the degree d of the graph quite small, even if the first factor G has large degree. Because of this, it is possible to iterate the zigzag product (particularly in conjunction with the exponentiation operation $G^r$, which replaces a graph G with a new graph (or multigraph) on the same vertex set whose adjacency matrix $A(G^r) = A(G)^r$ is the $r^{th}$ power of the original adjacency matrix A(G)) to create many new families of explicit expanders, with good expansion properties compared to their degree; I’ll omit the details here but they are covered in detail in Avi’s slides.

The zigzag construction is very flexible and robust, and can be tweaked in a large number of ways to create expander graphs with certain additional combinatorial properties – which are often not attainable with the earlier algebraic constructions, or even for the Ramanujan graphs (which are optimal with respect to the spectral parameter $\alpha$, but not necessarily with respect to more combinatorial parameters). For example, suppose one wants to find an [n,d] graph with the connectivity property that every pair of vertex sets of size at least n/a are connected by at least one edge, where a is a fixed large constant. This is easy to do for large d; the challenge is to get d as small as possible. For Ramanujan graphs, d needs to be as large as $a^2$ or so to obtain this property; in contrast, for random [n,d]-graphs, d can be as small as about $a \log a$. It is not known how to achieve this deterministically, but using zigzag products (not on expanders per se, but on a related object known as an extractor) one can get within a few logarithmic factors of this bound.

One can also combine the zigzag construction with algebraic constructions, for instance to create expander Cayley graphs in very general classes of finite groups, though Avi didn’t go into details on this point.

Another place where zigzag constructions have been useful is in lossless expanders. Observe that in an [n,d] graph, the neighbourhood of a k-vertex set can have size at most dk. A lossless expander is an [n,d]-graph (or more precisely, a expander family of such graphs) with the property that for any $\varepsilon > 0$ there exists an a such that every k-vertex set with $k \leq n/a$ has a neighbourhood of size at least $(1-\varepsilon) dk$; thus small sets expand nearly as much as possible. Ramanujan graphs don’t always have this property; no matter how large one makes a, one can find such graphs in which the neighbourhood of a k-element set is only about dk/2 or so. But by using the zigzag construction (this time to a type of graph known as a randomness conductor) one can create lossless expanders.

One particularly striking recent application of the zig-zag product construction (by Reingold) is to create a deterministic algorithm to solve the maze exploration problem mentioned in the previous lecture (getting from A to B in some large connected graph) in polynomial time and logarithmic space. Roughly speaking, the idea is to first observe that any connected graph G (let’s take it to be an [n,d]-graph for simplicity) has a very weak expansion property; it is an ${}[n,d,\alpha]$-graph for some $\alpha$ a little bit less than 1 (e.g. $\alpha = 1-1/n^3$ will do). This is not enough expansion by itself to do much of anything, but by taking graph powers and zigzag products of G with itself (and with a small expander), one can soon create a polynomially larger graph G’ which has a very good expansion constant, so much so that one can get from any point in G’ to any other by traversing a path of logarithmic length (and one can simply exhaust over all such paths in polynomial time and logarithmic memory). Finally, because the zigzag product is so combinatorially simple and explicit, it is possible to use the solution to the G’ exploration problem to obtain a solution to the G exploration problem with comparable time and memory costs.

— Lossless expanders and error-correcting codes —

Avi then talked about how lossless expanders can be used to produce quickly decodable error-correcting codes (and it is crucial here that the expander is lossless; graphs that only expand from k to dk/2 will not work for this application). Mathematically, a code for the purposes of this talk is a function $C: \{0,1\}^k \to \{0,1\}^n$ that makes k-bit strings into larger n-bit strings in an injective manner. Actually, for the purposes of this analysis, the actual function is not important; what is important is the list of codewords $C(\{0,1\}^k)$ (which is often identified with C in this subject, by abuse of notation). There are many important statistics of codes, but two particularly key ones are the rate k/n, and the distance, defined as the minimal separation (in the Hamming metric) between two codewords. Note that an error correcting code can (in principle) absorb a number of errors equal to half the distance, although in practice it may be computationally expensive to perform this correction.

Let us informally say that a code is good if the rate is comparable to 1, and if the distance is comparable to n. It was shown by Shannon that good codes exist, indeed a randomly selected code is likely to be rather good. On the other hand, it is not possible to perform error correction on a random code quickly.

One particularly popular type of code is a linear code, in which C is a linear function (identifying $\{0,1\} \equiv {\Bbb F}_2$ with the field of two elements). The image $C(\{0,1\}^k)$ of such a code is thus a linear subspace in $\{0,1\}^n \equiv {\Bbb F}_2^n$, and thus can be defined as the simultaneous null space of n-k linear functionals on this space, which can be viewed as checksums or parity bits that need to vanish in order for a given word to lie in the code. Over ${\Bbb F}_2$, a linear functional on the Hamming cube $\{0,1\}^n \equiv {\Bbb F}_2^n$ is nothing more than the sum of some collection of the coordinate functions on that cube. Thus one can describe the code C (or more precisely, its image $C(\{0,1\}^n)$) by a bipartite graph connecting the n coordinates with the n-k parity bits, with each parity bit being formed as the sum of the coordinates it is connected to.

With linear codes, encoding is a relatively quick process; the challenge is in decoding in the presence of errors. Remarkably, if one selects the graph defining the code to be a bipartite lossless expander (so that each of the n coordinates is connected to d parity bits for some bounded d, and any m coordinates (with m less than a small multiple of n) is connected to close to dm parity bits), then not only is the code good (high rate and high distance), one can decode very quickly (linear time in n) by the following belief propagation algorithm:

1. Start with the corrupted word, and compute all the parity bits. If they all vanish, then we have a codeword and we stop.
2. Otherwise, we go through each coordinate and see how many of the d parity bits connected to that coordinate are non-vanishing (i.e. they take the value 1). If a majority of these bits return 1, then we flip the bit of the coordinate; otherwise, we keep the bit unchanged. (Intuitively, we are reducing the number of non-vanishing parity bits as much as possible.)
3. Repeat 2 until all parity bits vanish.

Intuitively, the reason why this algorithm should work (if the number m of corrupted bits is small enough) is that in a lossless expander, the parity bits associated to each of the corrupted coordinates are mostly disjoint. Thus, every corrupted coordinates should see that most of the parity bits connected to it are taking the value of 1, giving a strong signal that those bits should change. Conversely, the uncorrupted bits should mostly see only a small minority of parity bits connected to it reporting the value 1. Indeed, one can show that each iteration of Step 2 cuts down the number m of corrupted bits by a constant factor (if m was small enough that the lossless property can be used, with some suitably small $\varepsilon$). Also it is not hard to analyse the run-time of this algorithm and show that it is linear in the length n of the code.

[Update, Jan 14: memory requirement for Reingold’s graph exploration algorithm corrected.  Connections to Baum-Connes conjecture reworded.]