[This post is authored by Luca Trevisan. – T.]

Notions of “quasirandomness” for graphs and hypergraphs have many applications in combinatorics and computer science. Several past posts by Terry have addressed the role of quasirandom structures in additive combinatorics. A recurring theme is that if an object (a graph, a hypergraph, a subset of integers, …) is quasirandom, then several useful properties can be immediately deduced, but, also, if an object is not quasirandom then it possesses some “structure” than can be usefully exploited in an inductive argument. The contrapositive statements of such randomness-structure dichotomies are that certain simple properties imply quasirandomness. One can view such results as algorithms that can be used to “certify” quasirandomness, and the existence (or, in some cases, conjectural non-existence) of such algorithms has implications in computer science, as I shall discuss in this post.

Informally, an undirected graph $G=(V,E)$ is quasirandom if, for every two large sets of vertices $S$ and $T$, the number of edges between $S$ and $T$ is approximately what it would be, on average, if we had generated $G$ by picking $|E|$ edges uniformly at random. Quantitatively, we’ll say that a graph $G=(V,E)$ is $\epsilon$-quasirandom if for every two disjoint sets of vertices $S$ and $T$ the number of edges between $S$ and $T$ differs from $|E| \cdot |S| \cdot |T| / \binom{|V|}{2}$ by at most an $\epsilon |E|$ additive error term. This property is true with $\epsilon=0$ for the complete graph, and it is quite remarkable that there exist very sparse graphs that are quasi-random. Indeed it is possible to show that a random graph with $O(|V|/\epsilon^2)$ edges (that is, a graph where the average degree of a vertex is a constant depending only on $\epsilon$) is $\epsilon$-quasirandom with high probability. While this is remarkable enough, it is also true that there are efficient algorithms that, given a random graph with $c|V|$ edges, are able with high probability to verify that the graph is $O(1/\sqrt{c})$-quasirandom. (This is true with some caveats, as explained below.)

Here are some implications of such algorithms. Consider the problem of finding a largest independent set in a given graph. (If vertices represent tasks to be performed, and an edge represents a conflict that makes it impossible to perform two tasks at the same time, a largest independent set is a largest choice of simultaneously doable tasks.) This problem is NP-hard, meaning that if there were an efficient optimization algorithm for it, there would be an algorithm that given a provable mathematical statement would find a shortest proof for it in time polynomial in the length of the shortest proof; a number of other highly unlikely algorithmic consequences, including polynomial time algorithms to break any possible cryptographic protocol, would also follow. (See e.g. this description of the P versus NP problem.) In addition, we know that, in an $n$-vertex graph, even achieving an $n^{.99}$ approximation in polynomial time would imply that $P=NP$ (as proved by Hastad) and that, in particular, exact optimization would also be possible in polynomial time. When we look at graphs of maximum degree $d$, a $d+1$ approximation is trivial, but it is NP-hard to approximate the problem better than $d/2^{\sqrt{\log d}}$ (see this paper).

Now consider a random $d$-regular graph; it is easy to find an independent set of size $|V|/(d+1)$, and, with high probability, we can prove that the graph is $O(1/\sqrt d)$-quasirandom, and hence the largest independent set has size $O(|V|/\sqrt d)$. This means that we
achieve with high probability an approximation of $\sqrt d$, and so that
random graphs are not the hardest instances of the problem. Typically, in combinatorics, when random graphs are not extremal for a given problem, then finding extremal constructions is very hard; indeed, one has to work quite hard to come up with examples showing that certain approximate algorithms for the independent set problem (such as the Lovasz theta function) do achieve approximation as bad as $d^{1-o(1)}$ (see this paper by Feige).

The same phenomenon occurs for a number of other graph optimization problems. For example the Max Cut problem is the problem of finding a partition of the vertices of a given graph into two sets, so that a largest number of edges are “cut,” that is, have endpoints on different sides of the partition. There is a polynomial-time algorithm that achieves an $\approx .878$ approximation factor, it is NP-hard to approximate the problem better than $16/17$, and there is some evidence that no approximation better than $\approx .878$ is achievable (see Section 5.1 of this paper for references). On the other hand, in every graph it is easy to find a partition that cuts half of the edges (use a random partition), and in an $\epsilon$-quasirandom graph no partition can cut more than a $1/2 + O(\epsilon)$ fraction of edges; this means that, here, too, when looking at worst-case instances for certain algorithms one must resort to non-random constructions.

Having motivated the problem, let me briefly describe the algorithms.

First I’d like to discuss an algorithm that gives slightly worse bounds and works only on dense graphs, but is surprisingly simple: count the number of 4-cycles in the graph. This amazing result is due to Chung, Graham, Wilson and Thomason (see the introductory sections of this paper of Gowers for further references). Suppose we have a graph with $n$ vertices and $p \binom{n}{2}$ edges; then a first observation is that the number of 4-cycles (that is, the number of 4-tuples of vertices $(x_1,y_1,x_2,y_2)$ such that the four edges $(x_i,y_j)$ exist for $i,j \in \{1,2\}$) is at least $p^4n^4$, which follows from two applications of Cauchy-Schwarz:

$n^8 p^4 = ( \sum_{x,y} A(x,y) )^4$
$\leq (n \sum_{x,y_1,y_2} A(x,y_1)A(x,y_2))^2$
$\leq n^4 \sum_{x_1,x_2,y_1,y_2}A(x_1,y_1)A(x_1,y_2)A(x_2,y_1)A(x_2,y_2)$

where we define $A(x,y):=1$ if $(x,y)$ is an edge, and $A(x,y):= 0$ otherwise.

Note that $p^4 n^4$ is the average number of 4-cycles in a random graph with $n$ vertices in which each edge is included with probability $p$. The surprising thing is that if, in a given graph, the number of 4-cycles is close to $p^4 n^4$, then the graph is quasirandom. This follows from applying Cauchy-Schwarz twice, in a similar way, starting from the expression

$\frac {1}{n^4} (\sum_{x\in S,y\in T} (A(x,y) - p) )^4$

which we want to prove to be at most $\epsilon^4 p^4 n^4$ for every set $S$ and $T$, in order to establish that the graph is $\epsilon$-pseudorandom. It turns out that the above expression is upper bounded by the difference between the number of 4-cycles in the graph and $p^4 n^4$.

When we deal with sparse graphs, unfortunately, this approach breaks down. In fact, any approach that looks at small patterns is bound to fail: a random graph with $c|V|$ edges and a random bipartite graph with $c|V|$ edges look “locally the same” even though the former is very likely to be $O(1/\sqrt{c})$-quasirandom, while the latter is not quasirandom at all.

For sparse regular graphs (for which all vertices have the same degree), “spectral” techniques give extremely good certificates of quasirandomness. Let $G=(V,E)$ be a regular graph where all vertices have degree $d$, and $A$ be the adjacency matrix of $G$, hence $A(u,v)=1$ if $(u,v)$ is an edge and $A(u,v)=0$ otherwise. One can see that:

1. $A$ is symmetric, hence all eigenvalues are real;
2. the vector $(1,\ldots,1)$ is an eigenvector with eigenvalue $d$;
3. all eigenvalues are between -d and d. Let now $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_n$ be the eigenvalues of $A$, with multiplicities; as we said, $\lambda_1=d$.

It turns out that if all the other eigenvalues are at most $\epsilon d$ in absolute value, then $G$ is $\epsilon$-quasirandom. More precisely, if $S$ and $T$ are disjoint sets, then

$| e(S,T) - \frac{d\cdot |S|\cdot |T|}{|V|} | \leq \max \{ |\lambda_2|,|\lambda_n| \} \cdot \sqrt{|T|\cdot |V| }$

This result is known as the expander mixing lemma, and the intuition of the proof is quite simple. Let ${\bf v}_1, \ldots, {\bf v}_n$ be a system of orthonormal eigenvector for the eigenvalues $\lambda_1,\ldots,\lambda_n$. The number of edges between the sets $S$ and $T$ is

$1_S A 1_T^t$

where, for a set $S$, we define $1_S$ as the vector such that $1_S(u)=1$ if $u \in S$ and $1_S(u)=0$ otherwise. This can be rewritten as

$d\cdot 1_S \cdot {\mathbf v}_1^t \cdot 1_T \cdot {\mathbf v}_1^t + \sum_{i=2}^n \lambda_i 1_S {\mathbf v}_i^t \cdot 1_T \cdot {\mathbf v}_i^t$

and the first term is $d |S| |T| /n$, while the other terms can be upper
bounded by $\max_{i=2,\ldots,n} |\lambda_i| \sqrt{|S| |T|}$ in absolute value.Furthermore, if one picks a $d$-regular graph at random, then all eigenvalues except $\lambda_1$ are at most $O(\sqrt {d-1})$ with high probability. One can in fact prove that the upper bound is close to $2\sqrt{d-1}$, which is the best possible, but such arguments are very difficult. (A nitpick here is that said results apply to the probabilistic model where we take the union of $d$ random matchings rather than a random $d$-regular graph.)

To sum up, if a random graph is generated by taking the union of $d$ random matchings, we can certify that is is $O(1/\sqrt d)$-quasirandom. In the $G_{n,p}$ model where each edge is selected with probability $p$, similar arguments apply (and can certify $1/O(\sqrt d)$-quasirandomness, where $d = pn$ is the average degree), provided that the average degree $pn$ is at least $(\log n)^{O(1)}$. The case of average constant degree is trickier, but it can probably be handled by standard techniques (removing high-degree and isolated vertices before computing the spectrum) that have succeeded in related problems.

Given how satisfactory are the combinatorial and algorithmic theories of quasirandomness for graphs, we may hope for similar results for hypergraphs.

In the dense case, an approach based on counting local patterns indeed works in a completely analogous way. In a 3-uniform hypergraph (an hypergraph where each hyperedge contains three vertices) with $n$ vertices and $p \binom{n}{3}$ hyperedges we shall count the number of 6-tuples $(x_1,x_2,y_1,y_2,z_1,z_2)$ of vertices such that all the 8 hyperedges of the form $(x_i,y_j,z_k)$ exist for $i,j,k\in \{1,2\}$. Three applications of Cauchy-Schwarz show that this number is at least $p^8 n^6$, and that if this number is at most $(1+\epsilon^{O(1)} )p^8 n^6$ then the hypergraph is $\epsilon$-quasirandom.

Again, however, local approaches fail for sparse hypergraphs.

An algorithm for the sparse case would have remarkable applications for the kSAT problem. In the kSAT problem we are given a boolean AND-of-ORs expression where each OR term is over $k$ variables, for example this is an instance of 3SAT:

$(x_1 \vee x_3 \vee \neg x_4) \wedge (\neg x_1 \vee x_2 \vee \neg x_3) \wedge (x_1 \vee x_2 \vee \neg x_4)$

and we want to know if there is a setting to the variables that satisfies the formula. In the above example, setting all variables to $\hbox{True}$ works. This problem is a canonical NP-complete problem for all $k\geq 3$: if a polynomial time algorithm existed, then there would be polynomial time algorithms for all problems in NP. (There is a linear time algorithm to solve 2SAT.)

Recently, there has been considerable interest in the question of the average-case complexity of kSAT, in the model where we pick at random a formula with $n$ variables and $m(n)$ clauses. (A “clause” is an OR term in the formula.)

In the hard graph problems we discussed above, uniformly chosen random instances were “easier” than worst-case instances, assuming $P\neq NP$. In the kSAT case, the situation seems more complex. It is conjectured that, for every $k$, there is a constant $c_k$ such that when we generate a random formula with $m(n) \leq (c_k - \epsilon) n$ clauses then the formula is satisfiable with high probability, and when we generate a random formula with $m(n) \geq (c_k + \epsilon) n$ then the formula is unsatisfiable with high probability. Furthermore, there are algorithms that are conjectured to find satisfying assignments with high probability in the $m(n) \leq (c_k -\epsilon)$ regime. (See my posts here, here and here for references).

An open question is whether there is an efficient way to certify the unsatisfiability of random formulas in the $m(n) \geq (c_k + \epsilon) n$ regime.

One approach in this direction is to reduce to an hypergraph problem. Given a kSAT formula with $n$ variables and $m$ edges, construct a $k$-uniform hypergraph with 2n vertices (one for each variable x and bit b, which we think of as being the assignment $x=b$) and $m$ hyperdges, one for each clause, connecting the unique values of the variables that contradict the clause. Suppose the formula has a satisfying assignment, then there is a set of half the vertices such that no hyperedge is completely contained in the set; this is called an independent set. Now, if one generates the formula randomly, then the associated hypergraph is a random hypergraph. The question then becomes: given a random hypergraph, can we certify that there is no large independent set? If we could certify that a random hypergraph of sufficiently large average degree is, say, $1/100$-quasirandom, then the non-existence of large independent sets would follow, and we would be able to certify the unsatisfiability of comparably dense random kSAT instances.

A natural approach to certify the quasirandomness of a given sparse hypergraph is to look for generalizations of spectral methods. While there is no clear definition of what the “eigenvalues” of an 3-hypergraph (or, more generally, of a symmetric trilinear form) should be, Friedman and Wigderson consider the characterization of the second eigenvalue of a graph as the problem of optimizing a certain quadratic form. The problem has a natural generalization to trilinear forms, and yields a definition of “second eigenvalue” of a 3-hypergraph. They are able to prove that this parameter is bounded for random hypergraphs and that when it is bounded the hypergraph is quasirandom. Unfortunately, no algorithm is known to efficiently compute this parameter.

It is becoming a standard conjecture that, indeed, the unsatisfiability of random $kSAT$ instances is hard to verify when $m(n) = cn$ and $c$ is an arbitrarily large constant. (Note that the problem becomes easier, or no harder, when $c$ is larger.) On the other hand, simple refutation algorithms exist for kSAT in the setting of very high density, and the Chung-Graham-Wilson approach can certify quasirandomness of dense hypergraphs.

We can then ask what is the minimal density for which such problems admit efficient algorithmic solutions.

For kSAT, so far, the best refutation algorithms work when the density is about $n^{k/2}$, and, for even $k$, the idea of such algorithms also leads to an algorithm to verify the quasirandomness of random $k$-hypergraphs.

Translated to the hypergraph setting, the algorithm is as follows. Given a $k$-hypergraph (even $k$) with $n$ vertices and about $n^{k/2}$ hyperdges, we create a graph with $n^{k/2}$ vertices, one per $k/2$-tuple. Each $k$-hyperedge is associated to an edge in the graph (by splitting, say, randomly, the $k$-tuple into two $k/2$-tuples). One can see that if the graph is $\epsilon$-quasirandom, then the hypergraph is $O(\epsilon)$-quasirandom. (The contrapositive is easier to prove, suppose the hypergraph is not quasirandom, let $S_1,\ldots,S_k$ be the sets that witness that the quasirandomness property is contradicted; then the sets $(S_1 \times \cdots \times S_{k/2})$ and $(S_{k/2+1} \times \cdots \times S_k)$ contradict the quasirandomness of the graph.) If the hypergraph is randomly generated, then the graph is randomly generated, and we can use spectral techniques to verify its quasirandomness.

For odd $k$, there are algorithms that certify the unsatisfiability of random kSAT formulas with about $n^{k/2}$ clauses; as far as I know, the analogous result is open for hypergraphs. (The kSAT algorithms use properties of kSAT that have no analog in the general hypergraph setting, such as the fact that an assignment that satisfies both $(x\vee y\vee q)$ and $(\neg q \vee w\vee z)$ must also satisfy $(x \vee y \vee w \vee z)$.

Substantial work has gone into proving limitations of spectral methods for the problem of certifying the unsatisfiability of random instances of $k$SAT. Bounds at least as good as those obtained by spectral methods can always be obtained by formulating a semidefinite programming (SDP) relaxation of the given problem, and there are several results showing that even very powerful SDP relaxations of kSAT cannot certify unsatisfiability even for fairly large densities (see e.g. this paper), and the hypergraph quasirandomness problem is only harder. Still, it is plausible that one can verify the quasirandomness of a random hypergraph with at least $n^{1+\epsilon}$ hyperedges, for all $\epsilon>0$. At least, the problem should be solvable in 3-hypergraphs with $n^{1.5}$ hyperedges.