Earlier this month, Hao Huang (who, incidentally, was a graduate student here at UCLA) gave a remarkably short proof of a long-standing problem in theoretical computer science known as the sensitivity conjecture. See for instance this blog post of Gil Kalai for further discussion and links to many other online discussions of this result. One formulation of the theorem proved is as follows. Define the $n$-dimensional hypercube graph $Q_n$ to be the graph with vertex set $({\bf Z}/2{\bf Z})^n$, and with every vertex $x \in ({\bf Z}/2{\bf Z})^n$ joined to the $n$ vertices $x + e_1, \dots, x + e_n$, where $e_1,\dots,e_n$ is the standard basis of $({\bf Z}/2{\bf Z})^n$.
Theorem 1 (Lower bound on maximum degree of induced subgraphs of hypercube) Let $E$ be a set of at least $2^{n-1}+1$ vertices in $Q_n$. Then there is a vertex in $E$ that is adjacent (in $Q_n$) to at least $\sqrt{n}$ other vertices in $E$.
The bound $\sqrt{n}$ (or more precisely, its round-up $\lceil \sqrt{n} \rceil$) is completely sharp, as shown by Chung, Furedi, Graham, and Seymour; we describe this example below the fold. When combined with earlier reductions of Gotsman-Linial and Nisan-Szegedy, this theorem implies the sensitivity conjecture; we give these reductions below the fold also.
Let $A = (A_{xy})_{x,y \in Q_n}$ be the adjacency matrix of $Q_n$ (where we index the rows and columns directly by the vertices in $Q_n$, rather than selecting some enumeration of these vertices), thus $A_{xy} = 1$ when $y = x + e_i$ for some $i = 1,\dots,n$, and $A_{xy} = 0$ otherwise. The above theorem then asserts that if $E$ is a set of at least $2^{n-1}+1$ vertices, then the $E \times E$ minor $(A_{xy})_{x,y \in E}$ of $A$ has a row (or column) that contains at least $\sqrt{n}$ non-zero entries.
The key step to prove this theorem is the construction of a rather curious variant of the adjacency matrix $A$:

Proposition 2 There exists a $2^n \times 2^n$ matrix $B = (B_{xy})_{x,y \in Q_n}$ which is entrywise dominated by $A$ in the sense that

$\displaystyle |B_{xy}| \leq A_{xy} \hbox{ for all } x, y \in Q_n \ \ \ \ \ (1)$

and such that $B$ has $\sqrt{n}$ as an eigenvalue with multiplicity $2^{n-1}$.
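As a quick sanity check of Proposition 2, one can build such a matrix numerically. The sketch below (my own illustration, assuming `numpy` is available) uses the recursive signing from Huang's paper, $B_1 = \begin{pmatrix} 0&1\\1&0\end{pmatrix}$ and $B_n = \begin{pmatrix} B_{n-1} & I \\ I & -B_{n-1}\end{pmatrix}$, and verifies that $B^2 = nI$, that $|B|$ recovers the adjacency matrix, and that $\sqrt{n}$ has multiplicity $2^{n-1}$.

```python
import numpy as np

def huang_matrix(n):
    """Recursive signing B_n = [[B_{n-1}, I], [I, -B_{n-1}]] of the Q_n adjacency matrix."""
    B = np.zeros((1, 1))          # base case: the 0-dimensional cube
    for _ in range(n):
        I = np.eye(B.shape[0])
        B = np.block([[B, I], [I, -B]])
    return B

n = 5
B = huang_matrix(n)
A = np.abs(B)                     # |B_xy| recovers the adjacency matrix A of Q_n
eigs = np.linalg.eigvalsh(B)      # B^2 = n I forces all eigenvalues to be +-sqrt(n)
mult_plus = int((eigs > 0).sum()) # trace zero then forces equal multiplicities 2^(n-1)
```

One can check by induction that $B_n^2 = B_{n-1}^2 + I = nI$, which is exactly the spectral compression used below.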
Assuming this proposition, the proof of Theorem 1 can now be quickly concluded. If we view $B$ as a linear operator on the $2^n$-dimensional space $\ell^2(Q_n)$ of functions on $Q_n$, then by hypothesis this space has a $2^{n-1}$-dimensional subspace $V$ on which $B$ acts by multiplication by $\sqrt{n}$. If $E$ is a set of at least $2^{n-1}+1$ vertices in $Q_n$, then the space $\ell^2(E)$ of functions supported on $E$ has codimension at most $2^{n-1}-1$ in $\ell^2(Q_n)$, and hence intersects $V$ non-trivially. Thus the $E \times E$ minor $B_E$ of $B$ also has $\sqrt{n}$ as an eigenvalue (this can also be derived from the Cauchy interlacing inequalities), and in particular this minor has operator norm at least $\sqrt{n}$. By Schur’s test, this implies that one of the rows or columns of this matrix has absolute values summing to at least $\sqrt{n}$; by (1), the corresponding vertex of $E$ then has at least $\sqrt{n}$ neighbours in $E$, giving the claim.
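The interlacing and Schur steps can be illustrated concretely for a random large subset (a sketch of mine, not part of the argument): restricting the signed matrix to any set $E$ of $2^{n-1}+1$ vertices, the top eigenvalue of the minor is at least $\sqrt{n}$, and some row of absolute values sums to at least that.

```python
import numpy as np

n = 5
B = np.zeros((1, 1))                    # rebuild the recursively signed matrix
for _ in range(n):
    I = np.eye(B.shape[0])
    B = np.block([[B, I], [I, -B]])

rng = np.random.default_rng(0)
E = rng.choice(2 ** n, size=2 ** (n - 1) + 1, replace=False)
BE = B[np.ix_(E, E)]                    # the E x E minor of B
top = np.linalg.eigvalsh(BE).max()      # >= sqrt(n), by Cauchy interlacing
max_row = np.abs(BE).sum(axis=1).max()  # >= top, by Schur's test
```

Since the entries of the minor have absolute value $0$ or $1$, the largest absolute row sum is exactly the maximum number of neighbours inside $E$.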
Remark 3 The argument actually gives a strengthening of Theorem 1: there exists a vertex $x_0$ of $E$ with the property that for every natural number $k$, there are at least $n^{k/2}$ paths of length $k$ in the restriction $Q_n|_E$ of $Q_n$ to $E$ that start from $x_0$. Indeed, if we let $u \in \ell^2(E)$ be an eigenfunction of $B_E$ with eigenvalue $\sqrt{n}$, and let $x_0$ be a vertex in $E$ that maximises the value of $|u(x_0)|$, then for any natural number $k$ we have that the $x_0$ component of $B_E^k u$ is equal to $n^{k/2} u(x_0)$; on the other hand, by the triangle inequality and (1), this component is at most $|u(x_0)|$ times the number of length $k$ paths in $Q_n|_E$ starting from $x_0$, giving the claim.
This argument can be viewed as an instance of a more general “interlacing method” to try to control the behaviour of a graph on all large subsets $E$ of the vertex set by first generating a matrix $B$ on the entire graph with very good spectral properties, which are then partially inherited by the $E \times E$ minor of $B$ by interlacing inequalities. In previous literature using this method (see e.g., this survey of Haemers, or this paper of Wilson), either the original adjacency matrix $A$, or some non-negatively weighted version of that matrix, was used as the controlling matrix $B$; the novelty here is the use of signed controlling matrices. It will be interesting to see what further variants and applications of this method emerge in the near future. (Thanks to Anurag Bishnoi in the comments for these references.)
The “magic” step in the above argument is constructing $B$. In Huang’s paper, $B$ is constructed recursively in the dimension $n$ in a rather simple but mysterious fashion. Very recently, Roman Karasev gave an interpretation of this matrix in terms of the exterior algebra on ${\bf R}^n$. In this post I would like to give an alternate interpretation in terms of the operation of twisted convolution, which originated in the theory of the Heisenberg group in quantum mechanics.
Firstly note that the original adjacency matrix $A$, when viewed as a linear operator on $\ell^2(Q_n)$, is a convolution operator

$\displaystyle A f = f * \mu$

where $\mu := \sum_{i=1}^n \delta_{e_i}$ is the counting measure on the standard basis $e_1,\dots,e_n$, and $*$ denotes the ordinary convolution operation

$\displaystyle f * g(x) := \sum_{y \in Q_n} f(y) g(x-y).$

As is well known, this operation is commutative and associative. Thus for instance the square $A^2$ of the adjacency operator is also a convolution operator

$\displaystyle A^2 f = f * (\mu * \mu),$

where the convolution kernel $\mu * \mu$ is moderately complicated:

$\displaystyle \mu * \mu(x) = n \delta_0(x) + \sum_{1 \leq i < j \leq n} 2 \delta_{e_i + e_j}(x).$

The factor $2$ in this expansion comes from combining the two terms $\delta_{e_i} * \delta_{e_j}$ and $\delta_{e_j} * \delta_{e_i}$, which both evaluate to $\delta_{e_i + e_j}$.
More generally, given any bilinear form $b: Q_n \times Q_n \rightarrow {\bf Z}/2{\bf Z}$, one can define the twisted convolution

$\displaystyle f *_b g(x) := \sum_{y \in Q_n} (-1)^{b(y, x-y)} f(y) g(x-y)$

of two functions $f, g \in \ell^2(Q_n)$. This operation is no longer commutative (unless $b$ is symmetric). However, it remains associative; indeed, one can easily compute that

$\displaystyle (f *_b g) *_b h(x) = f *_b (g *_b h)(x) = \sum_{y_1 + y_2 + y_3 = x} (-1)^{b(y_1,y_2) + b(y_1,y_3) + b(y_2,y_3)} f(y_1) g(y_2) h(y_3).$

In particular, if we define the twisted convolution operator

$\displaystyle A_b f := f *_b \mu,$

then the square $A_b^2$ is also a twisted convolution operator

$\displaystyle A_b^2 f = f *_b (\mu *_b \mu),$

and the twisted convolution kernel $\mu *_b \mu$ can be computed as

$\displaystyle \mu *_b \mu(x) = \left(\sum_{i=1}^n (-1)^{b(e_i,e_i)}\right) \delta_0(x) + \sum_{1 \leq i < j \leq n} \left((-1)^{b(e_i,e_j)} + (-1)^{b(e_j,e_i)}\right) \delta_{e_i+e_j}(x).$

For general bilinear forms $b$, this twisted convolution kernel is just as messy as $\mu * \mu$ is. But if we take the specific bilinear form

$\displaystyle b(x,y) := \sum_{1 \leq j < i \leq n} x_i y_j \ \ \ \ \ (2)$

then $b(e_i,e_j) = 1$ for $i > j$ and $b(e_i,e_j) = 0$ for $i \leq j$, and the above twisted convolution kernel simplifies to

$\displaystyle \mu *_b \mu(x) = n \delta_0(x),$

and now $A_b^2$ is very simple:

$\displaystyle A_b^2 f = n f.$
Thus the only eigenvalues of $A_b$ are $+\sqrt{n}$ and $-\sqrt{n}$. The matrix $A_b$ is entrywise dominated by $A$ in the sense of (1), and in particular has trace zero; thus the $+\sqrt{n}$ and $-\sqrt{n}$ eigenvalues must occur with equal multiplicity, so in particular the $+\sqrt{n}$ eigenvalue occurs with multiplicity $2^{n-1}$ since the matrix has dimensions $2^n \times 2^n$. This establishes Proposition 2.
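The computation above is easy to confirm numerically. The following sketch (mine, not from the post) builds $A_b$ for the form (2) directly from the twisted-convolution formula and checks that $A_b^2 = nI$, that the entries are a signing of the adjacency matrix, and that the two eigenvalues $\pm\sqrt{n}$ occur with equal multiplicity.

```python
import numpy as np
from itertools import product

n = 4
verts = list(product([0, 1], repeat=n))        # Q_n as bit tuples
idx = {v: a for a, v in enumerate(verts)}

def b(x, y):
    # the bilinear form (2): sum of x_i y_j over pairs i > j, mod 2
    return sum(x[i] * y[j] for i in range(n) for j in range(i)) % 2

# (A_b)_{x,y} = (-1)^{b(y, x - y)} when x - y = e_i (note x - y = x + y in char 2)
Ab = np.zeros((2 ** n, 2 ** n))
for x in verts:
    for i in range(n):
        y = tuple((x[j] + (j == i)) % 2 for j in range(n))
        Ab[idx[x], idx[y]] = (-1) ** b(y, tuple((x[j] + y[j]) % 2 for j in range(n)))

eigs = np.linalg.eigvalsh(Ab)                  # should be +-sqrt(n), each 2^(n-1) times
```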
Remark 4 Twisted convolution $*_b$ is actually just a component of ordinary convolution, but not on the original group $Q_n$; instead it relates to convolution on a Heisenberg group extension of this group. More specifically, define the Heisenberg group $H_n$ to be the set of pairs $(x,t) \in Q_n \times {\bf Z}/2{\bf Z}$ with group law

$\displaystyle (x,t) (y,s) := (x+y, t+s+b(x,y))$

and inverse operation

$\displaystyle (x,t)^{-1} = (-x, -t - b(x,-x))$

(one can dispense with the negative signs here if desired, since we are in characteristic two). Convolution on $H_n$ is defined in the usual manner: one has

$\displaystyle F * G(g) := \sum_{h \in H_n} F(h) G(h^{-1} g)$

for any $F, G \in \ell^2(H_n)$. Now if $f \in \ell^2(Q_n)$ is a function on the original group $Q_n$, we can define the lift $\tilde f \in \ell^2(H_n)$ by the formula

$\displaystyle \tilde f(x,t) := (-1)^t f(x)$

and then by chasing all the definitions one soon verifies that

$\displaystyle \tilde f * \tilde g = 2 \widetilde{f *_b g}$

for any $f, g \in \ell^2(Q_n)$, thus relating twisted convolution $*_b$ to Heisenberg group convolution $*$.
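The definition-chasing can be delegated to a computer; the sketch below (an illustration of mine) implements the group law, inverse, lift and both convolutions for small $n$ and random real-valued $f, g$, and confirms the identity $\tilde f * \tilde g = 2 \widetilde{f *_b g}$.

```python
import numpy as np
from itertools import product

n = 3
Q = list(product([0, 1], repeat=n))
H = [(x, t) for x in Q for t in (0, 1)]    # the Heisenberg extension of Q_n

def b(x, y):
    return sum(x[i] * y[j] for i in range(n) for j in range(i)) % 2

def add(x, y):
    return tuple((a + c) % 2 for a, c in zip(x, y))

def hmul(g, h):                            # group law (x,t)(y,s) = (x+y, t+s+b(x,y))
    (x, t), (y, s) = g, h
    return (add(x, y), (t + s + b(x, y)) % 2)

def hinv(g):                               # inverse: (x,t)^{-1} = (x, t + b(x,x)) in char 2
    (x, t) = g
    return (x, (t + b(x, x)) % 2)

def tw(f, g):                              # twisted convolution f *_b g on Q_n
    return {x: sum((-1) ** b(y, add(x, y)) * f[y] * g[add(x, y)] for y in Q)
            for x in Q}

def lift(f):                               # \tilde f(x,t) = (-1)^t f(x)
    return {(x, t): (-1) ** t * f[x] for (x, t) in H}

def hconv(F, G):                           # ordinary convolution on the Heisenberg group
    return {h: sum(F[k] * G[hmul(hinv(k), h)] for k in H) for h in H}

rng = np.random.default_rng(3)
f = {x: rng.normal() for x in Q}
g = {x: rng.normal() for x in Q}
```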
Remark 5 With the twisting by the specific bilinear form $b$ given by (2), convolution by $\delta_{e_i}$ and $\delta_{e_j}$ now anticommute rather than commute. This makes the twisted convolution algebra $(\ell^2(Q_n), *_b)$ isomorphic to a Clifford algebra $Cl({\bf R}^n)$ (the real or complex algebra generated by formal generators $v_1,\dots,v_n$ subject to the relations $v_i^2 = 1$ and $v_i v_j = -v_j v_i$ for $i \neq j$) rather than the commutative algebra more familiar to abelian Fourier analysis. This connection to Clifford algebras (also observed independently by Tom Mrowka and by Daniel Matthews) may be linked to the exterior algebra interpretation of the argument in the recent preprint of Karasev mentioned above.
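These Clifford relations can be realised concretely by matrices, e.g. via the Jordan–Wigner construction with Pauli matrices (this concrete realisation is my own illustration, not taken from the post): $n$ anticommuting sign matrices whose sum has spectrum $\{-\sqrt{n},+\sqrt{n}\}$.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])     # Pauli X
Z = np.array([[1.0, 0.0], [0.0, -1.0]])    # Pauli Z
I2 = np.eye(2)

def kron_chain(mats):
    out = np.eye(1)
    for m in mats:
        out = np.kron(out, m)
    return out

n = 4
# Jordan-Wigner: v_i = Z x ... x Z x X x I x ... x I (X in slot i) gives
# n anticommuting involutions, realising v_i^2 = 1, v_i v_j = -v_j v_i
v = [kron_chain([Z] * i + [X] + [I2] * (n - 1 - i)) for i in range(n)]
S = sum(v)                                 # a sum of n anticommuting "unit" matrices
eigs = np.linalg.eigvalsh(S)               # S^2 = n I, so spectrum is {+-sqrt(n)}
```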
Remark 6 One could replace the form (2) in this argument by any other bilinear form $b'$ that obeyed the relations $b'(e_i,e_i) = 0$ and $b'(e_i,e_j) + b'(e_j,e_i) = 1$ for $i \neq j$. However, this additional level of generality does not add much; any such $b'$ will differ from $b$ by an antisymmetric form $c$ (so that $c(x,y) = -c(y,x)$ for all $x,y$, which in characteristic two implied that $c(x,x) = 0$ for all $x$), and such forms can always be decomposed as $c(x,y) = c'(x,y) + c'(y,x)$, where $c'$ is the “upper triangular half” $c'(x,y) := \sum_{1 \leq j < i \leq n} c(e_i,e_j) x_i y_j$ of $c$. As such, the matrices $A_b$ and $A_{b'}$ are conjugate, with the conjugation operator being the diagonal matrix with entries $(-1)^{c'(x,x)}$ at each vertex $x$.
Remark 7 (Added later) This remark combines the two previous remarks. One can view any of the matrices $A_{b'}$ from Remark 6 as components of a single canonical matrix $\tilde A$ that is still of dimensions $2^n \times 2^n$, but takes values in the Clifford algebra $Cl({\bf R}^n)$ from Remark 5; with this “universal algebra” perspective, one no longer needs to make any arbitrary choices of form $b'$. More precisely, let $\ell^2(Q_n; Cl({\bf R}^n))$ denote the vector space of functions $f: Q_n \rightarrow Cl({\bf R}^n)$ from the hypercube to the Clifford algebra; as a real vector space, this is a $2^{2n}$-dimensional space, isomorphic to the direct sum of $2^n$ copies of $Cl({\bf R}^n)$, as the Clifford algebra is itself $2^n$-dimensional. One can then define a canonical Clifford adjacency operator $\tilde A$ on this space by

$\displaystyle \tilde A f(x) := \sum_{i=1}^n f(x+e_i) v_i$

where $v_1,\dots,v_n$ are the generators of $Cl({\bf R}^n)$. This operator can either be identified with a Clifford-valued $2^n \times 2^n$ matrix or as a real-valued $2^{2n} \times 2^{2n}$ matrix. In either case one still has the key algebraic relations $\tilde A^2 = n$ and $\mathrm{tr} \tilde A = 0$, ensuring that when viewed as a real $2^{2n} \times 2^{2n}$ matrix, half of the eigenvalues are equal to $+\sqrt{n}$ and half equal to $-\sqrt{n}$. One can then use this matrix in place of any of the $A_{b'}$ to establish Theorem 1 (noting that Schur’s test continues to work for Clifford-valued matrices because of the norm structure on $Cl({\bf R}^n)$).

To relate $\tilde A$ to the real $2^n \times 2^n$ matrices $A_{b'}$, first observe that each point $x$ in the hypercube $Q_n$ can be associated with a one-dimensional real subspace $\ell_x$ (i.e., a line) in the Clifford algebra $Cl({\bf R}^n)$ by the formula

$\displaystyle \ell_{e_{i_1} + \dots + e_{i_k}} := {\bf R} v_{i_1} \dots v_{i_k}$

for any $i_1,\dots,i_k \in \{1,\dots,n\}$ (note that this definition is well-defined even if the $i_1,\dots,i_k$ are out of order or contain repetitions). This can be viewed as a discrete line bundle over the hypercube. Since $\ell_{x+e_i} v_i = \ell_x$ for any $x \in Q_n$ and $i = 1,\dots,n$, we see that the $2^n$-dimensional real linear subspace $V$ of $\ell^2(Q_n; Cl({\bf R}^n))$ of sections of this bundle, that is to say the space of functions $f$ such that $f(x) \in \ell_x$ for all $x \in Q_n$, is an invariant subspace of $\tilde A$. (Indeed, using the left-action of the Clifford algebra on $\ell^2(Q_n; Cl({\bf R}^n))$, which commutes with $\tilde A$, one can naturally identify $\ell^2(Q_n; Cl({\bf R}^n))$ with $Cl({\bf R}^n) \otimes V$, with the left action of $Cl({\bf R}^n)$ acting purely on the first factor and $\tilde A$ acting purely on the second factor.) Any trivialisation of this line bundle lets us interpret the restriction $\tilde A|_V$ of $\tilde A$ to $V$ as a real $2^n \times 2^n$ matrix. In particular, given one of the bilinear forms $b'$ from Remark 6, we can identify $V$ with $\ell^2(Q_n)$ by identifying any real function $f \in \ell^2(Q_n)$ with the lift $\tilde f \in V$ defined by

$\displaystyle \tilde f(e_{i_1} + \dots + e_{i_k}) := (-1)^{\sum_{1 \leq j < j' \leq k} b'(e_{i_j}, e_{i_{j'}})} f(e_{i_1} + \dots + e_{i_k}) v_{i_1} \dots v_{i_k}$

whenever $1 \leq i_1 < \dots < i_k \leq n$. A somewhat tedious computation using the properties of $b'$ then eventually gives the intertwining identity

$\displaystyle \widetilde{A_{b'} f} = \tilde A \tilde f$

and so $A_{b'}$ is conjugate to $\tilde A|_V$.
— 1. The Chung, Furedi, Graham, and Seymour example —
The paper by Chung, Furedi, Graham, and Seymour gives, for any $n$, an example of a subset $E$ of $Q_n$ of cardinality $2^{n-1}+1$ for which the maximum degree of $Q_n$ restricted to $E$ is at most $\lceil \sqrt{n} \rceil$, thus showing that Theorem 1 cannot be improved (beyond the trivial improvement of upgrading $\sqrt{n}$ to $\lceil \sqrt{n} \rceil$, because the maximum degree is obviously a natural number).
Define the “Möbius function” $\mu: Q_n \rightarrow \{-1,+1\}$ to be the function

$\displaystyle \mu(x) := (-1)^{x_1 + \dots + x_n}$

for $x = (x_1,\dots,x_n) \in Q_n$ (viewing the coefficients $x_1,\dots,x_n$ as the integers $0, 1$). This function is extremely balanced on coordinate subspaces. Indeed, from the binomial theorem (which uses the convention $0^0 = 1$) we have

$\displaystyle \sum_{x \in Q_n} \mu(x) = (1-1)^n = 0^n.$

More generally, given any index set $I \subset \{1,\dots,n\}$ of cardinality $|I|$, we have

$\displaystyle \sum_{x \in Q_n: x_i = 0 \ \forall i \in I} \mu(x) = (1-1)^{n-|I|} = 0^{n-|I|}.$

Now let $I_1,\dots,I_k$ be a partition of $\{1,\dots,n\}$ into $k$ disjoint non-empty sets. For each $j = 1,\dots,k$, let $V_j$ be the subspace of $Q_n$ consisting of those $x$ such that $x_i = 0$ for all $i \in I_j$. Then for any $1 \leq j_1 < \dots < j_m \leq k$, we have

$\displaystyle \sum_{x \in V_{j_1} \cap \dots \cap V_{j_m}} \mu(x) = 0^{n - |I_{j_1}| - \dots - |I_{j_m}|}$

and the right-hand side vanishes if $m < k$ and equals $1$ when $m = k$ (in which case the intersection consists only of the origin). Applying the inclusion-exclusion principle, we conclude that

$\displaystyle \sum_{x \in V_1 \cup \dots \cup V_k} \mu(x) = (-1)^{k-1}$

and thus also (assuming $k$ is even; the case of odd $k$ is similar, with the roles of $+1$ and $-1$ below reversed)

$\displaystyle \sum_{x \in V_1 \cup \dots \cup V_k} \mu(x) = -1,$

so that

$\displaystyle \sum_{x \not \in V_1 \cup \dots \cup V_k} \mu(x) = +1.$

Thus, if $E$ denotes the set of those $x \in V_1 \cup \dots \cup V_k$ with $\mu(x) = -1$, together with those $x \not\in V_1 \cup \dots \cup V_k$ with $\mu(x) = +1$, then $E$ has to have two more elements than its complement $Q_n \backslash E$, and hence $E$ has cardinality $2^{n-1}+1$.
Now observe that, if $x \in E$ lies in some $V_j$ with $\mu(x) = -1$ and $i \not\in I_j$, then $x + e_i$ remains in $V_j$ and has $\mu(x+e_i) = +1$, so that $x + e_i \not\in E$; and if $i \in I_j$ then $x + e_i$ has $\mu(x+e_i) = +1$ and so fails to lie in $E$ unless it lies outside all of the $V_1,\dots,V_k$. Thus in this case the total number of $i$ for which $x + e_i \in E$ is at most $|I_j|$. Conversely, if $x \in E$ lies outside all of the $V_1,\dots,V_k$ with $\mu(x) = +1$, and $x + e_i \in E$, then $x + e_i$ must lie in some $V_j$, and for each $j$ there is at most one $i$ (namely, the unique element of the support of $x$ in $I_j$, if this support meets $I_j$ in exactly one place) that will make $x + e_i$ lie in $V_j$. Hence in this case the total number of $i$ for which $x + e_i \in E$ is at most $k$. Thus the maximum degree of the subgraph of $Q_n$ induced by $E$ is at most $\max( \max_{1 \leq j \leq k} |I_j|, k)$. By choosing the $I_1,\dots,I_k$ to be a partition of $\{1,\dots,n\}$ into $k = \lceil \sqrt{n} \rceil$ pieces, each of cardinality at most $\lceil \sqrt{n} \rceil$, we obtain the claim.
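For $n = 4$ (so $k = 2$) the construction is small enough to check exhaustively; the following sketch (my own) builds $E$ as described and confirms $|E| = 2^3 + 1 = 9$ with maximum induced degree $2 = \sqrt{4}$.

```python
from itertools import product

n, k = 4, 2
blocks = [(0, 1), (2, 3)]          # the partition I_1, I_2 of the coordinates
verts = list(product([0, 1], repeat=n))

def mu(x):                         # the Mobius function (-1)^{x_1 + ... + x_n}
    return (-1) ** sum(x)

def in_union(x):                   # x in V_1 ∪ ... ∪ V_k iff x vanishes on some block
    return any(all(x[i] == 0 for i in blk) for blk in blocks)

E = {x for x in verts
     if (in_union(x) and mu(x) == -1) or (not in_union(x) and mu(x) == 1)}

def degree_in_E(x):
    nbrs = (tuple((x[j] + (j == i)) % 2 for j in range(n)) for i in range(n))
    return sum(y in E for y in nbrs)

max_deg = max(degree_in_E(x) for x in E)
```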
Remark 8 Suppose that $n = k^2$ is a perfect square (with $k$ even, in line with the convention of the preceding construction), and take the blocks to be the intervals $I_j := \{(j-1)k+1,\dots,jk\}$; then the lower bound here exactly matches the upper bound in Theorem 1. In particular, the $E \times E$ minor of the matrix $A_b$ must have an eigenvector of eigenvalue $\sqrt{n} = k$. Such an eigenvector can be explicitly constructed as follows. Let $u \in \ell^2(E)$ be the vector defined by setting

$\displaystyle u(x) := (-1)^j \hbox{ when } x = e_{i_1} + \dots + e_{i_{k-1}} \hbox{ has exactly one coordinate in each block } I_{j'}, j' \neq j \ \ \ \ \ (3)$

for some $j = 1,\dots,k$, and

$\displaystyle u(x) := 1 \hbox{ when } x = e_{i_1} + \dots + e_{i_k} \hbox{ has exactly one coordinate in each block } I_1,\dots,I_k, \ \ \ \ \ (4)$

and $u(x) := 0$ for all other $x$ (one easily verifies that the previous types of $x$ lie in $E$). We claim that $(A_b)_E u = k u$, that is to say that

$\displaystyle \sum_{i: x + e_i \in E} (-1)^{b(x+e_i, e_i)} u(x+e_i) = k u(x) \ \ \ \ \ (5)$

for all $x \in E$.

First suppose that $x$ is of the form (3). One checks that $x + e_i$ lies in the support of $u$ precisely when $i \in I_j$, in which case $x + e_i$ is of the form (4) and $(-1)^{b(x+e_i,e_i)} = (-1)^{k-j}$, as $b(x+e_i,e_i)$ counts the coordinates of $x$ lying above $i$, of which there is exactly one in each of the blocks $I_{j+1},\dots,I_k$. Since $k$ is even, this simplifies using (3) as

$\displaystyle \sum_{i: x + e_i \in E} (-1)^{b(x+e_i,e_i)} u(x+e_i) = k (-1)^{-j} = k u(x),$

giving (5) in this case. Similarly, if $x$ is of the form (4), then $x + e_i$ lies in the support of $u$ precisely when $i$ lies in the support of $x$, in which case $x + e_i$ is of the form (3) (with $j$ the index of the block containing $i$), and one can argue as before to show that

$\displaystyle (-1)^{b(x+e_i,e_i)} u(x+e_i) = (-1)^{k-j} (-1)^j = 1,$

and (5) again follows. Finally, if $x \in E$ is not of either of the two forms (3), (4), one can check that the neighbours $x + e_i$ of $x$ lying in $E$ are never of these forms either, and so both sides of (5) vanish.

The same analysis works for any of the other bilinear forms $b'$ in Remark 6 (after modifying the signs in (3), (4) appropriately). Using the Clifford-valued operator $\tilde A$ from Remark 7, the eigenfunction $\tilde u$ no longer depends on a choice of form; it is defined by $\tilde u(x) := (-1)^j v_{i_1} \dots v_{i_{k-1}}$ when $x$ is of the form (3), and $\tilde u(x) := v_{i_1} \dots v_{i_k}$ when $x$ is of the form (4), with $\tilde u(x) := 0$ otherwise.
— 2. From induced subgraph bounds to the sensitivity conjecture —
On the hypercube $Q_n$, let $x_1,\dots,x_n: Q_n \rightarrow \{-1,+1\}$ denote the functions

$\displaystyle x_i(y_1,\dots,y_n) := (-1)^{y_i},$

recording each coordinate of a vertex as a sign. The monomials in $x_1,\dots,x_n$ are then the characters of $Q_n$, so by Fourier expansion every function $f: Q_n \rightarrow {\bf R}$ can be viewed as a polynomial in the $x_1,\dots,x_n$ (with each monomial containing at most one copy of each $x_i$; higher powers of each $x_i$ are unnecessary since $x_i^2 = 1$). In particular, one can meaningfully talk about the degree $\deg(f)$ of a function $f: Q_n \rightarrow {\bf R}$. Observe also that the Möbius function $\mu$ from the preceding section is just the monomial $x_1 \dots x_n$.

Define the sensitivity $s(f)$ of a Boolean function $f: Q_n \rightarrow \{-1,+1\}$ to be the largest number $s$ for which there is an $x \in Q_n$ such that there are at least $s$ values of $i = 1,\dots,n$ with $f(x+e_i) \neq f(x)$. Using an argument of Gotsman and Linial, we can now relate the sensitivity of a function to its degree:
Corollary 9 (Lower bound on sensitivity) For any boolean function $f: Q_n \rightarrow \{-1,+1\}$, one has $s(f) \geq \sqrt{\deg(f)}$.
Proof: Write $d := \deg(f)$. By permuting the indices, we may assume that $f$ contains a non-trivial multiple of the monomial $x_1 \dots x_d$. By restricting to the subspace of those vertices whose final $n-d$ coordinates vanish (which cannot increase the sensitivity), we may then assume without loss of generality that $d = n$. The Fourier coefficient of $x_1 \dots x_n$ is just the mean value of $f$ times the Möbius function $\mu = x_1 \dots x_n$, so this mean value is non-zero. This means that one of the sets $\{ f = \mu \}$ or $\{ f = -\mu \}$ has cardinality at least $2^{n-1}+1$. Let $E$ denote the larger of these two sets. By Theorem 1, there is an $x \in E$ such that $x + e_i \in E$ for at least $\sqrt{n}$ values of $i$; since $\mu(x+e_i) = -\mu(x)$, this implies that $f(x+e_i) \neq f(x)$ for at least $\sqrt{n} = \sqrt{d}$ values of $i$, giving the claim.
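Corollary 9 is easy to test empirically (a sketch of mine): compute the sensitivity and the Fourier degree of random Boolean functions on $Q_4$ and check $s(f) \geq \sqrt{\deg(f)}$ in each case.

```python
import numpy as np
from itertools import product

n = 4
verts = list(product([0, 1], repeat=n))

def degree(f):
    # largest |S| with a non-zero Fourier coefficient over the characters of Q_n
    d = 0
    for S in product([0, 1], repeat=n):
        coeff = sum(f[x] * (-1) ** sum(s * t for s, t in zip(S, x))
                    for x in verts) / 2 ** n
        if abs(coeff) > 1e-9:
            d = max(d, sum(S))
    return d

def sensitivity(f):
    return max(sum(f[tuple((x[j] + (j == i)) % 2 for j in range(n))] != f[x]
                   for i in range(n)) for x in verts)

rng = np.random.default_rng(1)
checks = []
for _ in range(20):
    f = {x: int(rng.integers(2)) * 2 - 1 for x in verts}   # random +-1 function
    checks.append((sensitivity(f), degree(f)))
```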
The construction of Chung, Furedi, Graham, and Seymour from the previous section can be easily adapted to show that this lower bound is tight (other than the trivial improvement of replacing $\sqrt{\deg(f)}$ by $\lceil \sqrt{\deg(f)} \rceil$).

Now we need to digress on some bounds involving polynomials of one variable. We begin with an inequality of Bernstein concerning trigonometric polynomials:
Lemma 10 (Bernstein inequality) Let $f: {\bf R} \rightarrow {\bf C}$ be a trigonometric polynomial of degree at most $n$, that is to say a complex linear combination of $e^{ik\theta}$ for $k = -n,\dots,n$. Then

$\displaystyle \sup_\theta |f'(\theta)| \leq n \sup_\theta |f(\theta)|.$

Observe that equality holds when $f(\theta) = e^{in\theta}$ or $f(\theta) = e^{-in\theta}$. Specialising to linear combinations of $1, e^{i\theta}, \dots, e^{in\theta}$ and writing $z := e^{i\theta}$, we obtain the classical Bernstein inequality

$\displaystyle \sup_{|z|=1} |P'(z)| \leq n \sup_{|z|=1} |P(z)|$

for complex polynomials $P$ of degree at most $n$.
Proof: If one is willing to lose a constant factor in this estimate, this bound can be easily established from modern Littlewood-Paley theory (see e.g., Exercise 52 of these lecture notes). Here we use an interlacing argument due to Boas. We first restrict to the case when $f$ is real-valued. We may normalise $\sup_\theta |f(\theta)| = 1$. Let $-1 < \lambda < 1$ be a real parameter in $(-1,1)$. The trigonometric polynomial $\cos(n\theta)$ alternately takes the values $+1$ and $-1$ at the $2n$ values $\theta = \frac{\pi j}{n}$, $j = 0,\dots,2n-1$. Thus the trigonometric polynomial $\cos(n\theta) - \lambda f(\theta)$ alternates in sign at these values, and thus by the intermediate value theorem has a zero on each of the intervals $(\frac{\pi j}{n}, \frac{\pi(j+1)}{n})$. On the other hand, a non-zero trigonometric polynomial $g$ of degree at most $n$ can be expressed by de Moivre's theorem as $e^{-in\theta}$ times a complex polynomial in $e^{i\theta}$ of degree at most $2n$, and thus has at most $2n$ zeroes in a period. Thus we see that $\cos(n\theta) - \lambda f(\theta)$ has exactly one zero in each $(\frac{\pi j}{n}, \frac{\pi(j+1)}{n})$. Furthermore, at this zero, the derivative $-n \sin(n\theta) - \lambda f'(\theta)$ of this function must be positive if $\cos(n\theta)$ is increasing on this interval, and negative if $\cos(n\theta)$ is decreasing on this interval. In summary, we have shown that if $\theta$ and $\lambda \in (-1,1)$ are such that $\cos(n\theta) = \lambda f(\theta)$, then $-n\sin(n\theta) - \lambda f'(\theta)$ has the same sign as $-n\sin(n\theta)$. By translating the function $\cos(n\theta)$, we also conclude that if $\theta$ and $\lambda \in (-1,1)$ are such that $\cos(n(\theta - \alpha)) = \lambda f(\theta)$ for some $\alpha$, then $-n\sin(n(\theta-\alpha)) - \lambda f'(\theta)$ has the same sign as $-n\sin(n(\theta-\alpha))$.
If $0 < \lambda < 1$ and $\theta_0$ is arbitrary, then we can find $\alpha$ such that $\cos(n(\theta_0-\alpha)) = \lambda f(\theta_0)$ and $\sin(n(\theta_0-\alpha))$ is negative, and we conclude that $-n\sin(n(\theta_0-\alpha)) - \lambda f'(\theta_0)$ is positive; thus we have the upper bound

$\displaystyle \lambda f'(\theta_0) < -n \sin(n(\theta_0-\alpha)) = n (1 - \lambda^2 f(\theta_0)^2)^{1/2} \leq n.$

A similar argument (with $\lambda$ now chosen to be negative) similarly bounds $\lambda f'(\theta_0)$ from below; sending $\lambda \rightarrow \pm 1$, this gives the claim for real-valued trigonometric polynomials $f$. (Indeed, this argument even gives the slightly sharper bound $|f'(\theta)| \leq n (1 - f(\theta)^2)^{1/2}$.)

To amplify this to complex valued polynomials, we take advantage of phase rotation invariance. If $f$ is a complex trigonometric polynomial, then by applying Bernstein's inequality to the real part $\mathrm{Re} f$ we have

$\displaystyle \sup_\theta |(\mathrm{Re} f)'(\theta)| \leq n \sup_\theta |\mathrm{Re} f(\theta)| \leq n \sup_\theta |f(\theta)|.$

But then we can multiply $f$ by any complex phase $e^{i\phi}$ and conclude that

$\displaystyle \sup_\theta |\mathrm{Re}( e^{i\phi} f'(\theta) )| \leq n \sup_\theta |f(\theta)|.$

Taking suprema in $\phi$, one obtains the claim for complex polynomials $f$.
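A quick numerical check of Lemma 10 on a random complex trigonometric polynomial (my illustration; the supremum is approximated on a fine grid, so the computed ratio is only approximately bounded by $n$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
ks = np.arange(-n, n + 1)
c = rng.normal(size=ks.size) + 1j * rng.normal(size=ks.size)  # random coefficients
theta = np.linspace(0, 2 * np.pi, 4096, endpoint=False)

phases = np.exp(1j * np.outer(theta, ks))     # e^{i k theta} sampled on the grid
f = phases @ c                                # the trigonometric polynomial
fp = phases @ (1j * ks * c)                   # its derivative
ratio = np.abs(fp).max() / np.abs(f).max()    # approximately <= n by Lemma 10
```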
The analogue of Bernstein’s inequality for the unit interval is known as Markov’s inequality for polynomials:

Lemma 11 (Markov’s inequality for polynomials) Let $P: {\bf R} \rightarrow {\bf R}$ be a polynomial of degree $n$. Then

$\displaystyle \sup_{x \in [-1,1]} |P'(x)| \leq n^2 \sup_{x \in [-1,1]} |P(x)|.$

This bound is sharp, as is seen by inspecting the Chebyshev polynomial $T_n$, defined as the unique polynomial giving the trigonometric identity

$\displaystyle T_n(\cos \theta) = \cos(n \theta). \ \ \ \ \ (6)$

Differentiating (6) using the chain rule, we see that

$\displaystyle T'_n(\cos\theta) \sin \theta = n \sin(n\theta); \ \ \ \ \ (7)$

the right-hand side of $T'_n(\cos\theta) = n \sin(n\theta) / \sin \theta$ approaches $n^2$ as $\theta \rightarrow 0$, demonstrating that the $n^2$ factor here is sharp.
Proof: We again use an argument of Boas. We may normalise so that

$\displaystyle \sup_{x \in [-1,1]} |P(x)| = 1. \ \ \ \ \ (8)$

The function $P(\cos \theta)$ is a trigonometric polynomial of degree at most $n$, so by Bernstein’s inequality and the chain rule we have

$\displaystyle |P'(\cos \theta)| |\sin \theta| \leq n$

for all $\theta$, and thus

$\displaystyle |P'(x)| \leq \frac{n}{(1-x^2)^{1/2}} \ \ \ \ \ (9)$

for all $-1 < x < 1$. This already gives Markov’s inequality except in the edge regions $\cos \frac{\pi}{2n} \leq |x| \leq 1$ (since $\sin \frac{\pi}{2n} \geq \frac{1}{n}$, the right-hand side of (9) is at most $n^2$ when $|x| \leq \cos \frac{\pi}{2n}$). By reflection symmetry, it then suffices to verify Markov’s inequality in the region $\cos \frac{\pi}{2n} \leq x \leq 1$.
From (6), the Chebyshev polynomial $T_n$ attains the values $(-1)^j$ alternately at the $n+1$ different points $\cos \frac{\pi j}{n}$, $j = 0,\dots,n$. Thus, if $-1 < \lambda < 1$, the polynomial $T_n - \lambda P$ changes sign at least $n$ times on $[-1,1]$, and thus must have all $n$ zeroes inside this interval by the intermediate value theorem; furthermore, at least $n-1$ of these zeroes will lie to the left of $\cos \frac{\pi}{n}$. By Rolle’s theorem, the derivative $T'_n - \lambda P'$ then has all $n-1$ of its zeroes in the interval $(-1,1)$, and at least $n-2$ of these will lie to the left of $\cos \frac{\pi}{n}$. In particular, the derivative $T'_n - \lambda P'$ can have at most one zero to the right of $\cos \frac{\pi}{n}$.
Since $T_n(1) = 1$ and $|\lambda P(1)| < 1$, $T_n - \lambda P$ is positive at $x = 1$, and hence positive as $x \rightarrow +\infty$ since there are no zeroes outside of $[-1,1]$. Thus the leading coefficient of $T_n - \lambda P$ is positive, which implies the same for its derivative $T'_n - \lambda P'$. Thus $T'_n - \lambda P'$ is positive when $x$ is sufficiently large.

From (9) one has $|P'(\cos \frac{\pi}{2n})| \leq \frac{n}{\sin \frac{\pi}{2n}}$, hence by (7) (which gives $T'_n(\cos \frac{\pi}{2n}) = \frac{n}{\sin \frac{\pi}{2n}}$) we see that $T'_n - \lambda P'$ is also positive at $\cos \frac{\pi}{2n}$. Thus $T'_n - \lambda P'$ cannot become negative for $x \geq \cos \frac{\pi}{2n}$, as this would create at least two zeroes to the right of $\cos \frac{\pi}{n}$. We conclude that in this region we have

$\displaystyle \lambda P'(x) \leq T'_n(x).$

From (7) we have $T'_n(x) \leq n^2$ on $[-1,1]$, and the claim follows after taking suprema in $\lambda \in (-1,1)$.
Remark 12 The following slightly shorter argument gives the slightly weaker bound $\sup_{x \in [-1,1]} |P'(x)| \leq 2n^2 \sup_{x \in [-1,1]} |P(x)|$. We again use the normalisation (8). By two applications of Bernstein’s inequality, the function $\theta \mapsto P(\cos\theta)$ has first derivative bounded in magnitude by $n$, and second derivative bounded in magnitude by $n^2$. As this function also has vanishing first derivative at $\theta = 0$ and $\theta = \pi$, we conclude the bounds

$\displaystyle |P'(\cos\theta) \sin \theta| \leq \min( n^2 |\theta|, n^2 |\pi - \theta|, n )$

and thus by the chain rule

$\displaystyle |P'(\cos\theta)| \leq \frac{\min(n^2 |\theta|, n^2 |\pi - \theta|, n)}{\sin \theta}.$

For $0 < \theta < \pi$, one easily checks that the right-hand side is at most $2n^2$, giving the claim.
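Both the sharpness computation (7) and the inequality itself can be checked with `numpy`'s Chebyshev utilities:

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

n = 7
Tn = Chebyshev.basis(n)            # the Chebyshev polynomial T_n
dTn = Tn.deriv()
grid = np.linspace(-1.0, 1.0, 2001)
sup_T = np.abs(Tn(grid)).max()     # equals 1 on [-1, 1]
sup_dT = np.abs(dTn(grid)).max()   # equals n^2, attained at the endpoints
end_slope = dTn(1.0)               # T_n'(1) = n^2, the sharp Markov constant
```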
This implies a result of Ehlich-Zeller and of Rivlin-Cheney:

Corollary 13 (Discretised Markov inequality) Let $P: {\bf R} \rightarrow {\bf R}$ be a polynomial of degree $d$. If

$\displaystyle |P(j)| \leq 1 \hbox{ for } j = 0,\dots,n \hbox{ and } |P'(x)| \geq 2 \hbox{ for some } x \in [0,n], \ \ \ \ \ (10)$

then we have $d \geq \sqrt{n/2}$.

Proof: We use an argument of Nisan and Szegedy. Assume for sake of contradiction that $d < \sqrt{n/2}$, so in particular $d^2 < n$. From the fundamental theorem of calculus and the triangle inequality one has

$\displaystyle \sup_{x \in [0,n]} |P(x)| \leq 1 + \frac{1}{2} \sup_{x \in [0,n]} |P'(x)|,$

since every point of $[0,n]$ lies within $\frac{1}{2}$ of one of the integers $0,\dots,n$. By a rescaled and translated version of Markov’s inequality we have

$\displaystyle \sup_{x \in [0,n]} |P'(x)| \leq \frac{2d^2}{n} \sup_{x \in [0,n]} |P(x)|, \ \ \ \ \ (11)$

which when inserted into the preceding inequality gives after some rearranging

$\displaystyle \sup_{x \in [0,n]} |P(x)| \leq \left(1 - \frac{d^2}{n}\right)^{-1},$

and then after a second application of (11) gives

$\displaystyle \sup_{x \in [0,n]} |P'(x)| \leq \frac{2d^2}{n - d^2}.$

Comparing with (10), we conclude that

$\displaystyle 2 \leq \frac{2d^2}{n-d^2}$

and the claim follows after some rearranging.
Nisan and Szegedy observed that this one-dimensional degree bound can be lifted to the hypercube by a symmetrisation argument:

Corollary 14 (Multidimensional Markov inequality bound) Let $f: Q_n \rightarrow [-1,1]$ be such that $f(0) = 1$ and $f(e_i) = -1$ for $i = 1,\dots,n$. Then

$\displaystyle \deg(f) \geq \sqrt{n/2}.$

Proof: By averaging over all permutations of the indices (which can decrease the degree of $f$, but not increase it, and preserves the hypotheses), we may assume that $f$ is a symmetric function of the inputs. Using the Newton identities, we can then write

$\displaystyle f = P(|x|)$

for some real polynomial $P$ of degree at most $\deg(f)$, where $|x|$ is the Hamming length of $x$. By hypothesis, $|P(j)| \leq 1$ for $j = 0,\dots,n$, $P(0) = 1$, and $P(1) = -1$, hence by the mean-value theorem $|P'(x)| \geq 2$ for some $x \in [0,1]$. Applying Corollary 13 with $d := \deg(f)$, we obtain the claim.
Define the block sensitivity $bs(f)$ of a Boolean function $f: Q_n \rightarrow \{-1,+1\}$ to be the largest number $k$ for which there is an $x \in Q_n$ such that there are at least $k$ disjoint subsets $B_1,\dots,B_k$ of $\{1,\dots,n\}$ with $f(x + e_{B_j}) \neq f(x)$ for $j = 1,\dots,k$, where $e_B := \sum_{i \in B} e_i$. We have
Theorem 15 (Sensitivity conjecture) One has

$\displaystyle s(f) \leq bs(f) \leq 2 s(f)^4.$

More precisely, the sensitivity conjecture of Nisan and Szegedy asserted the bound $bs(f) \leq C s(f)^C$ for some absolute constant $C$; Huang’s result thus gives explicit values for the exponents. It is still open whether the exponent $4$ in this theorem can be improved; it is known that one cannot improve it to below $2$, by analysing a variant of the Chung-Furedi-Graham-Seymour example (see these notes of Regan for details).

Proof: The lower bound $s(f) \leq bs(f)$ is immediate from the definitions, since the sensitivity arises by restricting the $B_1,\dots,B_k$ in the definition of block sensitivity to singleton sets. To prove the upper bound, it suffices from Corollary 9 to establish the bound

$\displaystyle bs(f) \leq 2 \deg(f)^2.$

Let $k := bs(f)$. By hypothesis, there are $x \in Q_n$ and $k$ disjoint subsets $B_1,\dots,B_k$ of $\{1,\dots,n\}$ such that $f(x + e_{B_j}) \neq f(x)$ for $j = 1,\dots,k$. We may normalise $x = 0$ and $f(0) = 1$, so that $f(e_{B_j}) = -1$ for $j = 1,\dots,k$. If we then define the pullback boolean function $g: Q_k \rightarrow \{-1,+1\}$ by the formula

$\displaystyle g(y_1,\dots,y_k) := f( y_1 e_{B_1} + \dots + y_k e_{B_k} ),$

then it is easy to see that $g(0) = 1$, $g(e_j) = -1$ for $j = 1,\dots,k$, and $\deg(g) \leq \deg(f)$. The claim now follows from Corollary 14.
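The chain of inequalities in Theorem 15 can be tested by brute force for small $n$ (an illustration of mine, not part of the proof): compute $s(f)$ and $bs(f)$ exactly for random functions on $Q_4$ and check $s(f) \leq bs(f) \leq 2 s(f)^4$.

```python
import random
from itertools import combinations, product

n = 4
verts = list(product([0, 1], repeat=n))
subsets = [frozenset(c) for r in range(1, n + 1)
           for c in combinations(range(n), r)]

def flip(x, S):
    return tuple((x[i] + (i in S)) % 2 for i in range(n))

def sensitivity(f):
    return max(sum(f[flip(x, {i})] != f[x] for i in range(n)) for x in verts)

def block_sensitivity(f):
    def grow(chosen, rest):            # exhaustive search for disjoint subfamilies
        best = len(chosen)
        for a, S in enumerate(rest):
            if all(S.isdisjoint(T) for T in chosen):
                best = max(best, grow(chosen + [S], rest[a + 1:]))
        return best
    return max(grow([], [S for S in subsets if f[flip(x, S)] != f[x]])
               for x in verts)

random.seed(0)
checks = []
for _ in range(10):
    f = {x: random.choice([-1, 1]) for x in verts}
    checks.append((sensitivity(f), block_sensitivity(f)))
```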
Remark 16 The following slightly shorter variant of the argument lets one remove the factor of $2$, giving $bs(f) \leq \deg(f)^2$ and hence $bs(f) \leq s(f)^4$. Let $f$, $k$, and $g$ be as above. We again may normalise $g(0) = 1$ and $g(e_j) = -1$ for $j = 1,\dots,k$. For any $\theta$, let $\epsilon_1(\theta),\dots,\epsilon_k(\theta) \in \{0,1\}$ be iid Bernoulli random variables that equal $1$ with probability $\sin^2 \frac{\theta}{2}$ and $0$ with probability $\cos^2 \frac{\theta}{2}$. The quantity

$\displaystyle F(\theta) := {\bf E} g( \epsilon_1(\theta), \dots, \epsilon_k(\theta) )$

is a trigonometric polynomial of degree at most $\deg(g)$ (since ${\bf E} (-1)^{\epsilon_i(\theta)} = \cos \theta$, each monomial of $g$ of degree $m$ contributes a multiple of $\cos^m \theta$) that is bounded in magnitude by $1$, so by two applications of Bernstein’s inequality

$\displaystyle |F''(\theta)| \leq \deg(g)^2.$

On the other hand, for small $\theta$, the random variable $(\epsilon_1(\theta),\dots,\epsilon_k(\theta))$ is equal to zero with probability $1 - k \sin^2 \frac{\theta}{2} + O(\theta^4)$ and equal to each $e_j$ with probability $\sin^2 \frac{\theta}{2} + O(\theta^4)$, hence

$\displaystyle F(\theta) = 1 - 2 k \sin^2 \frac{\theta}{2} + O(\theta^4) = 1 - \frac{k \theta^2}{2} + O(\theta^4)$

and hence $|F''(0)| = k$. Combining these estimates we obtain $k \leq \deg(g)^2 \leq \deg(f)^2$ and hence $bs(f) \leq s(f)^4$.
Remark 17 The sensitivity $s(f)$ of a Boolean function $f$ can be split as $s(f) = \max( s^+(f), s^-(f) )$, where $s^+(f)$ (resp. $s^-(f)$) is the largest number $s$ for which there is an $x \in Q_n$ with $f(x) = +1$ (resp. $f(x) = -1$) such that there are at least $s$ values of $i = 1,\dots,n$ with $f(x+e_i) \neq f(x)$. It is not difficult to use the $k=2$ case of Remark 3 to improve Corollary 9 slightly to

$\displaystyle s^+(f) s^-(f) \geq \deg(f).$

Combining this with the previous remark, we can thus improve the upper bound in Theorem 15 slightly to

$\displaystyle bs(f) \leq s^+(f)^2 s^-(f)^2.$
26 July, 2019 at 7:18 pm
YI HAO
Impressive. Solving harder problems with lesser mathematics.
27 July, 2019 at 10:17 pm
Anonymous
Bravo! More pride for UCLA
29 July, 2019 at 2:57 pm
Anonymous
Your “lesser mathematics” comment implies the use of simpler mathematical methods to prove conjectures or to solve problems. And I strongly agree with you…
Relevant Reference Link:
https://en.wikipedia.org/wiki/Hilbert%27s_twenty-fourth_problem
26 July, 2019 at 7:32 pm
Anonymous
“…gave a of a long-standing problem…”
could be revised to (you forgot to add, “proof” above)
“…gave a proof of a long-standing problem” in your first statement above.
[Corrected, thanks – T.]
26 July, 2019 at 8:31 pm
L
This is more illuminating than the original proof. Thank you.
29 July, 2019 at 5:51 pm
Anonymous
Please don’t forget to add the illuminating questions/comments/explanations… of Prof. T. Tao, Prof. Huang and others too. We are very fortunate! Thank you very much! :-)
27 July, 2019 at 1:07 am
Spectral proofs of theorems on the boolean hypercube | Anurag's Math Blog
[…] discussions on Huang’s proof of the sensitivity conjecture, see the following blog posts: Terry Tao, Gil Kalai, Scott Aaronson, Ken […]
27 July, 2019 at 1:20 am
Anurag Bishnoi
Interlacing methods in general have been used before, see for example this 1995 survey of Haemers: https://www.sciencedirect.com/science/article/pii/0024379595001992, where it goes beyond the Cauchy interlacing inequalities.
Also, using pseudo-adjacency matrices instead of just the adjacency matrix was already done by Rick Wilson in his 1984 paper where he proved the general Erdős-Ko-Rado theorem:
https://link.springer.com/article/10.1007/BF02579226
27 July, 2019 at 2:01 am
Hao Huang
Terry: thanks for this great expository article! I have been following your amazing blog for 10+ years (well before taking your 254 sequence at UCLA), and it is certainly a great honor to see my name mentioned here:)
Let me try to give an alternative explanation of where the pseudo-adjacency matrix comes from, from a more combinatorial perspective. Recall that for the proof to work, the pseudo-adjacency matrix needs to have entries between $-1$ and $1$, and its $2^{n-1}$-th largest eigenvalue must be $\sqrt{n}$. Hadamard’s inequality tells us that the determinant of such a matrix is at most the product of the lengths of its row vectors, which is upper bounded by $n^{2^{n-1}}$. On the other hand, the constraint on the $2^{n-1}$-th largest eigenvalue requires that all eigenvalues have absolute value at least $\sqrt{n}$, since the hypercube is bipartite and its spectrum is symmetric about $0$. Therefore the determinant, which is the product of the eigenvalues, is at least $n^{2^{n-1}}$ in absolute value.
Now for the Hadamard inequality to be tight, the rows must be pairwise orthogonal. By decomposing the matrix into $2 \times 2$ blocks (each corresponding to an $(n-1)$-dimensional subcube), it is not hard to see that each $(n-1)$-cube also needs to have a good signing. And for example if we fix the signing of the top $(n-1)$-cube, and an arbitrary signing of the vertical perfect matching, then the signing of the bottom cube is fixed. When choosing all vertical edges to be positive, this is exactly the recursive construction in the paper.
I am quite interested to see to what extent the same signing exists for other very symmetric graphs. Moreover, it seems that a certain signing “compresses” the positive (negative) spectrum of the adjacency matrix; maybe this recursively constructed matrix is no more than a composition of a number of “compressing signings”?
29 July, 2019 at 11:41 am
Gil Kalai
Dear Hao, what do you mean by the last sentence?
29 July, 2019 at 4:59 pm
Terence Tao
I can take a stab at this. The unsigned adjacency matrix $A$ has a rather “spread out” spectrum; the $2^n$ eigenvalues are all the signed sums $\pm 1 \pm 1 \dots \pm 1$ of $n$ signs. These eigenvalues mostly cluster within a distance $O(\sqrt{n})$ from the origin but the spectral radius is huge (equal to $n$). With the right signing of the matrix, however, the spectrum is “compressed” down to just $+\sqrt{n}$ and $-\sqrt{n}$, which is what is needed for this application – very roughly speaking, the $n$ signs $\pm 1$ have somehow become “orthogonal” to each other (perhaps in some Clifford algebra sense) and hence their sums now always have “magnitude $\sqrt{n}$” (by some sort of “Pythagorean theorem”). (Any symmetric signing of the matrix will preserve the first two moments of the spectrum, so this is the best compression of the spectral radius that one can hope for.) The compression is achieved by an iterative method in which the $2^n \times 2^n$ spectrally compressed matrix is constructed as a sort of skew tensor product of the $2^{n-1} \times 2^{n-1}$ matrix and an explicit $2 \times 2$ matrix. So the proof can be ultimately boiled down to the properties of explicit $2 \times 2$ matrices (another comment on this blog post reached a similar conclusion, where the $2 \times 2$ matrices were Pauli matrices). This reminds me a bit of how hypercontractivity type inequalities can often be proven by iterating an elementary inequality involving a small number of real variables (as few as two, in some cases), though in those cases the iteration is fairly straightforward and does not involve any “skewness”.
29 July, 2019 at 5:29 pm
Terence Tao
One can also interpret everything in terms of eigenvalues of sums of Hermitian matrices, a topic I am personally rather fond of :). If $A_1,\dots,A_n$ are Hermitian matrices that each have eigenvalues in $\{-1,+1\}$, then the sum $A_1 + \dots + A_n$ can have eigenvalues anywhere in the interval between $-n$ and $+n$ (though if one requires the $A_i$ to commute, then the eigenvalues have to be integers of the same parity as $n$). However, if one instead requires that the $A_i$ all anticommute, then the spectrum of the sum $A_1 + \dots + A_n$ has now been compressed into the set $\{-\sqrt{n}, +\sqrt{n}\}$, because this matrix now squares to $n$ times the identity matrix (one can interpret the anticommutator identity $A_i A_j + A_j A_i = 0$ as a sort of “orthogonality” between $A_i$ and $A_j$). This is the matrix analogue of the vector fact that the sum of $n$ unit vectors can have magnitude anywhere between $0$ and $n$ in general, but is forced to have magnitude $\sqrt{n}$ if all the unit vectors are orthogonal to each other. So the game is all about weighting the graph to make the different edges anticommute with each other, which is what one sees in all the different interpretations of the argument appearing in this blog.
29 July, 2019 at 9:48 pm
Anonymous
If this is the matrix analogue of orthogonality, does it follow from a more general underlying matrix analogue of an inner product?
30 July, 2019 at 9:23 am
Terence Tao
I think the role of the inner product here is played by the anticommutator $\{A,B\} := AB + BA$ of two Hermitian matrices $A, B$: it is symmetric, bilinear, and is “positive definite” in the sense that $\{A,A\} = 2A^2$ is positive definite when $A$ is non-zero. The one catch is that the anticommutator is matrix-valued rather than scalar valued. But it is related to the more usual scalar (Frobenius) inner product $\langle A, B \rangle := \mathrm{tr}(AB)$ by the identity $\mathrm{tr} \{A,B\} = 2 \langle A, B \rangle$. This analogy helps explain why anticommutation seems to be a matrix analogue of orthogonality, which is a nice insight that I was not consciously aware of until looking at this recent work.
Another relationship between the anticommutator and the Frobenius inner product is the cyclic symmetry $\mathrm{tr}( \{A,B\} C ) = \mathrm{tr}( \{A,C\} B )$ (and similarly for other permutations, since both the anticommutator and Frobenius inner product are symmetric). In particular, if $A, B$ anticommute, then not only are they orthogonal to each other in the Frobenius sense, but also $\{A,C\}$ is Frobenius orthogonal to $B$ (and $\{B,C\}$ is similarly Frobenius orthogonal to $A$) for any matrix $C$. So this is another way to connect anticommutation with orthogonality.
The analogy between anticommutators and inner products seems to break down a bit once one goes beyond the Hermitian setting though; one either has to abandon positive definiteness, or abandon (complex) bilinearity. But for real non-symmetric matrices one might possibly be able to work with $AB^T + BA^T$ as a substitute for the anticommutator.
1 August, 2019 at 12:35 am
T
What operators preserve this inner product? Is there an analog of orthogonality here?
1 August, 2019 at 9:45 am
Terence Tao
The anticommutator $\{A,B\} = AB + BA$ on Hermitian matrices is equivariant with respect to unitary conjugation: $\{ U A U^*, U B U^* \} = U \{A,B\} U^*$. (Given that the anticommutator is matrix-valued rather than scalar-valued, equivariance is arguably a more natural notion of symmetry here than invariance.) I suspect these are basically the only (invertible) linear operations on Hermitian matrices that make the anticommutator equivariant.
1 August, 2019 at 5:46 pm
T
Coordinatization of some space?
29 July, 2019 at 3:00 pm
Ferdinand Ihringer
Today this paper on the arXiv appeared [1] (by my old colleagues of the Discrete Math Group in Regina) which reminded me that there is a research area devoted to the study of such weighted adjacency matrices. They are called multiplicity partitions, and in this particular case it is an example of a multiplicity partition of the form $[2^{n-1}, 2^{n-1}]$ of the hypercube.
Now for the fun of it I was looking at some older results on this. It turns out that your matrix was already constructed in [2], Theorem 6.7 and Corollary 6.9, as an example of a graph with only two different eigenvalues. (The scaling is different from your matrix, to keep the eigenvalues restricted to $-1$ and $1$, but I guess that one can ignore this fact.)
[1] https://arxiv.org/abs/1907.11328
[2] https://arxiv.org/abs/1304.1205
29 July, 2019 at 3:22 pm
Terence Tao
Scott Aaronson noted in his own blog post on this result that this argument was a great example of the empirical $P \neq NP$ phenomenon in mathematical proofs — proofs that are hard to find can nonetheless be easy to verify — and now I think the case is even stronger; the result can now be deduced in a paragraph from existing results in the literature, using a method that was also already in the literature, but the task of finding that short proof ab nihilo was highly nontrivial!
Another example of this is Dvir’s proof of the finite field Kakeya conjecture. I’m still kicking myself for coming up with half of that proof six years earlier with a coauthor (see Proposition 8.1 of this paper) and completely failing to see that the conjecture could then be resolved with an additional paragraph of effort [amusingly, in both cases the “additional effort” consists of “a tiny bit of undergraduate linear algebra”]. Of course, this doesn’t detract at all from Dvir’s accomplishment; instead it is another example of the empirical $P \neq NP$ phenomenon. Perhaps when attributing credit to a result, one has to make a distinction between the “NP-credit” (the credit one would assign based on the optimised argument from the literature that one can construct after seeing a proof) for a result and the “P-credit” (the credit one would assign to each mathematician for their contribution in advancing the explicit state of knowledge on the problem [as opposed to what can be implicitly extrapolated from the total corpus of literature with the benefit of hindsight] closer to a solution (or pushing beyond a mere solution of the problem and gaining further insight and understanding around the problem and its connection with the rest of mathematics)).
29 July, 2019 at 3:56 pm
Bizarre.
This is a bizarre conclusion. Is the analogy that ‘the problem is encodable somehow as a boolean formula with n variables, with say n = 100, and we are finding some version of satisfiability’? And if so, did it take 2^(c·100) time?
29 July, 2019 at 4:00 pm
Bizarre.
Then the evidence is towards P = NP, since apparently it did not take that long. He is probably one of only a handful who persisted with this graph-theoretic approach, and that too for 5 years (and/but only for 5 years).
29 July, 2019 at 4:17 pm
Anonymous
I think the analogy is that it often appears more difficult to generate a correct solution than to check or justify or reformulate the solution with the benefit of hindsight.
29 July, 2019 at 4:40 pm
Terence Tao
Pretty much, yes (though I would replace the exponential bound here with something more empirically realistic, such as a superpolynomial or even superquadratic bound). The proof has already been encoded into a single tweet, which is very roughly of the same order of difficulty as satisfying a boolean formula with (say) 100 variables, and can now be verified using results in the literature in less than an hour; yet, as you note, it took at least 5 years (= 43,800 hours) to locate this proof (actually a more accurate metric here would be the total amount of expert person-hours expended on the problem, which is more difficult to compute but similarly substantial, as Scott pointed out in his blog post).
29 July, 2019 at 6:07 pm
Bizarre.
Don’t know about expert person-hours. None of the experts suggested the graph theory approach, even though the strategy was pointed out by Nisan in Scott’s MO problem years back. More realistic might be a year of intense effort (about 5000 hours); and “checkable by whom” is the other half of the question (again, by experts). If at all, the moral seems to be: stay convinced in your belief about the conjecture (whether the conjecture was correct) and be lucky in your toolkit.
29 July, 2019 at 9:22 pm
Alex Jones
I think this P vs NP distinction strengthens Timothy Gowers’s claim that computers can have a meaningful impact on generating new mathematical proofs. It seems very reasonable (to me) that a smart computer can get that ‘NP-credit’ that Terry commented about above.
30 July, 2019 at 12:38 pm
Bizarre.
FLT took 350 years. Assume 10 mathematicians each worked intensely 4000 hours a year for 400 years; then we have 16,000,000 person-hours. 16,000,000/50 is roughly 300,000. Would you agree that 300,000 bits is a reasonable measure to encode all of the algebraic number theory, scheme theory, and arithmetic geometry necessary to encode the proof of FLT, notwithstanding the amount of essential information that was written down on these topics in 400 years? Just asking, because it is tempting to understand the complexity of mathematics, as it is one piece of evidence shown in favor of P ≠ NP, in spite of the fact that SAT has no evidence of being outside linear time as far as we know.
29 July, 2019 at 4:24 pm
Ferdinand Ihringer
Recently, Hajime Tanaka and I slightly modified a proof by Frankl from the 1980s on the independence number of the orthogonality graph [1] (the vertices are those of the hypercube, and adjacency is Hamming distance n/2). It is all about writing down the right matrix. Frankl took an argument by Delsarte and showed it for p an odd prime. Hajime and I added a division by 2 into Frankl’s argument to cover p even.
I am sure that the complete solution will also be easy. Just that someone needs to find the right matrix and apply the right variation of one of the known rank arguments.
[1] https://arxiv.org/abs/1901.04860
30 July, 2019 at 12:29 am
Anonymous
A reasonable “standard” measure for the complexity of a mathematical problem may be defined as the smallest complexity (with respect to some fixed “standard” complexity measure) over all its possible proofs.
14 November, 2021 at 5:34 pm
Anonymous
Deus ex machina
27 July, 2019 at 4:46 am
Huangs’ Breakthrough, Cvetković’s Bound, Godsil’s Question, and Sinkovic’s Answer | Ratio Bound – A Combinatorics Blog
[…] ” and everyone is blogging about it (only a small selection): Anurag Bishnoi, Gil Kalai, Terry Tao, Fedya Petrov. Here I am jumping on this […]
27 July, 2019 at 11:00 am
Amazing: Hao Huang Proved the Sensitivity Conjecture! | Combinatorics and more
[…] and many others share: Huang’s new method is very very promising. July 27: A great expository post on WN (What’s new) where Terry Tao presents the proof and describes a connection with Clifford […]
27 July, 2019 at 7:59 pm
kodlu
Very interesting… There are three formulae which give the “formula does not parse” error, and I have not been able to tell exactly why, since they look the same as other formulae which parse OK. I’ve tried both safari and chrome on iOS which give the same errors:
Section 1:
Then for any {J \subset \{1,\dots,k\}}, we have [the formula below doesn’t parse, maybe a missing }?]
\displaystyle \sum_{x \in \bigcap_{j \in J} V_j} \mu(x) = (-1)^{1_{n= \sum_{j \in J}|I_i|},
Remark 6:
Let {f \in \ell^2(E)} be the vector defined by setting [the formula below doesn’t parse, maybe a missing }?]
\displaystyle f(x) := (-1)^{\sum_{1 \leq j < j' \leq k} B( e_{i_j}, e_{i_{j'}} )
and further on
Since {B(e_{i_{j_0}}, e_{i_j}) = B(e_{i_j}, e_{i_{j_0}) + 1}, this simplifies using (3) as
The above doesn't parse either.
[Corrected, thanks – T.]
27 July, 2019 at 8:41 pm
S
It is a pity that Oleksiy Klurman and Cosmin Pohoata are not receiving more credit since the method in Huang’s paper appeared first in a joint paper with them, https://arxiv.org/abs/1812.05989
27 July, 2019 at 10:42 pm
T
As Anurag mentioned in his comment, the use of a pseudo-adjacency matrix is not new at all. The difficult step is to realize that it can be applied to this specific cube problem. You could also say that it is a pity that Wilson is not receiving more credit.
28 July, 2019 at 2:12 am
L
In this case the people concerned are coauthors; however, the related paper does not appear as a reference in the celebrated proof paper. Maybe the author of the proof had a different path to the realization.
29 July, 2019 at 2:34 pm
Ferdinand Ihringer
Or Cvetković (if we want to find the earliest written precedent). His PhD thesis from 1971 already had it. There are other theses from the 1970s with similar observations (maybe Lovász’s, Delsarte’s and Haemers’s are the ones which I know). These methods are also often rediscovered, as they are not too complicated linear algebra.
27 July, 2019 at 10:54 pm
S
Sure, but Wilson is retired whereas Oleksiy Klurman and Cosmin Pohoata are young mathematicians who are Huang’s coauthors.
28 July, 2019 at 8:44 am
Anonymous
grind up old mathematicians to feed the young
27 July, 2019 at 11:09 pm
La conjetura de la sensibilidad ha sido demostrada - La Ciencia de la Mula Francis
[…] Those interested in the details of the proof, which is easy to understand if one already knows its statement, as well as the concepts of induced subgraph and eigenvalue, can enjoy it in Hao Huang, «Induced subgraphs of hypercubes and a proof of the Sensitivity Conjecture,» arXiv:1907.00847 [math.CO] (01 Jul 2019). An expository presentation for mathematicians in Scott Aaronson, «Sensitivity Conjecture resolved,» Shtetl-Optimized, 02 Jul 2019; on the relation between degree and sensitivity I recommend Lance Fortnow, «Degree and Sensitivity,» Computational Complexity, 11 Jul 2019; the consequences of the new theorem have not been slow to arrive, as in R. J. Lipton, K. W. Regan, «Discrepancy Games and Sensitivity,» Gödel’s Lost Letter and P=NP, 25 Jul 2019; and Terence Tao, «Twisted convolution and the sensitivity conjecture,» What’s new, 26 Jul 2019. […]
28 July, 2019 at 6:26 am
gh
I wonder if there is some “homology” involved. Namely, for $A$ the standard adjacency matrix of the hypercube graph, $(A^2)_{xy}$ counts the paths of length 2 from vertex $x$ to vertex $y$, and hence $(A^2)_{xx} = n$ because each vertex has $n$ neighbors, and $(A^2)_{xy} = 2$ when $x$ and $y$ differ in exactly two coordinates, because there are two paths (a square, just as in the case $(0,0)$ to $(1,1)$). Now the pseudo-adjacency matrix $B$ considered in the proof similarly has $(B^2)_{xx} = n$ (paths of length 2 returning to $x$, each contributing $+1$) and $(B^2)_{xy} = 0$, because the contributions of the two paths from $x$ to $y$ now cancel each other out. So “the square spanned by $x$ and $y$ is filled in”.
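As a quick sanity check (my own numerical sketch, not part of the original comment), the path-counting claims above for the standard adjacency matrix can be verified for, say, n = 4:

```python
import numpy as np
from itertools import product

# Verify that, for the hypercube adjacency matrix A, the entry (A^2)_{xy}
# is n on the diagonal, 2 when x and y differ in exactly two bits
# (the two paths around a square), and 0 otherwise.  n = 4 is illustrative.
n = 4
V = 2 ** n

def hamming(x, y):
    """Hamming distance between two vertices encoded as bitstrings."""
    return bin(x ^ y).count("1")

# Standard adjacency matrix: x ~ y iff they differ in exactly one bit.
A = np.array([[1 if hamming(x, y) == 1 else 0 for y in range(V)]
              for x in range(V)])

A2 = A @ A
for x, y in product(range(V), repeat=2):
    d = hamming(x, y)
    if d == 0:
        assert A2[x, y] == n   # n closed paths of length 2 through the n neighbors
    elif d == 2:
        assert A2[x, y] == 2   # exactly two length-2 paths (the "square")
    else:
        assert A2[x, y] == 0   # no length-2 paths otherwise
print("path counts confirmed")
```

The signed matrix of the proof keeps the diagonal count n but is designed so that the two paths around each square carry opposite signs, which is exactly the cancellation described above.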
28 July, 2019 at 9:42 am
Terence Tao
I’ve added a new remark (Remark 6) in which the proof is reformulated without reference to a bilinear form $B$ or any arbitrary choice of signs, and the controlling matrix is now canonically defined using the Clifford algebra generated by $e_1,\dots,e_n$ with the relations $e_i e_j + e_j e_i = 2\delta_{ij}$, which abstract the two properties that make the matrix square to $n \cdot \mathrm{Id}$. (Basically, each edge of the hypercube in the $i$-th direction is now weighted by the Clifford generator $e_i$, rather than by a sign $\pm 1$.) This is in some sense the “universal” version of Huang’s argument. Perhaps these Clifford generators $e_i$ also have some homological interpretation.
28 July, 2019 at 7:49 am
Anonymous
This short proof seems to be “from the Book”!
28 July, 2019 at 9:12 am
Michael
Shortly before Remark 6, there is ‘Conversely, if x not in bigcap…’ should be bigcup
Near the end of the proof of Corollary 10, there is eqreef{disco} should be eqref{disco} ; then in the displayed equation, ‘frac’ is spelled out; then a paragraph with just the letter b.
[Corrected, thanks – T.]
28 July, 2019 at 8:00 pm
L
Most studies in spectral graph theory seem to be related to geometry (such as embedding-type results), and looking at the Clifford algebra and related avatars, there seems to be more essential reasoning in purely geometric terms. Is there a way to look at things purely in geometric terms and interpret the result geometrically?
28 July, 2019 at 10:10 pm
Piotr Sniady
The explanation of Roman Karasev of the structure of the matrix can be rephrased in a slightly more concrete language as follows.
It will be convenient to identify the square matrices of size $2^n$ with the $n$-fold tensor power of square matrices of size $2$.
Let $\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ and $\sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$ be Pauli matrices and let $I$ be the $2 \times 2$ identity matrix.
Now let
$A_1 = \sigma_x \otimes I \otimes \cdots \otimes I$,
$A_2 = \sigma_z \otimes \sigma_x \otimes I \otimes \cdots \otimes I$,
$A_3 = \sigma_z \otimes \sigma_z \otimes \sigma_x \otimes I \otimes \cdots \otimes I$,
…,
$A_n = \sigma_z \otimes \cdots \otimes \sigma_z \otimes \sigma_x$
($A_i$ contains exactly one factor $\sigma_x$, on the $i$-th position; all factors before it are equal to $\sigma_z$ and all factors after it are equal to the identity matrix $I$).
Note that the matrices $A_1, \dots, A_n$ described above appear naturally in the context of quantum mechanics for the following reason: since the Pauli matrices anticommute, $\sigma_x \sigma_z + \sigma_z \sigma_x = 0$, it follows that the matrices $A_i$ anticommute as well: $A_i A_j + A_j A_i = 0$ whenever $i \neq j$. Also, $A_i^2$ is the identity matrix.
Luckily, we have that $A_1 + \cdots + A_n$ is a (signed) adjacency matrix for the graph we are interested in. From the anticommutation of the matrices $A_i$ it follows immediately that $(A_1 + \cdots + A_n)^2 = n \cdot \mathrm{Id}$.
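As a sketch (not part of the original comment; assuming numpy), the anticommutation and signed-adjacency claims above are easy to verify numerically for a small n:

```python
import numpy as np
from functools import reduce

# Check the Pauli/tensor-product construction of the signed adjacency
# matrix for the n-dimensional hypercube, taking n = 4 as an illustration.
n = 4
sx = np.array([[0, 1], [1, 0]])   # Pauli matrix sigma_x
sz = np.array([[1, 0], [0, -1]])  # Pauli matrix sigma_z
I2 = np.eye(2)

def kron_all(mats):
    """n-fold tensor (Kronecker) product of a list of matrices."""
    return reduce(np.kron, mats)

# A_i = sigma_z^{(i-1 factors)} (x) sigma_x (x) I^{(n-i factors)}
A = [kron_all([sz] * i + [sx] + [I2] * (n - i - 1)) for i in range(n)]

# Anticommutation: A_i A_j + A_j A_i = 0 for i != j, and A_i^2 = Id.
for i in range(n):
    for j in range(n):
        anti = A[i] @ A[j] + A[j] @ A[i]
        expected = 2 * np.eye(2 ** n) if i == j else np.zeros((2 ** n, 2 ** n))
        assert np.array_equal(anti, expected)

# S = A_1 + ... + A_n is a signed adjacency matrix of the hypercube,
# and the anticommutation forces S^2 = n Id.
S = sum(A)
assert np.array_equal(S @ S, n * np.eye(2 ** n))

# Entrywise absolute value recovers the usual adjacency matrix
# (vertices adjacent iff their bitstrings differ in exactly one bit).
adj = np.array([[1 if bin(x ^ y).count("1") == 1 else 0
                 for y in range(2 ** n)] for x in range(2 ** n)])
assert np.array_equal(np.abs(S), adj)
print("all checks pass")
```

This is exactly the Jordan–Wigner-style construction described in the comment; any other consistent ordering of the sigma_z and identity factors would work equally well.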
29 July, 2019 at 7:51 am
Fiodr
I think $\sigma_x$ and $\sigma_z$ should be switched in the definition of $A_i$.
29 July, 2019 at 8:15 am
Piotr Sniady
Oops, you are right, thanks!
[I have now swapped $\sigma_x$ and $\sigma_z$ in the original comment – T.]
30 July, 2019 at 6:51 am
MikeR
This exposition by Wenjie Zheng is crystal clear. Not written at the same level, but definitely comprehensible.
http://www.zhengwenjie.net/huanghao/
30 July, 2019 at 7:20 am
Anonymous
Simplicity is clear, wonderful, and essential! Thank you!
30 July, 2019 at 10:08 am
Karl Mahlburg
Terry,
Thanks for an extremely informative post! Huang’s proof is extraordinarily clean and concise, but I also appreciate all of the background material that you have provided.
I believe that I noticed a few minor typos, which are listed below.
1. The fourth displayed equation in Section 1 should be
This follows from applying the third displayed equation.
2. Near the end of the proof of Lemma 11 is the statement: “From (9) one has
.” However, I believe this should instead be
The point is then that by (7)
so in all
3. In the proof of Theorem 15, the sentence before the first displayed equation says “it suffices from (9)”. However, the link actually points to Corollary 9, which I believe is the correct reference.
4. Towards the end of Remark 16, there is the statement “and equal to each
with probability
“. I think this should instead be “and equal to each
“.
[Corrected, thanks – T.]
1 August, 2019 at 4:26 am
chorasimilarity
The relation with the Heisenberg group suggests there might be a reformulation in terms of symplectic capacities?
21 August, 2019 at 4:19 pm
Zachary Kyle Knutson
Dear Mathematicians,
This goes to show that even Terence Tao could be wrong. Hao Huang is a fraud and he copied my work I posted on ResearchGate; he is also concurrently wrong… in part, his answer is implied as wrong… due to his misuse of the mechanisms involved. I would write it up if it didn’t make my stomach turn that somebody would just openly steal my shit or claim money over its false conception… It is not 2^(n-1)+1, it is 2^(n-1)-1; the minus is key, and Terence is probably the only other person on the earth beside the few of you that understands why…
22 August, 2019 at 11:38 pm
bizarre.
Highest upvoted comment truly bizarre.
24 August, 2019 at 11:23 pm
bizarre.
Self-voting perhaps.
24 August, 2019 at 6:19 am
Zachary Kyle Knutson
Thanks to you all lol my faith in humanity is restored…
24 August, 2019 at 6:26 am
Zachary Kyle Knutson
#ResearchGate me…
24 August, 2019 at 6:27 am
Zachary Kyle Knutson
Because people have copied me in physics and engineering as well…
24 November, 2019 at 6:36 am
Zachary Kyle Knutson
Research my work those (Those who thumbed up Hao Huang cannot see) Research gate my work… I will sleep at night…
14 November, 2021 at 11:06 am
Ben Green
Terry, unless I am mistaken I think it may need to be explicitly remarked that twisted convolution with $\mu$ is diagonalisable. I think this is so because the kernel $(-1)^{B(y,x-y)} \mu(x-y)$ is symmetric, because of the particular choice of $B$ which has $B(e_i, e_i) = 0$. (I think you may be implicitly using this symmetry when you refer to using the Cauchy interlacing theorem.)
I’m not sure there is any reason to expect that twisted convolution by a function is diagonalisable in general, is there?
14 November, 2021 at 3:58 pm
Terence Tao
Fair enough, twisted convolution need not be diagonalisable in general, but once one knows that the matrix squares to a constant multiple of the identity, one gets diagonalisability at once from the Jordan normal form.
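The Jordan-form observation in the reply above can be spelled out in one line via the minimal polynomial (a sketch of the standard argument, not part of the original exchange):

```latex
% If M^2 = cI with c \neq 0, the minimal polynomial of M divides x^2 - c,
% which has the two distinct roots \pm\sqrt{c}; a matrix whose minimal
% polynomial has no repeated roots has a diagonal Jordan form.
M^2 = c\,I,\; c \neq 0
\;\Longrightarrow\; m_M(x) \,\big|\, x^2 - c = (x - \sqrt{c})(x + \sqrt{c})
\;\Longrightarrow\; M \text{ is diagonalisable.}
```

In the situation of the post, $c = n$, so the twisted convolution operator in question is diagonalisable with eigenvalues among $\pm\sqrt{n}$.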