You are currently browsing the tag archive for the ‘Littlewood-Offord problem’ tag.

One of the most notorious problems in elementary mathematics that remains unsolved is the Collatz conjecture, concerning the function ${f_0: {\bf N} \rightarrow {\bf N}}$ defined by setting ${f_0(n) := 3n+1}$ when ${n}$ is odd, and ${f_0(n) := n/2}$ when ${n}$ is even. (Here, ${{\bf N}}$ is understood to be the positive natural numbers ${\{1,2,3,\ldots\}}$.)

Conjecture 1 (Collatz conjecture) For any given natural number ${n}$, the orbit ${n, f_0(n), f^2_0(n), f^3_0(n), \ldots}$ passes through ${1}$ (i.e. ${f^k_0(n)=1}$ for some ${k}$).

Open questions with this level of notoriety can lead to what Richard Lipton calls “mathematical diseases” (and what I termed an unhealthy amount of obsession on a single famous problem). (See also this xkcd comic regarding the Collatz conjecture.) As such, most practicing mathematicians tend to spend the majority of their time on more productive research areas that are only just beyond the range of current techniques. Nevertheless, it can still be diverting to spend a day or two each year on these sorts of questions, before returning to other matters; so I recently had a go at the problem. Needless to say, I didn’t solve the problem, but I have a better appreciation of why the conjecture is (a) plausible, and (b) unlikely be proven by current technology, and I thought I would share what I had found out here on this blog.

Let me begin with some very well known facts. If ${n}$ is odd, then ${f_0(n) = 3n+1}$ is even, and so ${f_0^2(n) = \frac{3n+1}{2}}$. Because of this, one could replace ${f_0}$ by the function ${f_1: {\bf N} \rightarrow {\bf N}}$, defined by ${f_1(n) = \frac{3n+1}{2}}$ when ${n}$ is odd, and ${f_1(n)=n/2}$ when ${n}$ is even, and obtain an equivalent conjecture. Now we see that if one chooses ${n}$ “at random”, in the sense that it is odd with probability ${1/2}$ and even with probability ${1/2}$, then ${f_1}$ increases ${n}$ by a factor of roughly ${3/2}$ half the time, and decreases it by a factor of ${1/2}$ half the time. Furthermore, if ${n}$ is uniformly distributed modulo ${4}$, one easily verifies that ${f_1(n)}$ is uniformly distributed modulo ${2}$, and so ${f_1^2(n)}$ should be roughly ${3/2}$ times as large as ${f_1(n)}$ half the time, and roughly ${1/2}$ times as large as ${f_1(n)}$ the other half of the time. Continuing this at a heuristic level, we expect generically that ${f_1^{k+1}(n) \approx \frac{3}{2} f_1^k(n)}$ half the time, and ${f_1^{k+1}(n) \approx \frac{1}{2} f_1^k(n)}$ the other half of the time. The logarithm ${\log f_1^k(n)}$ of this orbit can then be modeled heuristically by a random walk with steps ${\log \frac{3}{2}}$ and ${\log \frac{1}{2}}$ occuring with equal probability. The expectation

$\displaystyle \frac{1}{2} \log \frac{3}{2} + \frac{1}{2} \log \frac{1}{2} = \frac{1}{2} \log \frac{3}{4}$

is negative, and so (by the classic gambler’s ruin) we expect the orbit to decrease over the long term. This can be viewed as heuristic justification of the Collatz conjecture, at least in the “average case” scenario in which ${n}$ is chosen uniform at random (e.g. in some large interval ${\{1,\ldots,N\}}$). (It also suggests that if one modifies the problem, e.g. by replacing ${3n+1}$ to ${5n+1}$, then one can obtain orbits that tend to increase over time, and indeed numerically for this variant one sees orbits that appear to escape to infinity.) Unfortunately, one can only rigorously keep the orbit uniformly distributed modulo ${2}$ for time about ${O(\log N)}$ or so; after that, the system is too complicated for naive methods to control at anything other than a heuristic level.

Remark 1 One can obtain a rigorous analogue of the above arguments by extending ${f_1}$ from the integers ${{\bf Z}}$ to the ${2}$-adics ${{\bf Z}_2}$. This compact abelian group comes with a Haar probability measure, and one can verify that this measure is invariant with respect to ${f_1}$; with a bit more effort one can verify that it is ergodic. This suggests the introduction of ergodic theory methods. For instance, using the pointwise ergodic theorem, we see that if ${n}$ is a random ${2}$-adic integer, then almost surely the orbit ${n, f_1(n), f_1^2(n), \ldots}$ will be even half the time and odd half the time asymptotically, thus supporting the above heuristics. Unfortunately, this does not directly tell us much about the dynamics on ${{\bf Z}}$, as this is a measure zero subset of ${{\bf Z}_2}$. More generally, unless a dynamical system is somehow “polynomial”, “nilpotent”, or “unipotent” in nature, the current state of ergodic theory is usually only able to say something meaningful about generic orbits, but not about all orbits. For instance, the very simple system ${x \rightarrow 10x}$ on the unit circle ${{\bf R}/{\bf Z}}$ is well understood from ergodic theory (in particular, almost all orbits will be uniformly distributed), but the orbit of a specific point, e.g. ${\pi\hbox{ mod } 1}$, is still nearly impossible to understand (this particular problem being equivalent to the notorious unsolved question of whether the digits of ${\pi}$ are uniformly distributed).

The above heuristic argument only suggests decreasing orbits for almost all ${n}$ (though even this remains unproven, the state of the art is that the number of ${n}$ in ${\{1,\ldots,N\}}$ that eventually go to ${1}$ is ${\gg N^{0.84}}$, a result of Krasikov and Lagarias). It leaves open the possibility of some very rare exceptional ${n}$ for which the orbit goes to infinity, or gets trapped in a periodic loop. Since the only loop that ${1}$ lies in is ${1,4,2}$ (for ${f_0}$) or ${1,2}$ (for ${f_1}$), we thus may isolate a weaker consequence of the Collatz conjecture:

Conjecture 2 (Weak Collatz conjecture) Suppose that ${n}$ is a natural number such that ${f^k_0(n)=n}$ for some ${k \geq 1}$. Then ${n}$ is equal to ${1}$, ${2}$, or ${4}$.

Of course, we may replace ${f_0}$ with ${f_1}$ (and delete “${4}$“) and obtain an equivalent conjecture.

This weaker version of the Collatz conjecture is also unproven. However, it was observed by Bohm and Sontacchi that this weak conjecture is equivalent to a divisibility problem involving powers of ${2}$ and ${3}$:

Conjecture 3 (Reformulated weak Collatz conjecture) There does not exist ${k \geq 1}$ and integers

$\displaystyle 0 = a_1 < a_2 < \ldots < a_{k+1}$

such that ${2^{a_{k+1}}-3^k}$ is a positive integer that is a proper divisor of

$\displaystyle 3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + \ldots + 2^{a_k},$

i.e.

$\displaystyle (2^{a_{k+1}} - 3^k) n = 3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + \ldots + 2^{a_k} \ \ \ \ \ (1)$

for some natural number ${n > 1}$.

Proposition 4 Conjecture 2 and Conjecture 3 are equivalent.

Proof: To see this, it is convenient to reformulate Conjecture 2 slightly. Define an equivalence relation ${\sim}$ on ${{\bf N}}$ by declaring ${a \sim b}$ if ${a/b = 2^m}$ for some integer ${m}$, thus giving rise to the quotient space ${{\bf N}/\sim}$ of equivalence classes ${[n]}$ (which can be placed, if one wishes, in one-to-one correspondence with the odd natural numbers). We can then define a function ${f_2: {\bf N}/\sim \rightarrow {\bf N}/\sim}$ by declaring

$\displaystyle f_2( [n] ) := [3n + 2^a] \ \ \ \ \ (2)$

for any ${n \in {\bf N}}$, where ${2^a}$ is the largest power of ${2}$ that divides ${n}$. It is easy to see that ${f_2}$ is well-defined (it is essentially the Syracuse function, after identifying ${{\bf N}/\sim}$ with the odd natural numbers), and that periodic orbits of ${f_2}$ correspond to periodic orbits of ${f_1}$ or ${f_0}$. Thus, Conjecture 2 is equivalent to the conjecture that ${[1]}$ is the only periodic orbit of ${f_2}$.

Now suppose that Conjecture 2 failed, thus there exists ${[n] \neq [1]}$ such that ${f_2^k([n])=[n]}$ for some ${k \geq 1}$. Without loss of generality we may take ${n}$ to be odd, then ${n>1}$. It is easy to see that ${[1]}$ is the only fixed point of ${f_2}$, and so ${k>1}$. An easy induction using (2) shows that

$\displaystyle f_2^k([n]) = [3^k n + 3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + \ldots + 2^{a_k}]$

where, for each ${1 \leq i \leq k}$, ${2^{a_i}}$ is the largest power of ${2}$ that divides

$\displaystyle n_i := 3^{i-1} n + 3^{i-2} 2^{a_1} + \ldots + 2^{a_{i-1}}. \ \ \ \ \ (3)$

In particular, as ${n_1 = n}$ is odd, ${a_1=0}$. Using the recursion

$\displaystyle n_{i+1} = 3n_i + 2^{a_i}, \ \ \ \ \ (4)$

we see from induction that ${2^{a_i+1}}$ divides ${n_{i+1}}$, and thus ${a_{i+1}>a_i}$:

$\displaystyle 0 = a_1 < a_2 < \ldots < a_k.$

Since ${f_2^k([n]) = [n]}$, we have

$\displaystyle 2^{a_{k+1}} n = 3^k n + 3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + \ldots + 2^{a_k} = 3 n_k + 2^{a_k}$

for some integer ${a_{k+1}}$. Since ${3 n_k + 2^{a_k}}$ is divisible by ${2^{a_k+1}}$, and ${n}$ is odd, we conclude ${a_{k+1} > a_k}$; if we rearrange the above equation as (1), then we obtain a counterexample to Conjecture 3.

Conversely, suppose that Conjecture 3 failed. Then we have ${k \geq 1}$, integers

$\displaystyle 0 = a_1 < a_2 < \ldots < a_{k+1}$

and a natural number ${n > 1}$ such that (1) holds. As ${a_1=0}$, we see that the right-hand side of (1) is odd, so ${n}$ is odd also. If we then introduce the natural numbers ${n_i}$ by the formula (3), then an easy induction using (4) shows that

$\displaystyle (2^{a_{k+1}} - 3^k) n_i = 3^{k-1} 2^{a_i} + 3^{k-2} 2^{a_{i+1}} + \ldots + 2^{a_{i+k-1}} \ \ \ \ \ (5)$

with the periodic convention ${a_{k+j} := a_j + a_{k+1}}$ for ${j>1}$. As the ${a_i}$ are increasing in ${i}$ (even for ${i \geq k+1}$), we see that ${2^{a_i}}$ is the largest power of ${2}$ that divides the right-hand side of (5); as ${2^{a_{k+1}}-3^k}$ is odd, we conclude that ${2^{a_i}}$ is also the largest power of ${2}$ that divides ${n_i}$. We conclude that

$\displaystyle f_2([n_i]) = [3n_i + 2^{a_i}] = [n_{i+1}]$

and thus ${[n]}$ is a periodic orbit of ${f_2}$. Since ${n}$ is an odd number larger than ${1}$, this contradicts Conjecture 3. $\Box$

Call a counterexample a tuple ${(k,a_1,\ldots,a_{k+1})}$ that contradicts Conjecture 3, i.e. an integer ${k \geq 1}$ and an increasing set of integers

$\displaystyle 0 = a_1 < a_2 < \ldots < a_{k+1}$

such that (1) holds for some ${n \geq 1}$. We record a simple bound on such counterexamples, due to Terras and to Garner :

Lemma 5 (Exponent bounds) Let ${N \geq 1}$, and suppose that the Collatz conjecture is true for all ${n < N}$. Let ${(k,a_1,\ldots,a_{k+1})}$ be a counterexample. Then

$\displaystyle \frac{\log 3}{\log 2} k < a_{k+1} < \frac{\log(3+\frac{1}{N})}{\log 2} k.$

Proof: The first bound is immediate from the positivity of ${2^{a_{k+1}}-3^k}$. To prove the second bound, observe from the proof of Proposition 4 that the counterexample ${(k,a_1,\ldots,a_{k+1})}$ will generate a counterexample to Conjecture 2, i.e. a non-trivial periodic orbit ${n, f(n), \ldots, f^K(n) = n}$. As the conjecture is true for all ${n < N}$, all terms in this orbit must be at least ${N}$. An inspection of the proof of Proposition 4 reveals that this orbit consists of ${k}$ steps of the form ${x \mapsto 3x+1}$, and ${a_{k+1}}$ steps of the form ${x \mapsto x/2}$. As all terms are at least ${N}$, the former steps can increase magnitude by a multiplicative factor of at most ${3+\frac{1}{N}}$. As the orbit returns to where it started, we conclude that

$\displaystyle 1 \leq (3+\frac{1}{N})^k (\frac{1}{2})^{a_{k+1}}$

whence the claim. $\Box$

The Collatz conjecture has already been verified for many values of ${n}$ (up to at least ${N = 5 \times 10^{18}}$, according to this web site). Inserting this into the above lemma, one can get lower bounds on ${k}$. For instance, by methods such as this, it is known that any non-trivial periodic orbit has length at least ${105{,}000}$, as shown in Garner’s paper (and this bound, which uses the much smaller value ${N = 2 \times 10^9}$ that was available in 1981, can surely be improved using the most recent computational bounds).

Now we can perform a heuristic count on the number of counterexamples. If we fix ${k}$ and ${a := a_{k+1}}$, then ${2^a > 3^k}$, and from basic combinatorics we see that there are ${\binom{a-1}{k-1}}$ different ways to choose the remaining integers

$\displaystyle 0 = a_1 < a_2 < \ldots < a_{k+1}$

to form a potential counterexample ${(k,a_1,\ldots,a_{k+1})}$. As a crude heuristic, one expects that for a “random” such choice of integers, the expression (1) has a probability ${1/q}$ of holding for some integer ${n}$. (Note that ${q}$ is not divisible by ${2}$ or ${3}$, and so one does not expect the special structure of the right-hand side of (1) with respect to those moduli to be relevant. There will be some choices of ${a_1,\ldots,a_k}$ where the right-hand side in (1) is too small to be divisible by ${q}$, but using the estimates in Lemma 5, one expects this to occur very infrequently.) Thus, the total expected number of solutions for this choice of ${a, k}$ is

$\displaystyle \frac{1}{q} \binom{a-1}{k-1}.$

The heuristic number of solutions overall is then expected to be

$\displaystyle \sum_{a,k} \frac{1}{q} \binom{a-1}{k-1}, \ \ \ \ \ (6)$

where, in view of Lemma 5, one should restrict the double summation to the heuristic regime ${a \approx \frac{\log 3}{\log 2} k}$, with the approximation here accurate to many decimal places.

We need a lower bound on ${q}$. Here, we will use Baker’s theorem (as discussed in this previous post), which among other things gives the lower bound

$\displaystyle q = 2^a - 3^k \gg 2^a / a^C \ \ \ \ \ (7)$

for some absolute constant ${C}$. Meanwhile, Stirling’s formula (as discussed in this previous post) combined with the approximation ${k \approx \frac{\log 2}{\log 3} a}$ gives

$\displaystyle \binom{a-1}{k-1} \approx \exp(h(\frac{\log 2}{\log 3}))^a$

where ${h}$ is the entropy function

$\displaystyle h(x) := - x \log x - (1-x) \log (1-x).$

A brief computation shows that

$\displaystyle \exp(h(\frac{\log 2}{\log 3})) \approx 1.9318 \ldots$

and so (ignoring all subexponential terms)

$\displaystyle \frac{1}{q} \binom{a-1}{k-1} \approx (0.9659\ldots)^a$

which makes the series (6) convergent. (Actually, one does not need the full strength of Lemma 5 here; anything that kept ${k}$ well away from ${a/2}$ would suffice. In particular, one does not need an enormous value of ${N}$; even ${N=5}$ (say) would be more than sufficient to obtain the heuristic that there are finitely many counterexamples.) Heuristically applying the Borel-Cantelli lemma, we thus expect that there are only a finite number of counterexamples to the weak Collatz conjecture (and inserting a bound such as ${k \geq 105{,}000}$, one in fact expects it to be extremely likely that there are no counterexamples at all).

This, of course, is far short of any rigorous proof of Conjecture 2. In order to make rigorous progress on this conjecture, it seems that one would need to somehow exploit the structural properties of numbers of the form

$\displaystyle 3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + \ldots + 2^{a_k}. \ \ \ \ \ (8)$

In some very special cases, this can be done. For instance, suppose that one had ${a_{i+1}=a_i+1}$ with at most one exception (this is essentially what is called a ${1}$-cycle by Steiner). Then (8) simplifies via the geometric series formula to a combination of just a bounded number of powers of ${2}$ and ${3}$, rather than an unbounded number. In that case, one can start using tools from transcendence theory such as Baker’s theorem to obtain good results; for instance, in the above-referenced paper of Steiner, it was shown that ${1}$-cycles cannot actually occur, and similar methods have been used to show that ${m}$-cycles (in which there are at most ${m}$ exceptions to ${a_{i+1}=a_i+1}$) do not occur for any ${m \leq 63}$, as was shown by Simons and de Weger. However, for general increasing tuples of integers ${a_1,\ldots,a_k}$, there is no such representation by bounded numbers of powers, and it does not seem that methods from transcendence theory will be sufficient to control the expressions (8) to the extent that one can understand their divisibility properties by quantities such as ${2^a-3^k}$.

Amusingly, there is a slight connection to Littlewood-Offord theory in additive combinatorics – the study of the ${2^n}$ random sums

$\displaystyle \pm v_1 \pm v_2 \pm \ldots \pm v_n$

generated by some elements ${v_1,\ldots,v_n}$ of an additive group ${G}$, or equivalently, the vertices of an ${n}$-dimensional parallelepiped inside ${G}$. Here, the relevant group is ${{\bf Z}/q{\bf Z}}$. The point is that if one fixes ${k}$ and ${a_{k+1}}$ (and hence ${q}$), and lets ${a_1,\ldots,a_k}$ vary inside the simplex

$\displaystyle \Delta := \{ (a_1,\ldots,a_k) \in {\bf N}^k: 0 = a_1 < \ldots < a_k < a_{k+1}\}$

then the set ${S}$ of all sums of the form (8) (viewed as an element of ${{\bf Z}/q{\bf Z}}$) contains many large parallelepipeds. (Note, incidentally, that once one fixes ${k}$, all the sums of the form (8) are distinct; because given (8) and ${k}$, one can read off ${2^{a_1}}$ as the largest power of ${2}$ that divides (8), and then subtracting off ${3^{k-1} 2^{a_1}}$ one can then read off ${2^{a_2}}$, and so forth.) This is because the simplex ${\Delta}$ contains many large cubes. Indeed, if one picks a typical element ${(a_1,\ldots,a_k)}$ of ${\Delta}$, then one expects (thanks to Lemma 5) that there there will be ${\gg k}$ indices ${1 \leq i_1 < \ldots < i_m \leq k}$ such that ${a_{i_j+1} > a_{i_j}+1}$ for ${j=1,\ldots,m}$, which allows one to adjust each of the ${a_{i_j}}$ independently by ${1}$ if desired and still remain inside ${\Delta}$. This gives a cube in ${\Delta}$ of dimension ${\gg k}$, which then induces a parallelepiped of the same dimension in ${S}$. A short computation shows that the generators of this parallelepiped consist of products of a power of ${2}$ and a power of ${3}$, and in particular will be coprime to ${q}$.

If the weak Collatz conjecture is true, then the set ${S}$ must avoid the residue class ${0}$ in ${{\bf Z}/q{\bf Z}}$. Let us suppose temporarily that we did not know about Baker’s theorem (and the associated bound (7)), so that ${q}$ could potentially be quite small. Then we would have a large parallelepiped inside a small cyclic group ${{\bf Z}/q{\bf Z}}$ that did not cover all of ${{\bf Z}/q{\bf Z}}$, which would not be possible for ${q}$ small enough. Indeed, an easy induction shows that a ${d}$-dimensional parallelepiped in ${{\bf Z}/q{\bf Z}}$, with all generators coprime to ${q}$, has cardinality at least ${\min(q, d+1)}$. This argument already shows the lower bound ${q \gg k}$. In other words, we have

Proposition 6 Suppose the weak Collatz conjecture is true. Then for any natural numbers ${a, k}$ with ${2^a > 3^k}$, one has ${2^a-3^k \gg k}$.

This bound is very weak when compared against the unconditional bound (7). However, I know of no way to get a nontrivial separation property between powers of ${2}$ and powers of ${3}$ other than via transcendence theory methods. Thus, this result strongly suggests that any proof of the Collatz conjecture must either use existing results in transcendence theory, or else must contribute a new method to give non-trivial results in transcendence theory. (This already rules out a lot of possible approaches to solve the Collatz conjecture.)

By using more sophisticated tools in additive combinatorics, one can improve the above proposition (though it is still well short of the transcendence theory bound (7)):

Proposition 7 Suppose the weak Collatz conjecture is true. Then for any natural numbers ${a, k}$ with ${2^a > 3^k}$, one has ${2^a-3^k \gg (1+\epsilon)^k}$ for some absolute constant ${\epsilon > 0}$.

Proof: (Informal sketch only) Suppose not, then we can find ${a, k}$ with ${q := 2^a-3^k}$ of size ${(1+o(1))^k = \exp(o(k))}$. We form the set ${S}$ as before, which contains parallelepipeds in ${{\bf Z}/q{\bf Z}}$ of large dimension ${d \gg k}$ that avoid ${0}$. We can count the number of times ${0}$ occurs in one of these parallelepipeds by a standard Fourier-analytic computation involving Riesz products (see Chapter 7 of my book with Van Vu, or this recent preprint of Maples). Using this Fourier representation, the fact that this parallelepiped avoids ${0}$ (and the fact that ${q = \exp(o(k)) = \exp(o(d))}$) forces the generators ${v_1,\ldots,v_d}$ to be concentrated in a Bohr set, in that one can find a non-zero frequency ${\xi \in {\bf Z}/q{\bf Z}}$ such that ${(1-o(1))d}$ of the ${d}$ generators lie in the set ${\{ v: \xi v = o(q) \hbox{ mod } q \}}$. However, one can choose the generators to essentially have the structure of a (generalised) geometric progression (up to scaling, it resembles something like ${2^i 3^{\lfloor \alpha i \rfloor}}$ for ${i}$ ranging over a generalised arithmetic progression, and ${\alpha}$ a fixed irrational), and one can show that such progressions cannot be concentrated in Bohr sets (this is similar in spirit to the exponential sum estimates of Bourgain on approximate multiplicative subgroups of ${{\bf Z}/q{\bf Z}}$, though one can use more elementary methods here due to the very strong nature of the Bohr set concentration (being of the “${99\%}$ concentration” variety rather than the “${1\%}$ concentration”).). This furnishes the required contradiction. $\Box$

Thus we see that any proposed proof of the Collatz conjecture must either use transcendence theory, or introduce new techniques that are powerful enough to create exponential separation between powers of ${2}$ and powers of ${3}$.

Unfortunately, once one uses the transcendence theory bound (7), the size ${q}$ of the cyclic group ${{\bf Z}/q{\bf Z}}$ becomes larger than the volume of any cube in ${S}$, and Littlewood-Offord techniques are no longer of much use (they can be used to show that ${S}$ is highly equidistributed in ${{\bf Z}/q{\bf Z}}$, but this does not directly give any way to prevent ${S}$ from containing ${0}$).

One possible toy model problem for the (weak) Collatz conjecture is a conjecture of Erdos asserting that for ${n>8}$, the base ${3}$ representation of ${2^n}$ contains at least one ${2}$. (See this paper of Lagarias for some work on this conjecture and on related problems.) To put it another way, the conjecture asserts that there are no integer solutions to

$\displaystyle 2^n = 3^{a_1} + 3^{a_2} + \ldots + 3^{a_k}$

with ${n > 8}$ and ${0 \leq a_1 < \ldots < a_k}$. (When ${n=8}$, of course, one has ${2^8 = 3^0 + 3^1 + 3^2 + 3^5}$.) In this form we see a resemblance to Conjecture 3, but it looks like a simpler problem to attack (though one which is still a fair distance beyond what one can do with current technology). Note that one has a similar heuristic support for this conjecture as one does for Proposition 3; a number of magnitude ${2^n}$ has about ${n \frac{\log 2}{\log 3}}$ base ${3}$ digits, so the heuristic probability that none of these digits are equal to ${2}$ is ${3^{-n\frac{\log 2}{\log 3}} = 2^{-n}}$, which is absolutely summable.

Now we turn attention to another important spectral statistic, the least singular value ${\sigma_n(M)}$ of an ${n \times n}$ matrix ${M}$ (or, more generally, the least non-trivial singular value ${\sigma_p(M)}$ of a ${n \times p}$ matrix with ${p \leq n}$). This quantity controls the invertibility of ${M}$. Indeed, ${M}$ is invertible precisely when ${\sigma_n(M)}$ is non-zero, and the operator norm ${\|M^{-1}\|_{op}}$ of ${M^{-1}}$ is given by ${1/\sigma_n(M)}$. This quantity is also related to the condition number ${\sigma_1(M)/\sigma_n(M) = \|M\|_{op} \|M^{-1}\|_{op}}$ of ${M}$, which is of importance in numerical linear algebra. As we shall see in the next set of notes, the least singular value of ${M}$ (and more generally, of the shifts ${\frac{1}{\sqrt{n}} M - zI}$ for complex ${z}$) will be of importance in rigorously establishing the circular law for iid random matrices ${M}$, as it plays a key role in computing the Stieltjes transform ${\frac{1}{n} \hbox{tr} (\frac{1}{\sqrt{n}} M - zI)^{-1}}$ of such matrices, which as we have already seen is a powerful tool in understanding the spectra of random matrices.

The least singular value

$\displaystyle \sigma_n(M) = \inf_{\|x\|=1} \|Mx\|,$

which sits at the “hard edge” of the spectrum, bears a superficial similarity to the operator norm

$\displaystyle \|M\|_{op} = \sigma_1(M) = \sup_{\|x\|=1} \|Mx\|$

at the “soft edge” of the spectrum, that was discussed back in Notes 3, so one may at first think that the methods that were effective in controlling the latter, namely the epsilon-net argument and the moment method, would also work to control the former. The epsilon-net method does indeed have some effectiveness when dealing with rectangular matrices (in which the spectrum stays well away from zero), but the situation becomes more delicate for square matrices; it can control some “low entropy” portions of the infimum that arise from “structured” or “compressible” choices of ${x}$, but are not able to control the “generic” or “incompressible” choices of ${x}$, for which new arguments will be needed. As for the moment method, this can give the coarse order of magnitude (for instance, for rectangular matrices with ${p=yn}$ for ${0 < y < 1}$, it gives an upper bound of ${(1-\sqrt{y}+o(1))n}$ for the singular value with high probability, thanks to the Marchenko-Pastur law), but again this method begins to break down for square matrices, although one can make some partial headway by considering negative moments such as ${\hbox{tr} M^{-2}}$, though these are more difficult to compute than positive moments ${\hbox{tr} M^k}$.

So one needs to supplement these existing methods with additional tools. It turns out that the key issue is to understand the distance between one of the ${n}$ rows ${X_1,\ldots,X_n \in {\bf C}^n}$ of the matrix ${M}$, and the hyperplane spanned by the other ${n-1}$ rows. The reason for this is as follows. First suppose that ${\sigma_n(M)=0}$, so that ${M}$ is non-invertible, and there is a linear dependence between the rows ${X_1,\ldots,X_n}$. Thus, one of the ${X_i}$ will lie in the hyperplane spanned by the other rows, and so one of the distances mentioned above will vanish; in fact, one expects many of the ${n}$ distances to vanish. Conversely, whenever one of these distances vanishes, one has a linear dependence, and so ${\sigma_n(M)=0}$.

More generally, if the least singular value ${\sigma_n(M)}$ is small, one generically expects many of these ${n}$ distances to be small also, and conversely. Thus, control of the least singular value is morally equivalent to control of the distance between a row ${X_i}$ and the hyperplane spanned by the other rows. This latter quantity is basically the dot product of ${X_i}$ with a unit normal ${n_i}$ of this hyperplane.

When working with random matrices with jointly independent coefficients, we have the crucial property that the unit normal ${n_i}$ (which depends on all the rows other than ${X_i}$) is independent of ${X_i}$, so even after conditioning ${n_i}$ to be fixed, the entries of ${X_i}$ remain independent. As such, the dot product ${X_i \cdot n_i}$ is a familiar scalar random walk, and can be controlled by a number of tools, most notably Littlewood-Offord theorems and the Berry-Esséen central limit theorem. As it turns out, this type of control works well except in some rare cases in which the normal ${n_i}$ is “compressible” or otherwise highly structured; but epsilon-net arguments can be used to dispose of these cases. (This general strategy was first developed for the technically simpler singularity problem by Komlós, and then extended to the least singular value problem by Rudelson.)

These methods rely quite strongly on the joint independence on all the entries; it remains a challenge to extend them to more general settings. Even for Wigner matrices, the methods run into difficulty because of the non-independence of some of the entries (although it turns out one can understand the least singular value in such cases by rather different methods).

To simplify the exposition, we shall focus primarily on just one specific ensemble of random matrices, the Bernoulli ensemble ${M = (\xi_{ij})_{1 \leq i,j \leq n}}$ of random sign matrices, where ${\xi_{ij} = \pm 1}$ are independent Bernoulli signs. However, the results can extend to more general classes of random matrices, with the main requirement being that the coefficients are jointly independent.

Van Vu and I have just uploaded to the arXiv our joint paper “The Littlewood-Offord problem in high dimensions and a conjecture of Frankl and Füredi“. In this short paper we give a different proof of a high-dimensional Littlewood-Offord result of Frankl and Füredi, and in the process also affirmatively answer one of their open problems.

Let ${v_1,\ldots,v_n}$ be ${n}$ vectors in ${{\mathbb R}^d}$, which we normalise to all have length at least ${1}$. For any given radius ${\Delta > 0}$, we consider the small ball probability

$\displaystyle p(v_1,\ldots,v_n,\Delta) := \sup_B {\bf P}( \eta_1 v_1 + \ldots + \eta_n v_n \in B )$

where ${\eta_1,\ldots,\eta_n}$ are iid Bernoulli signs (i.e. they take values ${+1}$ or ${-1}$ independently with a probability of ${1/2}$ of each), and ${B}$ ranges over all (closed) balls of radius ${\Delta}$. The Littlewood-Offord problem is to compute the quantity

$\displaystyle p_d(n,\Delta) := \sup_{v_1,\ldots,v_n} p(v_1,\ldots,v_n,\Delta)$

where ${v_1,\ldots,v_n}$ range over all vectors in ${{\mathbb R}^d}$ of length at least one. Informally, this number measures the extent to which a random walk of length ${n}$ (with all steps of size at least one) can concentrate into a ball of radius ${\Delta}$.

The one-dimensional case of this problem was answered by Erdös. First, one observes that one can normalise all the ${v_i}$ to be at least ${+1}$ (as opposed to being at most ${-1}$). In the model case when ${\Delta < 1}$, he made the following simple observation: if a random sum ${\eta_1 v_1 + \ldots + \eta_n v_n}$ fell into a ball of radius ${\Delta}$ (which in the one-dimensional case, is an interval of length less than ${2}$), and one then changed one or more of the signs ${\eta_i}$ from ${-1}$ to ${+1}$, then the new sum must necessarily lie outside of the ball. In other words, for any ball ${B}$ of radius ${\Delta}$, the set of signs ${(\eta_1,\ldots,\eta_n) \in \{-1,+1\}^n}$ for which ${\eta_1 v_1 + \ldots + \eta_n v_n \in B}$ forms an antichain. Applying Sperner’s theorem, the maximal size of this antichain is ${\binom{n}{\lfloor n/2\rfloor}}$, and this soon leads to the exact value

$\displaystyle p_1(n,\Delta) = \binom{n}{\lfloor n/2\rfloor}/2^n = \frac{\sqrt{\frac{2}{\pi}}+o(1)}{\sqrt{n}}$

when ${0 \leq \Delta < 1}$ (the bound is attained in the extreme case ${v_1=\ldots=v_n=1}$).

A similar argument works for higher values of ${\Delta}$, using Dilworth’s theorem instead of Sperner’s theorem, and gives the exact value

$\displaystyle p_1(n,\Delta) = \sum_{j=1}^s \binom{n}{m_j}/2^n = \frac{s\sqrt{\frac{2}{\pi}}+o(1)}{\sqrt{n}}$

whenever ${n \geq s}$ and ${s-1 \leq \Delta < s}$ for some natural number ${s}$, where ${\binom{n}{m_1},\ldots,\binom{n}{m_s}}$ are the ${s}$ largest binomial coefficients of ${\binom{n}{1}, \ldots, \binom{n}{n}}$.

Now consider the higher-dimensional problem. One has the obvious bound

$\displaystyle p_d(n,\Delta) \geq p_1(n,\Delta),$

but it is not obvious whether this inequality is strict. In other words, is there some way to exploit the additional freedom given by higher dimensions to make random walks concentrate more than in the one-dimensional case?

For some values of ${\Delta}$, it turns out that the answer is no, as was first observed by Kleitman (and discussed further by Frankl and Füredi). Suppose for instance that

$\displaystyle \sqrt{(s-1)^2+1} \leq \Delta < s$

for some ${s \geq 2}$. Then one can consider the example in which ${v_1=\ldots=v_{n-1}=e_1}$ is one unit vector, and ${v_n=e_2}$ is another unit vector orthogonal to ${e_1}$. The small ball probability in this case can be computed to equal ${p_1(n-1,s-1)}$ rather than ${p_1(n,s-1)}$, which is slightly larger.

In the positive direction, Frankl and Füredi established the asymptotic

$\displaystyle p_d(n,\Delta) = (1 + o(1)) p_1(n,\Delta) \ \ \ \ \ (1)$

as ${n \rightarrow \infty}$ (holding ${d}$ and ${\Delta}$ fixed). Furthermore, if ${\Delta}$ was close to an integer, and more precisely if

$\displaystyle s-1 \leq \Delta < s-1 + \frac{1}{10s^2}$

(so that the above counterexample can be avoided) they showed that ${p_d(n,\Delta) = p_1(n,\Delta)}$ for sufficiently large ${n}$ (depending on ${s,\Delta}$).

The factor ${\frac{1}{10s^2}}$ was an artefact of their method, and they conjectured in fact that one should have ${p_d(n,\Delta) = p_1(n,\Delta)}$ for sufficiently large ${n}$ whenever

$\displaystyle s-1 \leq \Delta < \sqrt{(s-1)^2+1} \ \ \ \ \ (2)$

thus matching the counterexample exactly. This conjecture was verified for ${s = 1}$ by Kleitman and for ${s=2,3}$ by Frankl and Füredi.

In this paper we verify the conjecture of Frankl and Füredi (and give a new proof of their asymptotic (1)). Our main tool is the following high-dimensional Littlewood-Offord inequality:

Theorem 1 Suppose that ${v_1,\ldots,v_n \in {\mathbb R}^d}$ which is genuinely ${d}$-dimensional in the sense that for any hyperplane ${H}$ going through the origin, one has ${\hbox{dist}(v_i,H) \geq 1}$ for at least ${k}$ values of ${i}$. Then one has

$\displaystyle p(v_1,\ldots,v_n,\Delta) \ll_{d,\Delta} k^{-d/2}.$

Theorem 1 can be viewed as a high-dimensional variant of Erdös’s inequality (but without the sharp upper bound). It is proven by the Fourier-analytic method of Halász. (This theorem was announced in my book with Van Vu several years ago, but we did not get around to publishing it until now.)

Using Theorem 1, one can verify the conjecture of Frankl and Füredi fairly quickly (the deduction takes a little over a page). The main point is that if there is excessive concentration, then Theorem 1 quickly places almost all of the vectors ${v_1,\ldots,v_n}$ to lie very close to a line. If all the vectors are close to a line, then we can project onto this line and rescale, which causes ${\Delta}$ to worsen a little bit in this reduction to the one-dimensional case, but it turns out that the bounds (2) allow us to tolerate this degradation of ${\Delta}$ once ${s>3}$ (so it is fortunate that the cases ${s \leq 3}$ were already done for us!). If instead we have a vector far from the line (as is the case in the key counterexample), then we manually eliminate that vector using the parallelogram law, which effectively drops ${\Delta}$ below ${s-1}$ (half of the time, at least) if ${\Delta}$ was initially less than ${\sqrt{(s-1)^2+1}}$, which gives enough of a saving to conclude the argument.

One moral that one can draw from this argument is that one can use a quasi-sharp estimate (such as Theorem 1), which ostensibly loses constant factors, to then deduce a sharp estimate (such as the Frankl-Furëdi conjecture) that loses no constant factors, as long as one is in an asymptotic regime (in this case, ${s \geq 3}$ and ${n}$ large depending on ${d,\Delta}$). The key is to exploit the fine structure in the main term (in this case, the piecewise constant nature of ${p_1(n,\Delta)}$ when ${\Delta}$ passes over integers) to extract gains that can absorb the losses coming from the quasi-sharp estimate).

Van Vu and I have just uploaded to the arXiv our preprint “On the permanent of random Bernoulli matrices“, submitted to Adv. Math. This paper establishes analogues of some recent results on the determinant of random $n \times n$ Bernoulli matrices (matrices in which all entries are either +1 or -1, with equal probability of each), in which the determinant is replaced by the permanent.

More precisely, let M be a random $n \times n$ Bernoulli matrix, with n large. Since every row of this matrix has magnitude $n^{1/2}$, it is easy to see (by interpreting the determinant as the signed volume of a parallelopiped) that $|\det(M)|$ is at most $n^{n/2}$, with equality being satisfied exactly when M is a Hadamard matrix. In fact, it is known that the determinant $\det(M)$ has magnitude $n^{(1/2 - o(1)) n}$ with probability $1-o(1)$; for a more precise result, see my earlier paper with Van. (There is in fact believed to be a central limit theorem for $\log |\det(M)|$; see this paper of Girko for details.) These results are based upon the elementary “base times height” formula for the volume of a parallelopiped; the main difficulty is to understand what the distance is from one row of M to a subspace spanned by several other rows of M.

The permanent $\hbox{per}(M)$ looks formally very similar to the determinant, but does not have a geometric interpretation as a signed volume of a parallelopiped and so can only be analysed combinatorially; the main difficulty is to understand the cancellation that can arise from the various signs in the matrix. It can be somewhat larger than the determinant; for instance, the maximum value of $\hbox{per}(M)$ for a Bernoulli matrix M is $n! = n^{(1 - o(1)) n}$, attaned when M consists entirely of +1′s. Nevertheless, it is not hard to see that $\hbox{per}(M)$ has the same mean and standard deviation as $\det(M)$, namely 0 and $\sqrt{n!}$ respectively, which shows that $|\hbox{per}(M)|$ is at most $n^{(1/2-o(1))n}$ with probability 1-o(1). Our main result is to show that one also has that $|\hbox{per}(M)|$ is at least $n^{(1/2-o(1))n}$ with probability 1-o(1), thus obtaining the analogue of the previously mentioned result for the determinant (though our o(1) bounds are significantly weaker).

In particular, this shows that the probability that the permanent vanishes completely is o(1) (in fact, we get a bound of $O(n^{-c})$ for some absolute constant $c > 0$). This result appears to be new (although there is a cute observation of Alon (see e.g. this paper of Wanless for a proof) that if $n=2^m-1$ is one less than a power of 2, then every Bernoulli matrix has non-zero permanent). In contrast, the probability that the determinant vanishes completely is conjectured to equal $(1/2 + o(1))^n$ (which is easily seen to be a lower bound), but the best known upper bound for this probability is $(1/\sqrt{2 }+ o(1))^n$, due to Bourgain, Vu, and Wood.

This is my second Milliman lecture, in which I talk about recent applications of ideas from additive combinatorics (and in particular, from the inverse Littlewood-Offord problem) to the theory of discrete random matrices.
Read the rest of this entry »

Van Vu and I have recently uploaded our joint paper, “Random matrices: the circular law“, submitted to Contributions to Discrete Mathematics. In this paper we come close to fully resolving the circular law conjecture regarding the eigenvalue distribution of random matrices, for arbitrary choices of coefficient distribution.

More precisely, suppose we have an $n \times n$ matrix $N_n$ for some large n, where each coefficient $x_{ij}$ of $N_n$ is an independent identically distributed copy of a single random variable x (possibly complex-valued). x could be continuous (e.g. a Gaussian) or discrete (e.g. a Bernoulli random variable, taking values +1 and -1 with equal probability). For simplicity, let us normalise x to have mean 0 and variance 1 (in particular, the second moment is finite). This matrix will not be self-adjoint or normal, but we still expect it to be diagonalisable, with n complex eigenvalues. Heuristic arguments suggest that these eigenvalues should mostly have magnitude $O(\sqrt{n})$; for instance, one can see this by observing that the Hilbert-Schmidt norm (a.k.a. the Frobenius norm) $\hbox{tr} N_n^* N_n$, which can be shown to dominate the sum of squares of the magnitudes of the eigenvalues, is of size comparable to $n^2$ on the average. Because of this, it is customary to normalise the matrix by $1/\sqrt{n}$; thus let $\lambda_1,\ldots,\lambda_n$ be the n complex eigenvalues of $\frac{1}{\sqrt{n}} N_n$, arranged in any order.

Numerical evidence (as seen for instance here) soon reveals that these n eigenvalues appear to distribute themselves uniformly in the unit circle $\{ z \in {\Bbb C}: |z| \leq 1 \}$ in the limit $n \to \infty$. This phenomenon is known as the circular law. It can be made more precise; if we define the empirical spectral distribution $\mu_n: {\Bbb R}^2 \to [0,1]$ to be the function

$\mu_n(s,t) := \frac{1}{n} \# \{ 1 \leq k \leq n: \hbox{Re}(\lambda_k) \leq s; \hbox{Im}(\lambda_k) \leq t \}$

then with probability 1, $\mu_n$ should converge uniformly to the uniform distribution $\mu_\infty$ of the unit circle, defined as

$\mu_\infty(s,t) := \frac{1}{\pi} \hbox{mes}(\{ (x,y): |x|^2 + |y|^2 \leq 1; x \leq s; y \leq t \}.$

This statement is known as the circular law conjecture. In the case when x is a complex Gaussian, this law was verified by Mehta (using an explicit formula of Ginibre for the joint density function of the eigenvalues in this case). A strategy for attacking the general case was then formulated by Girko, although a fully rigorous execution of that strategy was first achieved by Bai (and then improved slightly by Bai and Silverstein). They established the circular law under the assumption that x had slightly better than bounded second moment (i.e. ${\Bbb E}|x|^{2+\delta} < \infty$ for some $\delta > 0$), but more importantly that the probability density function of x in the complex plane was bounded (in particular, this ruled out all discrete random variables, such as the Bernoulli random variable). The reason for this latter restriction was in order to control the event that the matrix $N_n$ (or more precisely $\frac{1}{\sqrt{n}} N_n - z I$ for various complex numbers z) becomes too ill-conditioned by having a very small least singular value.

In the last few years, work of Rudelson, of myself with Van Vu, and of Rudelson-Vershynin (building upon earlier work of Kahn, Komlos, and Szemerédi, and of Van and myself), have opened the way to control the condition number of random matrices even when the matrices are discrete, and so there have been a recent string of results using these techniques to extend the circular law to discrete settings. In particular, Gotze and Tikhomirov established the circular law for discrete random variables which were sub-Gaussian, which was then relaxed by Pan and Zhou to an assumption of bounded fourth moment. In our paper, we get very close to the ideal assumption of bounded second moment; we need ${\Bbb E} |x|^2 \log^C(1+|x|) < \infty$ for some $C > 16$. (The power of 16 in the logarithm can certainly be improved, though our methods do not allow the logarithm to be removed entirely.)

The main new difficulty that arises when relaxing the moment condition so close to the optimal one is that one begins to lose control on the largest singular value of $\frac{1}{\sqrt{n}} N_n$, i.e. on the operator norm of $\frac{1}{\sqrt{n}} N_n$. Under high moment assumptions (e.g. fourth moment) one can keep this operator norm bounded with reasonable probability (especially after truncating away some exceptionally large elements), but when the moment conditions are loosened, one can only bound this operator norm by a quantity bounded polynomially in n, even after truncation. This in turn causes certain metric entropy computations to become significantly more delicate, as one has to reduce the scale $\epsilon$ of the net below what one would ordinarily like to have.

I’ve arXived the paper “On the condition number of a randomly perturbed matrix”, authored with Van Vu, which will appear at STOC 2007. (This paper was actually finished and submitted several months ago, but for some reason I neglected to upload it to the arXiv until now.) Here we show that given an arbitrary $n \times n$ matrix M with polynomially bounded entries (in particular, M could be highly singular), a random discrete perturbation M+N, where N is (say) a random Bernoulli $\pm 1$ matrix, will have polynomially bounded condition number with high probability (specifically, given any A there exists a B such that the condition number is $O(n^B)$ with probability $1-O(n^{-A})$). This is the discrete analogue of an analogous result by Spielman and Teng for continuous random perturbations. The proof uses the inverse Littlewood-Offord technology as well as a discretisation lemma for generalized arithmetic progressions, both from our previous paper (which essentially addressed the M=0 case). It turns out that the effect of M is essentially just to add a few deterministic rows to a random matrix which one needs to obtain lower bounds for the least singular value.