You are currently browsing the tag archive for the ‘Abel prize’ tag.

A few days ago, Endre Szemerédi was awarded the 2012 Abel prize “for his fundamental contributions to discrete mathematics and theoretical computer science, and in recognition of the profound and lasting impact of these contributions on additive number theory and ergodic theory.” The full citation for the prize may be found here, and the written notes for a talk given by Tim Gowers on Endre’s work at the announcement may be found here (and video of the talk can be found here).

As I was on the Abel prize committee this year, I won’t comment further on the prize, but will instead focus on what is arguably Endre’s most well known result, namely Szemerédi’s theorem on arithmetic progressions:

Theorem 1 (Szemerédi’s theorem) Let ${A}$ be a set of integers of positive upper density, thus ${\lim \sup_{N \rightarrow\infty} \frac{|A \cap [-N,N]|}{|[-N,N]|} > 0}$, where ${[-N,N] := \{-N, -N+1,\ldots,N\}}$. Then ${A}$ contains an arithmetic progression of length ${k}$ for any ${k>1}$.

Szemerédi’s original proof of this theorem is a remarkably intricate piece of combinatorial reasoning. Most proofs of theorems in mathematics – even long and difficult ones – generally come with a reasonably compact “high-level” overview, in which the proof is (conceptually, at least) broken down into simpler pieces. There may well be technical difficulties in formulating and then proving each of the component pieces, and then in fitting the pieces together, but usually the “big picture” is reasonably clear. To give just one example, the overall strategy of Perelman’s proof of the Poincaré conjecture can be briefly summarised as follows: to show that a simply connected three-dimensional manifold is homeomorphic to a sphere, place a Riemannian metric on it and perform Ricci flow, excising any singularities that arise by surgery, until the entire manifold becomes extinct. By reversing the flow and analysing the surgeries performed, obtain enough control on the topology of the original manifold to establish that it is a topological sphere.

In contrast, the pieces of Szemerédi’s proof are highly interlocking, particularly with regard to all the epsilon-type parameters involved; it takes quite a bit of notational setup and foundational lemmas before the key steps of the proof can even be stated, let alone proved. Szemerédi’s original paper contains a logical diagram of the proof (reproduced in Gowers’ recent talk) which already gives a fair indication of this interlocking structure. (Many years ago I tried to present the proof, but I was unable to find much of a simplification, and my exposition is probably not that much clearer than the original text.) Even the use of nonstandard analysis, which is often helpful in cleaning up armies of epsilons, turns out to be a bit tricky to apply here. (In typical applications of nonstandard analysis, one can get by with a single nonstandard universe, constructed as an ultrapower of the standard universe; but to correctly model all the epsilons occuring in Szemerédi’s argument, one needs to repeatedly perform the ultrapower construction to obtain a (finite) sequence of increasingly nonstandard (and increasingly saturated) universes, each one containing unbounded quantities that are far larger than any quantity that appears in the preceding universe, as discussed at the end of this previous blog post. This sequence of universes does end up concealing all the epsilons, but it is not so clear that this is a net gain in clarity for the proof; I may return to the nonstandard presentation of Szemeredi’s argument at some future juncture.)

Instead of trying to describe the entire argument here, I thought I would instead show some key components of it, with only the slightest hint as to how to assemble the components together to form the whole proof. In particular, I would like to show how two particular ingredients in the proof – namely van der Waerden’s theorem and the Szemerédi regularity lemma – become useful. For reasons that will hopefully become clearer later, it is convenient not only to work with ordinary progressions ${P_1 = \{ a, a+r_1, a+2r_1, \ldots, a+(k_1-1)r_1\}}$, but also progressions of progressions ${P_2 := \{ P_1, P_1 + r_2, P_1+2r_2, \ldots, P_1+(k_2-1)r_2\}}$, progressions of progressions of progressions, and so forth. (In additive combinatorics, these objects are known as generalised arithmetic progressions of rank one, two, three, etc., and play a central role in the subject, although the way they are used in Szemerédi’s proof is somewhat different from the way that they are normally used in additive combinatorics.) Very roughly speaking, Szemerédi’s proof begins by building an enormous generalised arithmetic progression of high rank containing many elements of the set ${A}$ (arranged in a “near-maximal-density” configuration), and then steadily prunes this progression to improve the combinatorial properties of the configuration, until one ends up with a single rank one progression of length ${k}$ that consists entirely of elements of ${A}$.

To illustrate some of the basic ideas, let us first consider a situation in which we have located a progression ${P, P + r, \ldots, P+(k-1)r}$ of progressions of length ${k}$, with each progression ${P+ir}$, ${i=0,\ldots,k-1}$ being quite long, and containing a near-maximal amount of elements of ${A}$, thus

$\displaystyle |A \cap (P+ir)| \approx \delta |P|$

where ${\delta := \lim \sup_{|P| \rightarrow \infty} \frac{|A \cap P|}{|P|}}$ is the “maximal density” of ${A}$ along arithmetic progressions. (There are a lot of subtleties in the argument about exactly how good the error terms are in various approximations, but we will ignore these issues for the sake of this discussion and just use the imprecise symbols such as ${\approx}$ instead.) By hypothesis, ${\delta}$ is positive. The objective is then to locate a progression ${a, a+r', \ldots,a+(k-1)r'}$ in ${A}$, with each ${a+ir}$ in ${P+ir}$ for ${i=0,\ldots,k-1}$. It may help to view the progression of progressions ${P, P + r, \ldots, P+(k-1)r}$ as a tall thin rectangle ${P \times \{0,\ldots,k-1\}}$.

If we write ${A_i := \{ a \in P: a+ir \in A \}}$ for ${i=0,\ldots,k-1}$, then the problem is equivalent to finding a (possibly degenerate) arithmetic progression ${a_0,a_1,\ldots,a_{k-1}}$, with each ${a_i}$ in ${A_i}$.

By hypothesis, we know already that each set ${A_i}$ has density about ${\delta}$ in ${P}$:

$\displaystyle |A_i \cap P| \approx \delta |P|. \ \ \ \ \ (1)$

Let us now make a “weakly mixing” assumption on the ${A_i}$, which roughly speaking asserts that

$\displaystyle |A_i \cap E| \approx \delta \sigma |P| \ \ \ \ \ (2)$

for “most” subsets ${E}$ of ${P}$ of density ${\approx \sigma}$ of a certain form to be specified shortly. This is a plausible type of assumption if one believes ${A_i}$ to behave like a random set, and if the sets ${E}$ are constructed “independently” of the ${A_i}$ in some sense. Of course, we do not expect such an assumption to be valid all of the time, but we will postpone consideration of this point until later. Let us now see how this sort of weakly mixing hypothesis could help one count progressions ${a_0,\ldots,a_{k-1}}$ of the desired form.

We will inductively consider the following (nonrigorously defined) sequence of claims ${C(i,j)}$ for each ${0 \leq i \leq j < k}$:

• ${C(i,j)}$: For most choices of ${a_j \in P}$, there are ${\sim \delta^i |P|}$ arithmetic progressions ${a_0,\ldots,a_{k-1}}$ in ${P}$ with the specified choice of ${a_j}$, such that ${a_l \in A_l}$ for all ${l=0,\ldots,i-1}$.

(Actually, to avoid boundary issues one should restrict ${a_j}$ to lie in the middle third of ${P}$, rather than near the edges, but let us ignore this minor technical detail.) The quantity ${\delta^i |P|}$ is natural here, given that there are ${\sim |P|}$ arithmetic progressions ${a_0,\ldots,a_{k-1}}$ in ${P}$ that pass through ${a_i}$ in the ${i^{th}}$ position, and that each one ought to have a probability of ${\delta^i}$ or so that the events ${a_0 \in A_0, \ldots, a_{i-1} \in A_{i-1}}$ simultaneously hold.) If one has the claim ${C(k-1,k-1)}$, then by selecting a typical ${a_{k-1}}$ in ${A_{k-1}}$, we obtain a progression ${a_0,\ldots,a_{k-1}}$ with ${a_i \in A_i}$ for all ${i=0,\ldots,k-1}$, as required. (In fact, we obtain about ${\delta^k |P|^2}$ such progressions by this method.)

We can heuristically justify the claims ${C(i,j)}$ by induction on ${i}$. For ${i=0}$, the claims ${C(0,j)}$ are clear just from direct counting of progressions (as long as we keep ${a_j}$ away from the edges of ${P}$). Now suppose that ${i>0}$, and the claims ${C(i-1,j)}$ have already been proven. For any ${i \leq j < k}$ and for most ${a_j \in P}$, we have from hypothesis that there are ${\sim \delta^{i-1} |P|}$ progressions ${a_0,\ldots,a_{k-1}}$ in ${P}$ through ${a_j}$ with ${a_0 \in A_0,\ldots,a_{i-2}\in A_{i-2}}$. Let ${E = E(a_j)}$ be the set of all the values of ${a_{i-1}}$ attained by these progressions, then ${|E| \sim \delta^{i-1} |P|}$. Invoking the weak mixing hypothesis, we (heuristically, at least) conclude that for most choices of ${a_j}$, we have

$\displaystyle |A_{i-1} \cap E| \sim \delta^i |P|$

which then gives the desired claim ${C(i,j)}$.

The observant reader will note that we only needed the claim ${C(i,j)}$ in the case ${j=k-1}$ for the above argument, but for technical reasons, the full proof requires one to work with more general values of ${j}$ (also the claim ${C(i,j)}$ needs to be replaced by a more complicated version of itself, but let’s ignore this for sake of discussion).

We now return to the question of how to justify the weak mixing hypothesis (2). For a single block ${A_i}$ of ${A}$, one can easily concoct a scenario in which this hypothesis fails, by choosing ${E}$ to overlap with ${A_i}$ too strongly, or to be too disjoint from ${A_i}$. However, one can do better if one can select ${A_i}$ from a long progression of blocks. The starting point is the following simple double counting observation that gives the right upper bound:

Proposition 2 (Single upper bound) Let ${P, P+r, \ldots, P+(M-1)r}$ be a progression of progressions ${P}$ for some large ${M}$. Suppose that for each ${i=0,\ldots,M-1}$, the set ${A_i := \{ a \in P: a+ir \in A \}}$ has density ${\approx \delta}$ in ${P}$ (i.e. (1) holds). Let ${E}$ be a subset of ${P}$ of density ${\approx \sigma}$. Then (if ${M}$ is large enough) one can find an ${i = 0,\ldots,M-1}$ such that

$\displaystyle |A_i \cap E| \lessapprox \delta \sigma |P|.$

Proof: The key is the double counting identity

$\displaystyle \sum_{i=0}^{M-1} |A_i \cap E| = \sum_{a \in E} |A \cap \{ a, a+r, \ldots, a+(M-1) r\}|.$

Because ${A}$ has maximal density ${\delta}$ and ${M}$ is large, we have

$\displaystyle |A \cap \{ a, a+r, \ldots, a+(M-1) r\}| \lessapprox \delta M$

for each ${a}$, and thus

$\displaystyle \sum_{i=0}^{M-1} |A_i \cap E| \lessapprox \delta M |E|.$

The claim then follows from the pigeonhole principle. $\Box$

Now suppose we want to obtain weak mixing not just for a single set ${E}$, but for a small number ${E_1,\ldots,E_m}$ of such sets, i.e. we wish to find an ${i}$ for which

$\displaystyle |A_i \cap E_j| \lessapprox \delta \sigma_j |P|. \ \ \ \ \ (3)$

for all ${j=1,\ldots,m}$, where ${\sigma_j}$ is the density of ${E_j}$ in ${P}$. The above proposition gives, for each ${j}$, a choice of ${i}$ for which (3) holds, but it could be a different ${i}$ for each ${j}$, and so it is not immediately obvious how to use Proposition 2 to find an ${i}$ for which (3) holds simultaneously for all ${j}$. However, it turns out that the van der Waerden theorem is the perfect tool for this amplification:

Proposition 3 (Multiple upper bound) Let ${P, P+r, \ldots, P+(M-1)r}$ be a progression of progressions ${P+ir}$ for some large ${M}$. Suppose that for each ${i=0,\ldots,M-1}$, the set ${A_i := \{ a \in P: a+ir \in A \}}$ has density ${\approx \delta}$ in ${P}$ (i.e. (1) holds). For each ${1 \leq j \leq m}$, let ${E_j}$ be a subset of ${P}$ of density ${\approx \sigma_j}$. Then (if ${M}$ is large enough depending on ${j}$) one can find an ${i = 0,\ldots,M-1}$ such that

$\displaystyle |A_i \cap E_j| \lessapprox \delta \sigma_j |P|$

simultaneously for all ${1 \leq j \leq m}$.

Proof: Suppose that the claim failed (for some suitably large ${M}$). Then, for each ${i = 0,\ldots,M-1}$, there exists ${j \in \{1,\ldots,m\}}$ such that

$\displaystyle |A_i \cap E_j| \gg \delta \sigma_j |P|.$

This can be viewed as a colouring of the interval ${\{1,\ldots,M\}}$ by ${m}$ colours. If we take ${M}$ large compared to ${m}$, van der Waerden’s theorem allows us to then find a long subprogression of ${\{1,\ldots,M\}}$ which is monochromatic, so that ${j}$ is constant on this progression. But then this will furnish a counterexample to Proposition 2. $\Box$

One nice thing about this proposition is that the upper bounds can be automatically upgraded to an asymptotic:

Proposition 4 (Multiple mixing) Let ${P, P+r, \ldots, P+(M-1)r}$ be a progression of progressions ${P+ir}$ for some large ${M}$. Suppose that for each ${i=0,\ldots,M-1}$, the set ${A_i := \{ a \in P: a+ir \in A \}}$ has density ${\approx \delta}$ in ${P}$ (i.e. (1) holds). For each ${1 \leq j \leq m}$, let ${E_j}$ be a subset of ${P}$ of density ${\approx \sigma_j}$. Then (if ${M}$ is large enough depending on ${m}$) one can find an ${i = 0,\ldots,M-1}$ such that

$\displaystyle |A_i \cap E_j| \approx \delta \sigma_j |P|$

simultaneously for all ${1 \leq j \leq m}$.

Proof: By applying the previous proposition to the collection of sets ${E_1,\ldots,E_m}$ and their complements ${P\backslash E_1,\ldots,P \backslash E_m}$ (thus replacing ${m}$ with ${2m}$, one can find an ${i}$ for which

$\displaystyle |A_i \cap E_j| \lessapprox \delta \sigma_j |P|$

and

$\displaystyle |A_i \cap (P \backslash E_j)| \lessapprox \delta (1-\sigma_j) |P|$

which gives the claim. $\Box$

However, this improvement of Proposition 2 turns out to not be strong enough for applications. The reason is that the number ${m}$ of sets ${E_1,\ldots,E_m}$ for which mixing is established is too small compared with the length ${M}$ of the progression one has to use in order to obtain that mixing. However, thanks to the magic of the Szemerédi regularity lemma, one can amplify the above proposition even further, to allow for a huge number of ${E_i}$ to be mixed (at the cost of excluding a small fraction of exceptions):

Proposition 5 (Really multiple mixing) Let ${P, P+r, \ldots, P+(M-1)r}$ be a progression of progressions ${P+ir}$ for some large ${M}$. Suppose that for each ${i=0,\ldots,M-1}$, the set ${A_i := \{ a \in P: a+ir \in A \}}$ has density ${\approx \delta}$ in ${P}$ (i.e. (1) holds). For each ${v}$ in some (large) finite set ${V}$, let ${E_v}$ be a subset of ${P}$ of density ${\approx \sigma_v}$. Then (if ${M}$ is large enough, but not dependent on the size of ${V}$) one can find an ${i = 0,\ldots,M-1}$ such that

$\displaystyle |A_i \cap E_v| \approx \delta \sigma_v |P|$

simultaneously for almost all ${v \in V}$.

Proof: We build a bipartite graph ${G = (P, V, E)}$ connecting the progression ${P}$ to the finite set ${V}$ by placing an edge ${(a,v)}$ between an element ${a \in P}$ and an element ${v \in V}$ whenever ${a \in E_v}$. The number ${|E_v| \approx \sigma_v |P|}$ can then be interpreted as the degree of ${v}$ in this graph, while the number ${|A_i \cap E_v|}$ is the number of neighbours of ${v}$ that land in ${A_i}$.

We now apply the regularity lemma to this graph ${G}$. Roughly speaking, what this lemma does is to partition ${P}$ and ${V}$ into almost equally sized cells ${P = P_1 \cup \ldots P_m}$ and ${V = V_1 \cup \ldots V_m}$ such that for most pairs ${P_j, V_k}$ of cells, the graph ${G}$ resembles a random bipartite graph of some density ${d_{jk}}$ between these two cells. The key point is that the number ${m}$ of cells here is bounded uniformly in the size of ${P}$ and ${V}$. As a consequence of this lemma, one can show that for most vertices ${v}$ in a typical cell ${V_k}$, the number ${|E_v|}$ is approximately equal to

$\displaystyle |E_v| \approx \sum_{j=1}^m d_{ij} |P_j|$

and the number ${|A_i \cap E_v|}$ is approximately equal to

$\displaystyle |A_i \cap E_v| \approx \sum_{j=1}^m d_{ij} |A_i \cap P_j|.$

The point here is that the ${|V|}$ different statistics ${|A_i \cap E_v|}$ are now controlled by a mere ${m}$ statistics ${|A_i \cap P_j|}$ (this is not unlike the use of principal component analysis in statistics, incidentally, but that is another story). Now, we invoke Proposition 4 to find an ${i}$ for which

$\displaystyle |A_i \cap P_j| \approx \delta |P_j|$

simultaneously for all ${j=1,\ldots,m}$, and the claim follows. $\Box$

This proposition now suggests a way forward to establish the type of mixing properties (2) needed for the preceding attempt at proving Szemerédi’s theorem to actually work. Whereas in that attempt, we were working with a single progression of progressions ${P, P+r, \ldots, P+(k-1)r}$ of progressions containing a near-maximal density of elements of ${A}$, we will now have to work with a family ${(P_\lambda, P_\lambda+r_\lambda,\ldots,P_\lambda+(k-1)r_\lambda)_{\lambda \in \Lambda}}$ of such progression of progressions, where ${\Lambda}$ ranges over some suitably large parameter set. Furthermore, in order to invoke Proposition 5, this family must be “well-arranged” in some arithmetic sense; in particular, for a given ${i}$, it should be possible to find many reasonably large subfamilies of this family for which the ${i^{th}}$ terms ${P_\lambda + i r_\lambda}$ of the progression of progressions in this subfamily are themselves in arithmetic progression. (Also, for technical reasons having to do with the fact that the sets ${E_v}$ in Proposition 5 are not allowed to depend on ${i}$, one also needs the progressions ${P_\lambda + i' r_\lambda}$ for any given ${0 \leq i' < i}$ to be “similar” in the sense that they intersect ${A}$ in the same fashion (thus the sets ${A \cap (P_\lambda + i' r_\lambda)}$ as ${\lambda}$ varies need to be translates of each other).) If one has this sort of family, then Proposition 5 allows us to “spend” some of the degrees of freedom of the parameter set ${\Lambda}$ in order to gain good mixing properties for at least one of the sets ${P_\lambda +i r_\lambda}$ in the progression of progressions.

Of course, we still have to figure out how to get such large families of well-arranged progressions of progressions. Szemerédi’s solution was to begin by working with generalised progressions of a much larger rank ${d}$ than the rank ${2}$ progressions considered here; roughly speaking, to prove Szemerédi’s theorem for length ${k}$ progressions, one has to consider generalised progressions of rank as high as ${2^k+1}$. It is possible by a reasonably straightforward (though somewhat delicate) “density increment argument” to locate a huge generalised progression of this rank which is “saturated” by ${A}$ in a certain rather technical sense (related to the concept of “near maximal density” used previously). Then, by another reasonably elementary argument, it is possible to locate inside a suitable large generalised progression of some rank ${d}$, a family of large generalised progressions of rank ${d-1}$ which inherit many of the good properties of the original generalised progression, and which have the arithmetic structure needed for Proposition 5 to be applicable, at least for one value of ${i}$. (But getting this sort of property for all values of ${i}$ simultaneously is tricky, and requires many careful iterations of the above scheme; there is also the problem that by obtaining good behaviour for one index ${i}$, one may lose good behaviour at previous indices, leading to a sort of “Tower of Hanoi” situation which may help explain the exponential factor in the rank ${2^k+1}$ that is ultimately needed. It is an extremely delicate argument; all the parameters and definitions have to be set very precisely in order for the argument to work at all, and it is really quite remarkable that Endre was able to see it through to the end.)

The 2009 Abel prize has been awarded to Mikhail Gromov, for his contributions to numerous areas of geometry, including Riemannian geometry, symplectic geometry, and geometric group theory.

The prize is, of course, richly deserved.  I have mentioned some of Gromov’s work here on this blog, including the Bishop-Gromov inequality in Riemannian geometry (which (together with its parabolic counterpart, the monotonicity of Perelman reduced volume) plays an important role in Perelman’s proof of the Poincaré conjecture), the concept of Gromov-Hausdorff convergence (a version of which is also key in the proof of the Poincaré conjecture), and Gromov’s celebrated theorem on groups of polynomial growth, which I discussed in this post.

Another well-known result of Gromov that I am quite fond of is his nonsqueezing theorem in symplectic geometry (or Hamiltonian mechanics).  In its original form, the theorem states that a ball $B_R^{2n}$ of radius R in a symplectic vector space ${\Bbb R}^{2n}$ (with the usual symplectic structure $\omega$) cannot be mapped by a symplectomorphism into any cylinder $B_r^2 \times {\Bbb R}^{2n-2}$ which is narrower than the ball (i.e. $r < R$).  This result, which was one of the foundational results in the modern theory of symplectic invariants, is sometimes referred to as the “principle of the symplectic camel”, as it has the amusing corollary that a large “camel” (or more precisely, a 2n-dimensional ball of radius R in phase space) cannot be deformed via canonical transformations to pass through a small “needle” (or more precisely through a 2n-1-dimensional ball of radius less than R in a hyperplane).  It shows that Liouville’s theorem on the volume preservation of symplectomorphisms is not the only obstruction to mapping one object symplectically to another.

I can sketch Gromov’s original proof of the non-squeezing theorem here.  The symplectic space ${\Bbb R}^{2n}$ can be identified with the complex space ${\Bbb C}^n$, and in particular gives an almost complex structure J on  the ball $B_R^{2n}$ (roughly speaking, J allows one to multiply tangent vectors v by complex numbers, and in particular Jv can be viewed as v multiplied by the unit imaginary i).  This almost complex structure J is compatible with the symplectic form $\omega$; in particular J is tamed by $\omega$, which basically means that $\omega( v, Jv ) > 0$ for all non-zero tangent vectors v.

Now suppose for contradiction that there is a symplectic embedding $\Phi: B_{R}^{2n} \to B_r^2 \times {\Bbb R}^{2n-2}$ from the ball to a smaller cylinder.  Then we can push forward the almost complex structure J on the ball to give an almost complex structure $\Phi_* J$ on the image $\Phi(B_R^{2n})$.  This new structure is still tamed by the symplectic form $\omega = \Phi_* \omega$ on this image.

Just as complex structures can be used to define holomorphic functions, almost complex structures can be used to define pseudo-holomorphic or J-holomorphic curves.  These are curves of one complex dimension (i.e. two real dimensions, that is to say a surface) which obey the analogue of the Cauchy-Riemann equations in the almost complex setting (i.e. the tangent space of the curve is preserved by J).  The theory of such curves was pioneered by Gromov in the paper where the nonsqueezing theorem was proved.  When J is the standard almost complex structure on ${\Bbb R}^{2n} \equiv {\Bbb C}^n$, pseudoholomorphic curves coincide with holomorphic curves.  Among other things, such curves are minimal surfaces (for much the same reason that holomorphic functions are harmonic), and their symplectic areas and surface areas coincide.

Now, the point $\Phi(0)$ lies in the cylinder $B_r^2 \times {\Bbb R}^{2n-2}$ and in particular lies in a disk of symplectic area $\pi r^2$ spanning this cylinder.  This disk will not be pseudo-holomorphic in general, but it turns out that it can be deformed to obtain a pseudo-holomorphic disk spanning $\Phi(B_R^{2n})$ passing through $\Phi(0)$ of symplectic area at most $\pi r^2$.  Pulling this back by $\Phi$, we obtain a minimal surface spanning $B_R^{2n}$ passing through the origin that has surface area at most $\pi r^2$.    However, any minimal surface spanning $B_R^{2n}$ and passing through the origin is known to have area at least $\pi R^2$, giving the desired contradiction.  [This latter fact, incidentally, is quite a fun fact to prove; the key point is to first show that any closed loop of length strictly less than $2 \pi$ in the sphere $S^{2n-1}$ must lie inside an open hemisphere, and so cannot be the boundary of any minimal surface spanning the unit ball and containing the origin. Thus, the symplectic camel theorem ultimately comes down to the fact that one cannot pass a unit ball through a loop of string of length less than $2\pi$.]