You are currently browsing the monthly archive for March 2009.

This is a continuation of the 1100-1199 thread of the polymath1 project, which is now full.  The focus has now mostly shifted to generalisations of the previous problems being studied to larger alphabet sizes k, so I am changing the title here from DHJ(3) to DHJ(k) to reflect this.

The discussion is evolving rapidly, but here are some of the topics currently being discussed:

Note that much of the most recent progress has not yet been ported to the wiki.  In order to help everyone else catch up, it may useful if authors of comments (particularly comments with lengthy computations, or with corrections to previous comments) put their work on the relevant page of the wiki (not necessarily in the most polished format), and perhaps only place a summary of it on the thread itself.

[Incidentally, for the more casual followers of this project, a non-technical introduction to this project can be found at this post of Jason Dyer.]

Comments here should start from 1200.

In the previous two quarters, we have been focusing largely on the “soft” side of real analysis, which is primarily concerned with “qualitative” properties such as convergence, compactness, measurability, and so forth. In contrast, we will begin this quarter with more of an emphasis on the “hard” side of real analysis, in which we study estimates and upper and lower bounds of various quantities, such as norms of functions or operators. (Of course, the two sides of analysis are closely connected to each other; an understanding of both sides and their interrelationships, are needed in order to get the broadest and most complete perspective for this subject.)

One basic tool in hard analysis is that of interpolation, which allows one to start with a hypothesis of two (or more) “upper bound” estimates, e.g. {A_0 \leq B_0} and {A_1 \leq B_1}, and conclude a family of intermediate estimates {A_\theta \leq B_\theta} (or maybe {A_\theta \leq C_\theta B_\theta}, where {C_\theta} is a constant) for any choice of parameter {0 < \theta < 1}. Of course, interpolation is not a magic wand; one needs various hypotheses (e.g. linearity, sublinearity, convexity, or complexifiability) on {A_i, B_i} in order for interpolation methods to be applicable. Nevertheless, these techniques are available for many important classes of problems, most notably that of establishing boundedness estimates such as {\| T f \|_{L^q(Y, \nu)} \leq C \| f \|_{L^p(X, \mu)}} for linear (or “linear-like”) operators {T} from one Lebesgue space {L^p(X,\mu)} to another {L^q(Y,\nu)}. (Interpolation can also be performed for many other normed vector spaces than the Lebesgue spaces, but we will just focus on Lebesgue spaces in these notes to focus the discussion.) Using interpolation, it is possible to reduce the task of proving such estimates to that of proving various “endpoint” versions of these estimates. In some cases, each endpoint only faces a portion of the difficulty that the interpolated estimate did, and so by using interpolation one has split the task of proving the original estimate into two or more simpler subtasks. In other cases, one of the endpoint estimates is very easy, and the other one is significantly more difficult than the original estimate; thus interpolation does not really simplify the task of proving estimates in this case, but at least clarifies the relative difficulty between various estimates in a given family.

As is the case with many other tools in analysis, interpolation is not captured by a single “interpolation theorem”; instead, there are a family of such theorems, which can be broadly divided into two major categories, reflecting the two basic methods that underlie the principle of interpolation. The real interpolation method is based on a divide and conquer strategy: to understand how to obtain control on some expression such as {\| T f \|_{L^q(Y, \nu)}} for some operator {T} and some function {f}, one would divide {f} into two or more components, e.g. into components where {f} is large and where {f} is small, or where {f} is oscillating with high frequency or only varying with low frequency. Each component would be estimated using a carefully chosen combination of the extreme estimates available; optimising over these choices and summing up (using whatever linearity-type properties on {T} are available), one would hope to get a good estimate on the original expression. The strengths of the real interpolation method are that the linearity hypotheses on {T} can be relaxed to weaker hypotheses, such as sublinearity or quasilinearity; also, the endpoint estimates are allowed to be of a weaker “type” than the interpolated estimates. On the other hand, the real interpolation often concedes a multiplicative constant in the final estimates obtained, and one is usually obligated to keep the operator {T} fixed throughout the interpolation process. The proofs of real interpolation theorems are also a little bit messy, though in many cases one can simply invoke a standard instance of such theorems (e.g. the Marcinkiewicz interpolation theorem) as a black box in applications.

The complex interpolation method instead proceeds by exploiting the powerful tools of complex analysis, in particular the maximum modulus principle and its relatives (such as the Phragmén-Lindelöf principle). The idea is to rewrite the estimate to be proven (e.g. {\| T f \|_{L^q(Y, \nu)} \leq C \| f \|_{L^p(X, \mu)}}) in such a way that it can be embedded into a family of such estimates which depend holomorphically on a complex parameter {s} in some domain (e.g. the strip {\{ \sigma+it: t \in {\bf R}, \sigma \in [0,1]\}}. One then exploits things like the maximum modulus principle to bound an estimate corresponding to an interior point of this domain by the estimates on the boundary of this domain. The strengths of the complex interpolation method are that it typically gives cleaner constants than the real interpolation method, and also allows the underlying operator {T} to vary holomorphically with respect to the parameter {s}, which can significantly increase the flexibility of the interpolation technique. The proofs of these methods are also very short (if one takes the maximum modulus principle and its relatives as a black box), which make the method particularly amenable for generalisation to more intricate settings (e.g. multilinear operators, mixed Lebesgue norms, etc.). On the other hand, the somewhat rigid requirement of holomorphicity makes it much more difficult to apply this method to non-linear operators, such as sublinear or quasilinear operators; also, the interpolated estimate tends to be of the same “type” as the extreme ones, so that one does not enjoy the upgrading of weak type estimates to strong type estimates that the real interpolation method typically produces. Also, the complex method runs into some minor technical problems when target space {L^q(Y,\nu)} ceases to be a Banach space (i.e. when {q<1}) as this makes it more difficult to exploit duality.

Despite these differences, the real and complex methods tend to give broadly similar results in practice, especially if one is willing to ignore constant losses in the estimates or epsilon losses in the exponents.

The theory of both real and complex interpolation can be studied abstractly, in general normed or quasi-normed spaces; see e.g. this book for a detailed treatment. However in these notes we shall focus exclusively on interpolation for Lebesgue spaces {L^p} (and their cousins, such as the weak Lebesgue spaces {L^{p,\infty}} and the Lorentz spaces {L^{p,r}}).

Read the rest of this entry »

The 2009 Abel prize has been awarded to Mikhail Gromov, for his contributions to numerous areas of geometry, including Riemannian geometry, symplectic geometry, and geometric group theory.

The prize is, of course, richly deserved.  I have mentioned some of Gromov’s work here on this blog, including the Bishop-Gromov inequality in Riemannian geometry (which (together with its parabolic counterpart, the monotonicity of Perelman reduced volume) plays an important role in Perelman’s proof of the Poincaré conjecture), the concept of Gromov-Hausdorff convergence (a version of which is also key in the proof of the Poincaré conjecture), and Gromov’s celebrated theorem on groups of polynomial growth, which I discussed in this post.

Another well-known result of Gromov that I am quite fond of is his nonsqueezing theorem in symplectic geometry (or Hamiltonian mechanics).  In its original form, the theorem states that a ball B_R^{2n} of radius R in a symplectic vector space {\Bbb R}^{2n} (with the usual symplectic structure \omega) cannot be mapped by a symplectomorphism into any cylinder B_r^2 \times {\Bbb R}^{2n-2} which is narrower than the ball (i.e. r < R).  This result, which was one of the foundational results in the modern theory of symplectic invariants, is sometimes referred to as the “principle of the symplectic camel”, as it has the amusing corollary that a large “camel” (or more precisely, a 2n-dimensional ball of radius R in phase space) cannot be deformed via canonical transformations to pass through a small “needle” (or more precisely through a 2n-1-dimensional ball of radius less than R in a hyperplane).  It shows that Liouville’s theorem on the volume preservation of symplectomorphisms is not the only obstruction to mapping one object symplectically to another.

I can sketch Gromov’s original proof of the non-squeezing theorem here.  The symplectic space {\Bbb R}^{2n} can be identified with the complex space {\Bbb C}^n, and in particular gives an almost complex structure J on  the ball B_R^{2n} (roughly speaking, J allows one to multiply tangent vectors v by complex numbers, and in particular Jv can be viewed as v multiplied by the unit imaginary i).  This almost complex structure J is compatible with the symplectic form \omega; in particular J is tamed by \omega, which basically means that \omega( v, Jv ) > 0 for all non-zero tangent vectors v.

Now suppose for contradiction that there is a symplectic embedding \Phi: B_{R}^{2n} \to B_r^2 \times {\Bbb R}^{2n-2} from the ball to a smaller cylinder.  Then we can push forward the almost complex structure J on the ball to give an almost complex structure \Phi_* J on the image \Phi(B_R^{2n}).  This new structure is still tamed by the symplectic form \omega = \Phi_* \omega on this image.

Just as complex structures can be used to define holomorphic functions, almost complex structures can be used to define pseudo-holomorphic or J-holomorphic curves.  These are curves of one complex dimension (i.e. two real dimensions, that is to say a surface) which obey the analogue of the Cauchy-Riemann equations in the almost complex setting (i.e. the tangent space of the curve is preserved by J).  The theory of such curves was pioneered by Gromov in the paper where the nonsqueezing theorem was proved.  When J is the standard almost complex structure on {\Bbb R}^{2n} \equiv {\Bbb C}^n, pseudoholomorphic curves coincide with holomorphic curves.  Among other things, such curves are minimal surfaces (for much the same reason that holomorphic functions are harmonic), and their symplectic areas and surface areas coincide.

Now, the point \Phi(0) lies in the cylinder B_r^2 \times {\Bbb R}^{2n-2} and in particular lies in a disk of symplectic area \pi r^2 spanning this cylinder.  This disk will not be pseudo-holomorphic in general, but it turns out that it can be deformed to obtain a pseudo-holomorphic disk spanning \Phi(B_R^{2n}) passing through \Phi(0) of symplectic area at most \pi r^2.  Pulling this back by \Phi, we obtain a minimal surface spanning B_R^{2n} passing through the origin that has surface area at most \pi r^2.    However, any minimal surface spanning B_R^{2n} and passing through the origin is known to have area at least \pi R^2, giving the desired contradiction.  [This latter fact, incidentally, is quite a fun fact to prove; the key point is to first show that any closed loop of length strictly less than 2 \pi in the sphere S^{2n-1} must lie inside an open hemisphere, and so cannot be the boundary of any minimal surface spanning the unit ball and containing the origin. Thus, the symplectic camel theorem ultimately comes down to the fact that one cannot pass a unit ball through a loop of string of length less than 2\pi.]

[This article was guest authored by Frank Morgan, the vice president of the American Mathematical Society.]
The American Mathematical Society (AMS) has launched a new blog

by and for graduate students, with initial entries ranging from “Finding an Advisor” to “Getting a Job.” Although graduate students have long been institutional members of the AMS, the two groups may not always have fully appreciated each other. Graduate students sometimes wonder why they are getting all this “junk mail,” and AMS governance has had relatively minor contact with graduate students. In fact, graduate students are the future of the AMS, and conversely they need the AMS and its support for mathematics. Hence I think that it is a bright sign of the times to see the AMS launching this blog by and for graduate students. Graduate students interested in joining the editorial board or others interested in helping out can email me at; we seek a large and diverse board. Meanwhile check out the new blog and post some comments on the entries you find there.

One of the more unintuitive facts about sailing is that it is possible to harness the power of the wind to sail in a direction against that of the wind or to sail with a speed faster than the wind itself, even when the water itself is calm.  It is somewhat less known, but nevertheless true, that one can (in principle) do both at the same time – sail against the wind (even directly against the wind!) at speeds faster than the wind.   This does not contradict any laws of physics, such as conservation of momentum or energy (basically because the reservoir of momentum and energy in the wind far outweighs the portion that will be transmitted to the sailboat), but it is certainly not obvious at first sight how it is to be done.

The key is to exploit all three dimensions of space when sailing.  The most obvious dimension to exploit is the windward/leeward dimension – the direction that the wind velocity v_0  is oriented in.  But if this is the only dimension one exploits, one can only sail up to the wind speed |v_0| and no faster, and it is not possible to sail in the direction opposite to the wind.

Things get more interesting when one also exploits the crosswind dimension perpendicular to the wind velocity, in particular by tacking the sail.  If one does this, then (in principle) it becomes possible to travel up to double the speed |v_0| of wind, as we shall see below.

However, one still cannot sail against to the wind purely by tacking the sail.  To do this, one needs to not just harness  the power of the wind, but also that of the water beneath the sailboat, thus exploiting (barely) the third available dimension.  By combining the use of a sail in the air with the use of sails in the water – better known as keels, rudders, and hydrofoils – one can now sail in certain directions against the wind, and at certain speeds.  In most sailboats, one relies primarily on the keel, which lets one sail against the wind but not directly opposite it.  But if one tacks the rudder or other hydrofoils as well as the sail, then in fact one can (in principle) sail in arbitrary directions (including those directly opposite to v_0), and in arbitrary speeds (even those much larger than |v_0|), although it is quite difficult to actually achieve this in practice.  It may seem odd that the water, which we are assuming to be calm (i.e. traveling at zero velocity) can be used to increase the range of available velocities and speeds for the sailboat, but we shall see shortly why this is the case.

If one makes several simplifying and idealised (and, admittedly, rather unrealistic in practice) assumptions in the underlying physics, then sailing can in fact be analysed by a simple two-dimensional geometric model which explains all of the above statements.  In this post, I would like to describe this mathematical model and how it gives the conclusions stated above.

Read the rest of this entry »

One way to study a general class of mathematical objects is to embed them into a more structured class of mathematical objects; for instance, one could study manifolds by embedding them into Euclidean spaces. In these (optional) notes we study two (related) embedding theorems for topological spaces:

Read the rest of this entry »

The 245B final can be found here.  I am not posting solutions, but readers (both students and non-students) are welcome to discuss the final questions in the comments below.

The continuation to this course, 245C, will begin on Monday, March 29.  The topics for this course are still somewhat fluid – but I tentatively plan to cover the following topics, roughly in order:

  • L^p spaces and interpolation; fractional integration
  • The Fourier transform on {\Bbb R}^n (a very quick review; this is of course covered more fully in 247A)
  • Schwartz functions, and the theory of distributions
  • Hausdorff measure
  • The spectral theorem (introduction only; the topic is covered in depth in 255A)

I am open to further suggestions for topics that would build upon the 245AB material, which would be of interest to students, and which would not overlap too substantially with other graduate courses offered at UCLA.

This is a continuation of the 900-999 thread of the polymath1 project, which is now full.  We’ve made quite a bit of progress so far on our original mission of bounding density Hales-Jewett numbers.  In particular, we have shown

  • c_6=450: Any subset of [3]^6 with 451 points contains a combinatorial line; and there is exactly one example with 450 points with no combinatorial line.  (We have two proofs, one by a large integer program, and the other being purely human-verifiable; the latter can be found here.)
  • c'_5=124: Any subset of [3]^5 with 125 points contains a geometric line; and we have several examples with 124 points with no geometric line.  (The proof is partly computer-assisted; details are here.)
  • 353 \leq c'_6 \leq 361: A genetic algorithm has constructed 353-point solutions in [3]^6 with no geometric line; in the other direction, linear and integer programming methods have shown that any set with 362 points must have a geometric line.
  • \overline{c}^\mu_{12} = 40: In the triangular grid \{(a,b,c) \in {\Bbb N}^3:a+b+c=12\}, any set of 41 points contains an upwards-pointing equilateral triangle (a+r,b,c),(a,b+r,c),(a,b,c+r) with r>0; and we have 40-point examples without such triangles.  This was conducted by an integer program.

This timeline shows the history of these and other developments in the project.

There are still several numbers that look feasible to compute.  For instance, the bounds on c'_6 should be able to be narrowed further.  Work is slowly progressing also on the equal-slices Hales-Jewett numbers c^\mu_n, which are hyper-optimistically conjectured to equal the Fujimura numbers \overline{c}^\mu_n; this has been verified by hand up to n=3 and by integer programming up to n=5.  We are also looking at trying to reduce the dependence on computer assistance in establishing the c'_5 = 124 result; the best human result so far is 124 \leq c'_5 \leq 125.

Some progress has recently been made on some other related questions.  For instance, we now have a precise description of the lower bound on c_n coming from the Behrend-Elkin construction, namely

c_n > C 3^{n - 4\sqrt{\log 2}\sqrt{\log n}+\frac 12 \log \log n}

for some absolute constant c.

Also, it is probably a good time to transport some of the discussion in earlier threads to the wiki and make it easier for outsiders to catch up.  (Incidentally, we need a logo for that wiki; any suggestions would be welcome!)

Comments on this thread should be numbered starting at 1100.

Jordan Ellenberg, Richard Oberlin, and I have just uploaded to the arXiv the paper “The Kakeya set and maximal conjectures for algebraic varieties over finite fields“, submitted to Mathematika.  This paper builds upon some work of Dvir and later authors on the Kakeya problem in finite fields, which I have discussed in this earlier blog post.  Dvir established the following:

Kakeya set conjecture for finite fields. Let F be a finite field, and let E be a subset of F^n that contains a line in every direction.  Then E has cardinality at least c_n |F|^n for some c_n > 0.

The initial argument of Dvir gave c_n = 1/n!.  This was improved to c_n = c^n for some explicit 0 < c < 1 by Saraf and Sudan, and recently to c_n =1/2^n by Dvir, Kopparty, Saraf, and Sudan, which is within a factor 2 of the optimal result.

In our work we investigate a somewhat different set of improvements to Dvir’s result.  The first concerns the Kakeya maximal function f^*: {\Bbb P}^{n-1}(F) \to {\Bbb R} of a function f: F^n \to {\Bbb R}, defined for all directions \xi \in {\Bbb P}^{n-1}(F) in the projective hyperplane at infinity by the formula

f^*(\xi) = \sup_{\ell // \xi} \sum_{x \in \ell} |f(x)|

where the supremum ranges over all lines \ell in F^n oriented in the direction \xi.  Our first result is the endpoint L^p estimate for this operator, namely

Kakeya maximal function conjecture in finite fields. We have \| f^* \|_{\ell^n({\Bbb P}^{n-1}(F))} \leq C_n |F|^{(n-1)/n} \|f\|_{\ell^n(F^n)} for some constant C_n > 0.

This result implies Dvir’s result, since if f is the indicator function of the set E in Dvir’s result, then f^*(\xi) = |F| for every \xi \in {\Bbb P}^{n-1}(F).  However, it also gives information on more general sets E which do not necessarily contain a line in every direction, but instead contain a certain fraction of a line in a subset of directions.  The exponents here are best possible in the sense that all other \ell^p \to \ell^q mapping properties of the operator can be deduced (with bounds that are optimal up to constants) by interpolating the above estimate with more trivial estimates.  This result is the finite field analogue of a long-standing (and still open) conjecture for the Kakeya maximal function in Euclidean spaces; we rely on the polynomial method of Dvir, which thus far has not extended to the Euclidean setting (but note the very interesting variant of this method by Guth that has established the endpoint multilinear Kakeya maximal function estimate in this setting, see this blog post for further discussion).

It turns out that a direct application of the polynomial method is not sufficient to recover the full strength of the maximal function estimate; but by combining the polynomial method with the Nikishin-Maurey-Pisier-Stein “method of random rotations” (as interpreted nowadays by Stein and later by Bourgain, and originally inspired by the factorisation theorems of Nikishin, Maurey, and Pisier), one can already recover a “restricted weak type” version of the above estimate.  If one then enhances the polynomial method with the “method of multiplicities” (as introduced by Saraf and Sudan) we can then recover the full “strong type” estimate; a few more details below the fold.

It turns out that one can generalise the above results to more general affine or projective algebraic varieties over finite fields.  In particular, we showed

Kakeya maximal function conjecture in algebraic varieties. Suppose that W \subset {\Bbb P}^N is an (n-1)-dimensional algebraic variety.  Let d \geq 1 be an integer. Then we have

\| \sup_{\gamma \ni x; \gamma \not \subset W} \sum_{y \in \gamma} f(y) \|_{\ell^n_x(W(F))} \leq C_{n,d,N,W} |F|^{(n-1)/n} \|f\|_{\ell^n({\Bbb P}^N(F))}

for some constant C_{n,d,N,W} > 0, where the supremum is over all irreducible algebraic curves \gamma of degree at most d that pass through x but do not lie in W, and W(F) denotes the F-points of W.

The ordinary Kakeya maximal function conjecture corresponds to the case when N=n, W is the hyperplane at infinity, and the degree d is equal to 1.  One corollary of this estimate is a Dvir-type result: a subset of {\Bbb P}^N(F) which contains, for each x in W, an irreducible algebraic curve of degree d passing through x but not lying in W, has cardinality \gg |F|^n if |W| \gg |F|^{n-1}.  (In particular this implies a lower bound for Nikodym sets worked out by Li.)  The dependence of the implied constant on W is only via the degree of W.

The techniques used in the flat case can easily handle curves \gamma of higher degree (provided that we allow the implied constants to depend on d), but the method of random rotations does not seem to work directly on the algebraic variety W as there are usually no symmetries of this variety to exploit.  Fortunately, we can get around this by using a “random projection trick” to “flatten” W into a hyperplane (after first expressing W as the zero locus of some polynomials, and then composing with the graphing map for such polynomials), reducing the non-flat case to the flat case.

Below the fold, I wish to sketch two of the key ingredients in our arguments, the random rotations method and the random projections trick.  (We of course also use some algebraic geometry, but mostly low-tech stuff, on the level of Bezout’s theorem, though we do need one non-trivial result of Kleiman (from SGA6), that asserts that bounded degree varieties can be cut out by a bounded number of polynomials of bounded degree.)

[Update, March 14: See also Jordan’s own blog post on our paper.]

Read the rest of this entry »

Emmanuel Candés and I have just uploaded to the arXiv our paper “The power of convex relaxation: near-optimal matrix completion“, submitted to IEEE Inf. Theory.  In this paper we study the matrix completion problem, which one can view as a sort of “non-commutative” analogue of the sparse recovery problem studied in the field of compressed sensing, although there are also some other significant differences between the two problems.   The sparse recovery problem seeks to recover a sparse vector x \in {\Bbb R}^n from some linear measurements Ax = b \in {\Bbb R}^m, where A is a known m \times n matrix.  For general x, classical linear algebra tells us that if m < n, then the problem here is underdetermined and has multiple solutions; but under the additional assumption that x is sparse (most of the entries are zero), it turns out (under various hypotheses on the measurement matrix A, and in particular if A contains a sufficient amount of “randomness” or “incoherence”) that exact recovery becomes possible in the underdetermined case.  Furthermore, recovery is not only theoretically possible, but is also computationally practical in many cases; in particular, under some assumptions on A, one can recover x by minimising the convex norm \| x \|_{\ell^1} over all solutions to Ax=b.

Now we turn to the matrix completion problem.  Instead of an unknown vector x \in {\Bbb R}^n, we now have an unknown matrix M = (m_{ij})_{i \in [n_1], j \in [n_2]} \in {\Bbb R}^{n_1 \times n_2} (we use the shorthand [n] := \{1,\ldots,n\} here). We will take a specific type of underdetermined linear measurement of M, namely we pick a random subset \Omega \subset [n_1] \times [n_2] of the matrix array [n_1] \times [n_2] of some cardinality 1 \leq m \leq n_1 n_2, and form the random sample P_\Omega(M) := (m_{ij})_{(i,j) \in \Omega} \in {\Bbb R}^{\Omega} of M.

Of course, with no further information on M, it is impossible to complete the matrix M from the partial information P_\Omega(M) – we only have m pieces of information and need n_1 n_2.  But suppose we also know that M is low-rank, e.g. has rank less than r; this is an analogue of sparsity, but for matrices rather than vectors.  Then, in principle, we have reduced the number of degrees of freedom for M from n_1 n_2 to something more like O( r \max(n_1,n_2) ), and so (in analogy with compressed sensing) one may now hope to perform matrix completion with a much smaller fraction of samples, and in particular with m close to r \max(n_1,n_2).

This type of problem comes up in several real-world applications, most famously in the Netflix prize.  The Netflix prize problem is to be able to predict a very large ratings matrix M, whose rows are the customers, whose columns are the movies, and the entries are the rating that each customer would hypothetically assign to each movie.  Of course, not every customer has rented every movie from Netflix, and so only a small fraction P_\Omega(M) of this matrix is actually known.  However, if one makes the assumption that most customers’ rating preference is determined by only a small number of characteristics of the movie (e.g. genre, lead actor/actresses, director, year, etc.), then the matrix should be (approximately) low rank, and so the above type of analysis should be useful (though of course it is not going to be the only tool of use in this messy, real-world problem).

Actually, one expects to need to oversample the matrix by a logarithm or two in order to have a good chance of exact recovery, if one is sampling randomly.  This can be seen even in the rank one case r=1, in which M=uv^* is the product of a column vector and a row vector; let’s consider square matrices n_1=n_2=n for simplicity.  Observe that if the sampled coordinates \Omega completely miss one of the rows of the matrix, then the corresponding element of u has gone completely unmeasured, and one cannot hope to complete this row of the matrix.   Thus one needs to sample every row (and also every column) of the n \times n matrix.  The solution to the coupon collector’s problem then tells us that one needs about O(n \log n) samples to achieve this goal.  In fact, the theory of Erdős-Rényi random graphs tells us that the bipartite graph induced by \Omega becomes almost surely connected beyond this threshold, which turns out to be exactly what is needed to perform matrix completion for rank 1 matrices.

On the other hand, one cannot hope to complete the matrix if some of the singular vectors of the matrix are extremely sparse.  For instance, in the Netflix problem, a singularly idiosyncratic customer (or dually, a singularly unclassifiable movie) may give rise to a row or column of M that has no relation to the rest of the matrix, occupying its own separate component of the singular value decomposition of M; such a row or column is then impossible to complete exactly without sampling the entirety of that row or column.  Thus, to get exact matrix completion from a small fraction of entries, one needs some sort of incoherence assumption on the singular vectors, which spreads them out across all coordinates in a roughly even manner, as opposed to being concentrated on just a few values.

In a recent paper, Candés and Recht proposed solving the matrix completion problem by minimising the nuclear norm (or trace norm)

\|M\|_* = \sum_{i=1}^{\min(n_1,n_2)} \sigma_i(M) = \hbox{tr}( M M^*)^{1/2}

amongst all matrices consistent with the observed data P_\Omega(M).  This nuclear norm is the non-commutative counterpart to the \ell^1 norm for vectors, and so this algorithm is analogous to the \ell^1 minimisation (or basis pursuit) algorithm which is effective for compressed sensing (though not the only such algorithm for this task).  They showed, roughly speaking, that exact matrix completion (for, say, square matrices n_1=n_2=n for simplicity) is ensured with high probability so long as the singular vectors obey a certain incoherence property (basically, their \ell^\infty norm should be close to the minimal possible value, namely O(1/\sqrt{n})), so long as one had the condition

m \gg n^{1.2} r \log n.

This differs from the presumably optimal threshold of nr \log n by a factor of about n^{0.2}.

The main result of our paper is to mostly eliminate this gap, at the cost of a stronger hypothesis on the matrix being measured:

Main theorem. (Informal statement)  Suppose the n_1 \times n_2 matrix M has rank r and obeys a certain “strong incoherence property”.  Then with high probability, nuclear norm minimisation will recover M from a random sample P_\Omega(M) provided that m \gg n r \log^{O(1)} n, where n := \max(n_1,n_2).

A result of a broadly similar nature, but with a rather different recovery algorithm and with a somewhat different range of applicability, was recently established by Keshavan, Oh, and Montanari.  The strong incoherence property is somewhat technical, but is related to the Candés-Recht incoherence property and is satisfied by a number of reasonable random matrix models.  The exponent O(1) here is reasonably civilised (ranging between 2 and 9, depending on the specific model and parameters being used).

Read the rest of this entry »