You are currently browsing the monthly archive for June 2009.

The summer continues to allow some clearing of the backlog of projects accumulated during the academic year: Helge Holden, Kenneth Karlsen, Nils Risebro, and myself have uploaded to the arXiv the paper “Operator splitting for the KdV equation“, submitted to Math. Comp..   This paper is concerned with accurate numerical schemes for solving initial value problems for the Korteweg-de Vries equation

u_t + u_{xxx} = u u_x; \quad u(0,x) = u_0(x), (1)

though the analysis here would be valid for a wide range of other semilinear dispersive models as well.  In principle, these equations, which are completely integrable, can be solved exactly by the inverse scattering method, but fast and accurate implementations of this method are still not as satisfactory as one would like.  On the other hand, the linear Korteweg-de Vries equation

u_t + u_{xxx} = 0; \quad u(0,x) = u_0(x) (2)

can be solved exactly (with accurate and fast numerics) via the (fast) Fourier transform, while the (inviscid) Burgers equation

u_t = u u_x; \quad u(0,x) = u_0(x) (3)

can also be solved exactly (and quickly) by the method of characteristics.  Since the KdV equation is in some sense a combination of the equations (2) and (3), it is then reasonable to hope that some combination of the solution schemes for (2) and (3) can be used to solve (1), at least in some approximate sense.

One way to do this is by the method of operator splitting.  Observe from the formal approximation e^{A \Delta t} \approx 1 + A \Delta t + O( \Delta t^2) (where \Delta t should be thought of as small, and A is some matrix or linear operator), that one has

e^{(A+B) \Delta t} u_0 = e^{A \Delta t} e^{B \Delta t} u_0 + O( \Delta t^2 ), (4)

[we do not assume A and B to commute here] and thus we formally have

e^{n (A+B) \Delta t} u_0 = (e^{A \Delta t} e^{B \Delta t})^n u_0 + O( \Delta t ) (5)

if n = T/\Delta t for some fixed time T (thus n = O(1/\Delta t)).  As a consequence, if one wants to solve the linear ODE

u_t = (A+B) u; \quad u(0) = u_0 (1′)

for time T = n \Delta t, one can achieve an approximate solution (accurate to order \Delta t) by alternating n times between evolving the ODE

u_t = A u (2′)

for time \Delta t, and evolving the ODE

u_t = B u (3′)

for time \Delta t, starting with the initial data u_0.

It turns out that this scheme can be formalised, and furthermore generalised to nonlinear settings such as those for the KdV equation (1).  More precisely, we show that if u_0 \in H^s({\Bbb R}) for some s \geq 5, then one can solve (1) to accuracy O(\Delta t) in H^s norm for any fixed time T = n \Delta t by alternating between evolving (2) and (3) for times \Delta t (this scheme is known as Godunov splitting).

Actually, one can obtain faster convergence by modifying the scheme, at the cost of requiring higher regularity on the data; the situation is similar to that of numerical integration (or quadrature), in which the midpoint rule or Simpson’s rule provide more accuracy than the Riemann integral if the integrand is smooth.  For instance, one has the variant

e^{n (A+B) \Delta t} = (e^{A \Delta t/2} e^{B \Delta t/2} e^{B \Delta t/2} e^{A \Delta t/2})^n + O( \Delta t^2 ) (6)

of (5), which can be seen by expansion to second order in \Delta t (or by playing around with the Baker-Campbell-Hausdorff formula).  For KdV, we can rigorously show that the analogous scheme (known as Strang splitting) involving the indicated combination of evolutions of (2) and (3) will also converge to accuracy \Delta t^2 in H^s norm, provided that u_0 \in H^s({\Bbb R}) and s \geq 17.

Read the rest of this entry »

One further paper in this stream: László Erdős, José Ramírez, Benjamin Schlein, Van Vu, Horng-Tzer Yau, and myself have just uploaded to the arXiv the paper “Bulk universality for Wigner hermitian matrices with subexponential decay“, submitted to Mathematical Research Letters.  (Incidentally, this is my first six-author paper I have been involved in, not counting the polymath projects of course, though I have had a number of five-author papers.)

This short paper (9 pages) combines the machinery from two recent papers on the universality conjecture for the eigenvalue spacings in the bulk for Wigner random matrices (see my earlier blog post for more discussion).  On the one hand, the paper of Erdős-Ramírez-Schlein-Yau established this conjecture under the additional hypothesis that the distribution of the individual entries obeyed some smoothness and exponential decay conditions.  Meanwhile, the paper of Van Vu and myself (which I discussed in my earlier blog post) established the conjecture under a somewhat different set of hypotheses, namely that the distribution of the individual entries obeyed some moment conditions (in particular, the third moment had to vanish), a support condition (the entries had to have real part supported in at least three points), and an exponential decay condition.

After comparing our results, the six of us realised that our methods could in fact be combined rather easily to obtain a stronger result, establishing the universality conjecture assuming only a exponential decay (or more precisely, sub-exponential decay) bound {\Bbb P}(|x_{\ell k}| > t ) \ll \exp( - t^c ) on the coefficients; thus all regularity, moment, and support conditions have been eliminated.  (There is one catch, namely that we can no longer control a single spacing \lambda_{i+1}-\lambda_i for a single fixed i, but must now average over all 1 \leq i \leq n before recovering the universality.  This is an annoying technical issue but it may be resolvable in the future with further refinements to the method.)

I can describe the main idea behind the unified approach here.  One can arrange the Wigner matrices in a hierarchy, from most structured to least structured:

  • The most structured (or special) ensemble is the Gaussian Unitary Ensemble (GUE), in which the coefficients are gaussian. Here, one has very explicit and tractable formulae for the eigenvalue distributions, gap spacing, etc.
  • The next most structured ensemble of Wigner matrices are the Gaussian-divisible or Johansson matrices, which are matrices H of the form H = e^{-t/2} \hat H + (1-e^{-t})^{1/2} V, where \hat H is another Wigner matrix, V is a GUE matrix independent of \hat H, and 0 < t < 1 is a fixed parameter independent of n.  Here, one still has quite explicit (though not quite as tractable) formulae for the joint eigenvalue distribution and related statistics.  Note that the limiting case t=1 is GUE.
  • After this, one has the Ornstein-Uhlenbeck-evolved matrices, which are also of the form H = e^{-t/2} \hat H + (1-e^{-t})^{1/2} V, but now t = n^{-1+\delta} decays at a power rate with n, rather than being comparable to 1.  Explicit formulae still exist for these matrices, but extracting universality out of this is hard work (and occupies the bulk of the paper of Erdős-Ramírez-Schlein-Yau).
  • Finally, one has arbitrary Wigner matrices, which can be viewed as the t=0 limit of the above Ornstein-Uhlenbeck process.

The arguments in the paper of Erdős-Ramírez-Schlein-Yau can be summarised as follows (I assume subexponential decay throughout this discussion):

  1. (Structured case) The universality conjecture is true for Ornstein-Uhlenbeck-evolved matrices with t = n^{-1+\delta} for any 0 < \delta \leq 1.  (The case 1/4 < \delta \leq 1 was treated in an earlier paper of Erdős-Ramírez-Schlein-Yau, while the case where t is comparable to 1 was treated by Johansson.)
  2. (Matching) Every Wigner matrix with suitable smoothness conditions can be “matched” with an Ornstein-Uhlenbeck-evolved matrix, in the sense that the eigenvalue statistics for the two matrices are asymptotically identical.  (This is relatively easy due to the fact that \delta can be taken arbitrarily close to zero.)
  3. Combining 1. and 2. one obtains universality for all Wigner matrices obeying suitable smoothness conditions.

The arguments in the paper of Van and myself can be summarised as follows:

  1. (Structured case) The universality conjecture is true for Johansson matrices, by the paper of Johansson.
  2. (Matching) Every Wigner matrix with some moment and support conditions can be “matched” with a Johansson matrix, in the sense that the first four moments of the entries agree, and hence (by the Lindeberg strategy in our paper) have asymptotically identical statistics.
  3. Combining 1. and 2. one obtains universality for all Wigner matrices obtaining suitable moment and support conditions.

What we realised is by combining the hard part 1. of the paper of Erdős-Ramírez-Schlein-Yau with the hard part 2. of the paper of Van and myself, we can remove all regularity, moment, and support conditions.  Roughly speaking, the unified argument proceeds as follows:

  1. (Structured case) By the arguments of Erdős-Ramírez-Schlein-Yau, the universality conjecture is true for Ornstein-Uhlenbeck-evolved matrices with t = n^{-1+\delta} for any 0 < \delta \leq 1.
  2. (Matching) Every Wigner matrix H can be “matched” with an Ornstein-Uhlenbeck-evolved matrix e^{-t/2} H + (1-e^{-t})^{1/2} V for t= n^{-1+0.01} (say), in the sense that the first four moments of the entries almost agree, which is enough (by the arguments of Van and myself) to show that these two matrices have asymptotically identical statistics on the average.
  3. Combining 1. and 2. one obtains universality for the averaged statistics for all Wigner matrices.

The averaging should be removable, but this would require better convergence results to the semicircular law than are currently known (except with additional hypotheses, such as vanishing third moment).  The subexponential decay should also be relaxed to a condition of finiteness for some fixed moment {\Bbb E} |x|^C, but we did not pursue this direction in order to keep the paper short.

It turns out to be a favourable week or two for me to finally finish a number of papers that had been at a nearly completed stage for a while.  I have just uploaded to the arXiv my article “Sumset and inverse sumset theorems for Shannon entropy“, submitted to Combinatorics, Probability, and Computing.  This paper evolved from a “deleted scene” in my book with Van Vu entitled “Entropy sumset estimates“.  In those notes, we developed analogues of the standard Plünnecke-Ruzsa sumset estimates (which relate quantities such as the cardinalities |A+B|, |A-B| of the sum and difference sets of two finite sets A, B in an additive group G to each other), to the entropy setting, in which the finite sets A \subset G are replaced instead with discrete random variables X taking values in that group G, and the (logarithm of the) cardinality |A| is replaced with the Shannon entropy

{\textbf H}(X) := \sum_{x \in G} {\Bbb P}(x \in X) \log \frac{1}{{\Bbb P}(x \in X)}.

This quantity measures the information content of X; for instance, if {\textbf H}(X) = k \log 2, then it will take k bits on the average to store the value of X (thus a string of n independent copies of X will require about nk bits of storage in the asymptotic limit n \to \infty).  The relationship between entropy and cardinality is that if X is the uniform distribution on a finite non-empty set A, then {\textbf H}(X) = \log |A|.  If instead X is non-uniformly distributed on A, one has 0 < {\textbf H}(X) < \log |A|, thanks to Jensen’s inequality.

It turns out that many estimates on sumsets have entropy analogues, which resemble the “logarithm” of the sumset estimates.  For instance, the trivial bounds

|A|, |B| \leq |A+B| \leq |A| |B|

have the entropy analogue

{\textbf H}(X), {\textbf H}(Y) \leq {\textbf H}(X+Y) \leq {\textbf H}(X) + {\textbf H}(Y)

whenever X, Y are independent discrete random variables in an additive group; this is not difficult to deduce from standard entropy inequalities.  Slightly more non-trivially, the sum set estimate

|A+B| \leq \frac{|A-B|^3}{|A| |B|}

established by Ruzsa, has an entropy analogue

{\textbf H}(X+Y) \leq 3 {\textbf H}(X-Y) - {\textbf H}(X) - {\textbf H}(Y),

and similarly for a number of other standard sumset inequalities in the literature (e.g. the Rusza triangle inequality, the Plünnecke-Rusza inequality, and the Balog-Szemeredi-Gowers theorem, though the entropy analogue of the latter requires a little bit of care to state).  These inequalities can actually be deduced fairly easily from elementary arithmetic identities, together with standard entropy inequalities, most notably the submodularity inequality

{\textbf H}(Z) + {\textbf H}(W) \leq {\textbf H}(X) + {\textbf H}(Y)

whenever X,Y,Z,W are discrete random variables such that X and Y each determine W separately (thus W = f(X) = g(Y) for some deterministic functions f, g) and X and Y determine Z jointly (thus Z = h(X,Y) for some deterministic function f).  For instance, if X,Y,Z are independent discrete random variables in an additive group G, then (X-Y,Y-Z) and (X,Z) each determine X-Z separately, and determine X,Y,Z jointly, leading to the inequality

{\textbf H}(X,Y,Z) + {\textbf H}(X-Z) \leq {\textbf H}(X-Y,Y-Z) + {\textbf H}(X,Z)

which soon leads to the entropy Rusza triangle inequality

{\textbf H}(X-Z) \leq {\textbf H}(X-Y) + {\textbf H}(Y-Z) - {\textbf H}(Y)

which is an analogue of the combinatorial Ruzsa triangle inequality

|A-C| \leq \frac{|A-B| |B-C|}{|B|}.

All of this was already in the unpublished notes with Van, though I include it in this paper in order to place it in the literature.  The main novelty of the paper, though, is to consider the entropy analogue of Freiman’s theorem, which classifies those sets A for which |A+A| = O(|A|).  Here, the analogous problem is to classify the random variables X such that {\textbf H}(X_1+X_2) = {\textbf H}(X) + O(1), where X_1,X_2 are independent copies of X.  Let us say that X has small doubling if this is the case.

For instance, the uniform distribution U on a finite subgroup H of G has small doubling (in fact {\textbf H}(U_1+U_2)={\textbf H}(U) = \log |H| in this case). In a similar spirit, the uniform distribution on a (generalised) arithmetic progression P also has small doubling, as does the uniform distribution on a coset progression H+P.  Also, if X has small doubling, and Y has bounded entropy, then X+Y also has small doubling, even if Y and X are not independent.  The main theorem is that these are the only cases:

Theorem 1. (Informal statement) X has small doubling if and only if X = U + Y for some uniform distribution U on a coset progression (of bounded rank), and Y has bounded entropy.

For instance, suppose that X was the uniform distribution on a dense subset A of a finite group G.  Then Theorem 1 asserts that X is close in a “transport metric” sense to the uniform distribution U on G, in the sense that it is possible to rearrange or transport the probability distribution of X to the probability distribution of U (or vice versa) by shifting each component of the mass of X by an amount Y which has bounded entropy (which basically means that it primarily ranges inside a set of bounded cardinality).  The way one shows this is by randomly translating the mass of X around by a few random shifts to approximately uniformise the distribution, and then deal with the residual fluctuation in the distribution by hand.  Theorem 1 as a whole is established by using the Freiman theorem in the combinatorial setting combined with various elementary convexity and entropy inequality arguments to reduce matters to the above model case when X is supported inside a finite group G and has near-maximal entropy.

I also show a variant of the above statement: if X, Y are independent and {\textbf H}(X+Y) = {\textbf H}(X)+O(1) = {\textbf H}(Y)+O(1), then we have X \equiv Y+Z (i.e. X has the same distribution as Y+Z for some Z of bounded entropy (not necessarily independent of X or Y).  Thus if two random variables are additively related to each other, then they can be additively transported to each other by using a bounded amount of entropy.

In the last part of the paper I relate these discrete entropies to their continuous counterparts

{\textbf H}_{\Bbb R}(X) := \int_{{\Bbb R}} p(x) \log \frac{1}{p(x)}\ dx,

where X is now a continuous random variable on the real line with density function p(x)\ dx.  There are a number of sum set inequalities known in this setting, for instance

{\textbf H}_{\Bbb R}(X_1 + X_2) \geq {\textbf H}_{\Bbb R}(X) + \frac{1}{2} \log 2,

for independent copies X_1,X_2 of a finite entropy random variable X, with equality if and only if X is a Gaussian.  Using this inequality and Theorem 1, I show a discrete version, namely that

{\textbf H}(X_1 + X_2) \geq {\textbf H}(X) + \frac{1}{2} \log 2 - \varepsilon,

whenever \varepsilon> 0 and X_1,X_2 are independent copies of a random variable in {\Bbb Z} (or any other torsion-free abelian group) whose entropy is sufficiently large depending on \varepsilon.  This is somewhat analogous to the classical sumset inequality

|A+A| \geq 2 |A| - 1

though notice that we have a gain of just \frac{1}{2} \log 2 rather than \log 2 here, the point being that there is a Gaussian counterexample in the entropy setting which does not have a combinatorial analogue (except perhaps in the high-dimensional limit).  The main idea is to use Theorem 1 to trap most of X inside a coset progression, at which point one can use Fourier-analytic additive combinatorial tools to show that the distribution X_1+X_2 is “smooth” in some non-trivial direction r, which can then be used to approximate the discrete distribution by a continuous one.

I also conjecture more generally that the entropy monotonicity inequalities established by Artstein, Barthe, Ball, and Naor in the continuous case also hold in the above sense in the discrete case, though my method of proof breaks down because I no longer can assume small doubling.

I’m continuing the stream of uploaded papers this week with my paper “Freiman’s theorem for solvable groups“, submitted to Contrib. Disc. Math..  This paper concerns the problem (discussed in this earlier blog post) of determining the correct analogue of Freiman’s theorem in a general non-abelian group G = (G,\cdot).  Specifically, if A \subset G is a finite set that obeys the doubling condition |A \cdot A| \leq K|A| for some bounded K, what does this tell us about A?  Heuristically, we expect A to behave like a finite subgroup of G (or perhaps a coset of such a subgroup).

When G is the integers (with the additive group operation), Freiman’s theorem then tells us that A is controlled by a generalised arithmetic progression P, where I say that one set A is controlled by another P if they have comparable size, and the former can be covered by a finite number of translates of the latter.  (One can view generalised arithmetic progressions as an approximate version of a subgroup, in which one only uses the generators of the progression for a finite amount of time before stopping, as opposed to groups which allow words of unbounded length in the generators.) For more general abelian groups, the Freiman theorem of Green and Ruzsa tells us that a set of bounded doubling is controlled by a generalised coset progression P+H, i.e. the sum of a generalised arithmetic progression P and a finite subgroup H of G.  (Of course, if G is torsion-free, the finite subgroup H must be trivial.)

In this paper we address the case when G is a solvable group of bounded derived length.  The main result is that if a subset of G has small doubing, then it is controlled by an object which I call a “coset nilprogression”, which is a certain technical generalisation of a coset progression, in which the generators do not quite commute, but have commutator expressible in terms of “higher order” generators.  This is essentially a sharp characterisation of such sets, except for the fact that one would like a more explicit description of these coset nilprogressions.   In the torsion-free case, a more explicit description (analogous to the Mal’cev basis description of nilpotent groups) has appeared in a very recent paper of Breulliard and Green; in the case of monomial groups (a class of groups that overlaps to a large extent with solvable groups), and assuming a polynomial growth condition rather than a doubling condition, a related result controlling A by balls in a suitable type of metric has appeared in very recent work of Sanders.  In the nilpotent case there is also a nice recent argument of Fisher, Peng, and Katz which shows that sets of small doubling remain of small doubling with respect to the Lie algebra operations of addition and Lie bracket, and thus are amenable to the abelian Freiman theorems.

The conclusion of my paper is easiest to state (and easiest to prove) in the model case of the lamplighter group G = {\Bbb Z} \rtimes {\Bbb F}_2^\omega, where {\Bbb F}_2^\omega = \lim_{n \to \infty} {\Bbb F}_2^n is the additive group of doubly infinite sequences in the finite field {\Bbb F}_2 with only finitely many non-zero entries, and {\Bbb Z} acts on this space by translations.  This is a solvable group of derived length two.  The main result here is

Theorem 1. (Freiman’s theorem for the lamplighter group) If A \subset {\Bbb Z} \ltimes {\Bbb F}_2^\omega has bounded doubling, then A is controlled either by a finite subspace of the “vertical” group \{0\} \times {\Bbb F}_2^\omega, or else by a set of the form \{ (n,\phi(n)): n \in P \}, where P \subset {\Bbb Z} is a generalised arithmetic progression, and \phi: P \to {\Bbb F}_2^{\omega} obeys the Freiman isomorphism property (n_1,\phi(n_1)) \cdot (n_2, \phi(n_2)) = (n_3,\phi(n_3)) \cdot (n_4,\phi(n_4)) whenever n_1,n_2,n_3,n_4 \in P and n_1+n_2=n_3+n_4.

This result, incidentally, recovers an earlier result of Lindenstrauss that the lamplighter group does not contain a Følner sequence of sets of uniformly bounded doubling.  It is a good exercise to establish the “exact” version of this theorem, in which one classifies subgroups of the lamplighter group rather than sets of small doubling; indeed, the proof of this the above theorem follows fairly closely the natural proof of the exact version.

One application of the solvable Freiman theorem is the following quantitative version of a classical result of Milnor and of Wolf, which asserts that any solvable group of polynomial growth is virtually nilpotent:

Theorem 2. (Quantitative Milnor-Wolf theorem) Let G be a solvable group of derived length O(1), let S be a set of generators for G, and suppose one has the polynomial growth condition |B_S(R)| \leq R^d for some d = O(1), where B_S(R) is the set of all words generated by S of length at most R.  If R is sufficiently large, then this implies that G is virtually nilpotent; more precisely, G contains a nilpotent subgroup of step O(1) and index O(R^{O(1)}).

The key points here are that one only needs polynomial growth at a single scale R, rather than on many scales, and that the index of the nilpotent subgroup has polynomial size.

The proofs are based on an induction on the derived length.  After some standard manipulations (basically, splitting A by an approximate version of a short exact sequence), the problem boils down to that of understanding the action \rho of some finite set A on a set E in an additive group.  If one assumes that E has small doubling and that the action of A leaves E approximately invariant, then one can show that E is a coset progression, and the action of A can be described efficiently using the generators of that progression (after refining the set A a bit).

In the course of the proof we need two new additive combinatorial results which may be of independent interest.  The first is a variant of a well-known theorem of Sárközy, which asserts that if A is a large subset of an arithmetic progression P, then an iterated sumset kA of A for some k=O(1) itself contains a long arithmetic progression. Here, we need the related fact that if A is a large subset of a coset progression, then an iterated subset kA for k=O(1) contains a large coset progression Q, and furthermore this inclusion is “robust” in the sense that all elements the elements of Q have a large number of representations as sums of elements of A.  We also need a new (non-commutative) variant of the Balog-Szemerédi(-Gowers) lemma, which asserts that if A has small doubling, then A (or more precisely A \cdot A^{-1}) contains a large “core” subset D such that almost all of a large iterated subset kD of D still lies inside A \cdot A^{-1}).  (This may not look like the usual Balog-Szemerédi lemma, but the proof of the lemma is almost identical to the original proof of Balog and Szemerédi, in particular relying on the Szemerédi regularity lemma.

For a number of reasons, including the start of the summer break for me and my coauthors, a number of papers that we have been working on are being released this week.  For instance, Ben Green and I have just uploaded to the arXiv our paper “An equivalence between inverse sumset theorems and inverse conjectures for the U^3 norm“, submitted to Math. Proc. Camb. Phil. Soc..  The main result of this short paper (which was briefly announced in this earlier post) is a connection between two types of inverse theorems in additive combinatorics, namely the inverse sumset theorems of Freiman type, and inverse theorems for the Gowers uniformity norm, and more specifically, for the U^3 norm

\|f\|_{U^3(G)}^8 := {\Bbb E}_{x,a,b,c \in G} f(x) \overline{f(x+a)} \overline{f(x+b)} \overline{f(x+c)} f(x+a+b) f(x+a+c) f(x+b+c) \overline{f(x+a+b+c)}

on finite additive group G, where f: G \to {\Bbb C} is a complex-valued function.

As usual, the connection is easiest to state in a finite field model such as G = {\Bbb F}_2^n.  In this case, we have the following inverse sumset theorem of Ruzsa:

Theorem 1. If A \subset {\Bbb F}_2^n is such that |A+A| \leq K|A|, then A can be covered by a translate of a subspace of {\Bbb F}_2^n of cardinality at most K^2 2^{K^4} |A|.

The constant K^2 2^{K^4} has been improved for large K in a sequence of papers, from K 2^{\lfloor K^3 \rfloor-1} by Ruzsa, K^2 2^{K^2-2} by Green-Ruzsa, 2^{O(K^{3/2} \log(1+K)} by Sanders, 2^{2K+O(\sqrt{K} \log K}) by Green and myself, and finally 2^{2K+O(\log K)} by Konyagin (private communication) which is sharp except for the precise value of the O() implied constant (as can be seen by considering the example when A consists of about 2K independent elements).  However, it is conjectured that the polynomial loss can be removed entirely if one modifies the conclusion slightly:

Conjecture 1. (Polynomial Freiman-Ruzsa conjecture for {\Bbb F}_2^n.) If A \subset {\Bbb F}_2^n is such that |A+A| \leq K|A|, then A can be covered by O(K^{O(1)}) translates of subspaces of {\Bbb F}_2^n of cardinality at most |A|.

This conjecture was verified for downsets by Green and myself, but is open in general.   This conjecture has a number of equivalent formulations; see this paper of Green for more discussion.  In this previous post we show that a stronger version of this conjecture fails.

Meanwhile, for the Gowers norm, we have the following inverse theorem, due to Samorodnitsky:

Theorem 2. Let f: {\Bbb F}_2^n \to [-1,1] be a function whose U^3 norm is at least 1/K.  Then there exists a quadratic polynomial Q: {\Bbb F}_2^n \to {\Bbb F}_2 such that |{\Bbb E}_{x \in {\Bbb F}_2^n} f(x) (-1)^{Q(x)}| \geq \exp( - O(K)^{O(1)} ).

Note that the quadratic phases (-1)^{Q(x)} are the only functions taking values in [-1,1] whose U^3 norm attains its maximal value of 1.

It is conjectured that the exponentially weak correlation here can be strengthened to a polynomial one:

Conjecture 2. (Polynomial inverse conjecture for the U^3({\Bbb F}_2^n) norm). Let f: {\Bbb F}_2^n \to [-1,1] be a function whose U^3 norm is at least 1/K.  Then there exists a quadratic polynomial Q: {\Bbb F}_2^n \to {\Bbb F}_2 such that |{\Bbb E}_{x \in {\Bbb F}_2^n} f(x) (-1)^{Q(x)}| \geq K^{-O(1)}.

The first main result of this paper is

Theorem 3. Conjecture 1 and Conjecture 2 are equivalent.

This result was also independently observed by Shachar Lovett (private communication).  We also establish an analogous result for the cyclic group {\Bbb Z}/N{\Bbb Z}, in which the notion of polynomial is replaced by that of a subexponential \exp(K^{o(1)}), and in which the notion of a quadratic polynomial is replaced by a 2-step nilsequence; the precise statement is a bit technical and will not be given here.  We also observe a partial partial analogue of the correpsondence between inverse sumset theorems and Gowers norms in the higher order case, in particular observing that U^4 inverse theorems imply a certain rigidity result for “Freiman-quadratic polynomials” (a quadratic version of Conjecture 3 below).

Below the fold, we sketch the proof of Theorem 3.

Read the rest of this entry »

I’ve just uploaded to the arXiv my paper “Global regularity for a logarithmically supercritical hyperdissipative Navier-Stokes equation“, submitted to Analysis & PDE.  It is a famous problem to establish the existence of global smooth solutions to the three-dimensional Navier-Stokes system of equations

\partial_t u + (u \cdot \nabla) u = \Delta u - \nabla p
\nabla \cdot u = 0
u(0,x) = u_0(x)

given smooth, compactly supported, divergence-free initial data u_0: {\Bbb R}^3 \to {\Bbb R}^3.

I do not claim to have any substantial progress on this problem here.  Instead, the paper makes a small observation about the hyper-dissipative version of the Navier-Stokes equations, namely

\partial_t u + (u \cdot \nabla) u = - |\nabla|^{2\alpha} u - \nabla p
\nabla \cdot u = 0
u(0,x) = u_0(x)

for some \alpha > 1.  It is a folklore result that global regularity for this equation holds for \alpha \geq 5/4; the significance of the exponent 5/4 is that it is energy-critical, in the sense that the scaling which preserves this particular hyper-dissipative Navier-Stokes equation, also preserves the energy.

Values of \alpha below 5/4 (including, unfortunately, the case \alpha=1, which is the original Navier-Stokes equation) are supercritical and thus establishing global regularity beyond the reach of most known methods (see my earlier blog post for more discussion).

A few years ago, I observed (in the case of the spherically symmetric wave equation) that this “criticality barrier” had a very small amount of flexibility to it, in that one could push a critical argument to a slightly supercritical one by exploiting spacetime integral estimates a little bit more.  I realised recently that the same principle applied to hyperdissipative Navier-Stokes; here, the relevant spacetime integral estimate is the energy dissipation inequality

\int_0^T \int_{{\Bbb R}^d} | |\nabla|^\alpha u(t,x)|^2\ dx dt \leq \frac{1}{2} \int_{{\Bbb R}^d} |u_0(x)|^2\ dx

which ensures that the energy dissipation a(t) := \int_{{\Bbb R}^d} | |\nabla|^\alpha u(t,x)|^2\ dx is locally integrable (and in fact globally integrable) in time.

In this paper I push the global regularity results by a fraction of a logarithm from \alpha=5/4 towards \alpha=1.  For instance, the argument shows that the logarithmically supercritical equation

\partial_t u + (u \cdot \nabla) u = - \frac{|\nabla|^{5/2}}{\log^{1/2}(2-\Delta)} u - \nabla p (0)
\nabla \cdot u = 0
u(0,x) = u_0(x)

admits global smooth solutions.

The argument is in fact quite simple (the paper is seven pages in length), and relies on known technology; one just applies the energy method and a logarithmically modified Sobolev inequality in the spirit of a well-known inequality of Brezis and Wainger.  It looks like it will take quite a bit of effort though to improve the logarithmic factor much further.

One way to explain the tiny bit of wiggle room beyond the critical case is as follows.  The standard energy method approach to the critical Navier-Stokes equation relies at one stage on Gronwall’s inequality, which among other things asserts that if a time-dependent non-negative quantity E(t) obeys the differential inequality

\partial_t E(t) \leq a(t) E(t) (1)

and a(t) was locally integrable, then E does not blow up in time; in fact, one has the inequality

E(t) \leq E(0) \exp( \int_0^t a(s)\ ds ).

A slight modification of the argument shows that one can replace the linear inequality with a slightly superlinear inequality.  For instance, the differential inequality

\partial_t E(t) \leq a(t) E(t) \log E(t) (2)

also does not blow up in time; indeed, a separation of variables argument gives the explicit double-exponential bound

E(t) \leq \exp(\exp( \int_0^t a(s)\ ds + \log \log E(0) ))

(let’s take E(0) > 1 and all functions smooth, to avoid technicalities).  It is this ability to go beyond Gronwall’s inequality by a little bit which is really at the heart of the logarithmically supercritical phenomenon.  In the paper, I establish an inequality basically of the shape (2), where E(t) is a suitably high-regularity Sobolev norm of u(t), and a(t) is basically the energy dissipation mentioned earlier.  The point is that the logarithmic loss of \log(1 - \Delta)^{1/4} in the dissipation can eventually be converted (by a Brezis-Wainger type argument) to a logarithmic loss in the high-regularity energy, as this energy can serve as a proxy for the frequency |\xi|, which in turn serves as a proxy for the Laplacian -\Delta.

To put it another way, with a linear exponential growth model, such as \partial_t E(t) = C E(t), it takes a constant amount of time for E to double, and so E never becomes infinite in finite time.  With an equation such as \partial_t E(t) = C E(t) \log E(t), the time taken for E to double from (say) 2^n to 2^{n+1} now shrinks to zero, but only as quickly as the harmonic series 1/n, so it still takes an infinite amount of time for E to blow up.  But because the divergence of \sum_n 1/n is logarithmically slow, the growth of E is now a double exponential rather than a single one.  So there is a little bit of room to exploit between exponential growth and blowup.

Interestingly, there is a heuristic argument that suggests that the half-logarithmic loss in (0) can be widened to a full logarithmic loss, which I give below the fold.

Read the rest of this entry »

I’ve just uploaded to the arXiv my paper “Global regularity of wave maps VI.  Abstract theory of minimal-energy blowup solutions“, to be submitted with the rest of the “heatwave” project to establish global regularity (and scattering) for energy-critical wave maps into hyperbolic space.  Initially, this paper was intended to cap off the project by showing that if global regularity failed, then a special minimal energy blowup solution must exist, which enjoys a certain almost periodicity property modulo the symmetries of the equation.  However, the argument was more technical than I anticipated, and so I am splitting the paper into a relatively short high-level paper (this one) that reduces the problem to four smaller propositions, and a much longer technical paper which establishes those propositions, by developing a substantial amount of perturbation theory for wave maps.  I am pretty sure though that this process will not iterate any further, and paper VII will be my final paper in this series (and which I hope to finish by the end of this summer).  It is also worth noting that a number of papers establishing similar results (though with slightly different hypotheses and conclusions) will shortly appear by Sterbenz-Tataru and Krieger-Schlag.

Almost periodic minimal energy blowup solutions have been constructed for a variety of critical equations, such as the nonlinear Schrodinger equation (NLS) and the nonlinear wave equation (NLW).  The formal definition of almost periodicity is that the orbit of the solution u stays in a precompact subset of the energy space once one quotients out by the non-compact symmetries of the equation (namely, translation and dilation).   Another (more informal) way of saying this is that for every time t, there exists a position x(t) and a frequency N(t) such that the solution u(t) is localised in space in the region \{ x: x = x(t) + O(N(t)^{-1}) \} and in frequency in the region \{ \xi: |\xi| \sim N(t) \}, with the solution decaying in energy away from these regions of space and frequency.  Model examples of almost periodic solutions include traveling waves (in which N(t) is fixed, and x(t) moves at constant velocity) and self-similar solutions (in which x(t) is fixed, and N(t) blows up in finite time at some power law rate).

Intuitively, the reason almost periodic minimal energy blowup solutions ought to exist in the absence of global regularity is as follows.  It is known (for any of the equations mentioned above) that global regularity (and scattering) holds at sufficiently small energies.  Thus, if global regularity fails at high energies, there must exist a critical energy E_{crit}, below which solutions exist globally (and obey scattering bounds), and above which solutions can blow up.

Now consider a solution u at the critical energy which blows up (actually, for technical reasons, we instead consider a sequence of solutions approaching this critical energy which come increasingly close to blowing up, but let’s ignore this for now).  We claim that this solution must be localised in both space and frequency at every time, thus giving the desired almost periodic minimal energy blowup solution.  Indeed, suppose u(t) is not localised in frequency at some time t; then one can decompose u(t) into a high frequency component u_{hi}(t) and a low frequency component u_{lo}(t), both of which have strictly smaller energy than E_{crit}, and which are widely separated from each other in frequency space.  By hypothesis, each of u_{hi} and u_{lo} can then be extended to global solutions, which should remain widely separated in frequency (because the linear analogues of these equations are constant-coefficient and thus preserve frequency support).   Assuming that interactions between very high and very low frequencies are negligible, this implies that the superposition u_{hi}+u_{lo} approximately obeys the nonlinear equation; with a suitable perturbation theory, this implies that u_{hi}+u_{lo} is close to u.  But then u is not blowing up, a contradiction.   The situation with spatial localisation is similar, but is somewhat more complicated due to the fact that spatial support, in contrast to frequency support, is not preserved by the linear evolution, let alone the nonlinear evolution.

As mentioned before, this type of scheme has been successfully implemented on a number of equations such as NLS and NLW.  However, there are two main obstacles in establishing it for wave maps.  The first is that the wave maps equation is not a scalar equation: the unknown field takes values in a target manifold (specifically, in a hyperbolic space) rather than in a Euclidean space.  As a consequence, it is not obvious how one would perform operations such as “decompose the solution into low frequency and high frequency components”, or the inverse operation “superimpose the low frequency and high frequency components to reconstitute the solution”.  Another way of viewing the problem is that the various component fields of the solution have to obey a number of important compatibility conditions which can be disrupted by an overly simple-minded approach to decomposition or reconstitution of solutions.

The second problem is that the interaction between very high and very low frequencies for wave maps turns out to not be entirely negligible: the high frequencies do have a negligible impact on the evolution of the low frequencies, but the low frequencies can “rotate” the high frequencies by acting as a sort of magnetic field (or more precisely, a connection) for the evolution of those high frequencies.  So the combined evolution of the high and low frequencies is not well approximated by a naive superposition of the separate evolutions of these frequency components.

Read the rest of this entry »

This is a continuation of the previous thread here in the polymath1 project, which is now full.  Ostensibly, the purpose of this thread is to continue writing up the paper containing many of the things achieved during this side of the project, though we have also been spending time on chasing down more results, in particular using new computer data to narrow down the range of the maximal size of  6D Moser sets (currently we can pin this down to between 353 and 355).   At some point we have to decide what results to put in in full detail in the paper, what results to summarise only (with links to the wiki), and what results to defer to perhaps a subsequent paper, but these decisions can be taken at a leisurely pace.

I guess we’ve abandoned the numbering system now, but I suppose that if necessary we can use timestamps or URLs to link to previous comments.

The celebrated Szemerédi-Trotter theorem gives a bound for the set of incidences {I(P,L) := \{ (p,\ell) \in P \times L: p \in \ell \}} between a finite set of points {P} and a finite set of lines {L} in the Euclidean plane {{\bf R}^2}. Specifically, the bound is

\displaystyle  |I(P,L)| \ll |P|^{2/3} |L|^{2/3} + |P| + |L| \ \ \ \ \ (1)

where we use the asymptotic notation {X \ll Y} or {X=O(Y)} to denote the statement that {X \leq CY} for some absolute constant {C}. In particular, the number of incidences between {n} points and {n} lines is {O(n^{4/3})}. This bound is sharp; consider for instance the discrete box {P := \{ (a,b) \in {\bf Z}^2: 1 \leq a \leq N; 1 \leq b \leq 2N^2 \}} with {L} being the collection of lines {\{ (x,mx+b): m, b \in {\bf Z}, 1 \leq m \leq N, 1 \leq b \leq N^2 \}}. One easily verifies that {|P|=2N^3}, {|L| = N^3}, and {|I(P,L)| = N^4}, showing that (1) is essentially sharp in the case {|P| \sim |L|}; one can concoct similar examples for other regimes of {|P|} and {|L|}.

On the other hand, if one replaces the Euclidean plane {{\bf R}^2} by a finite field geometry {F^2}, where {F} is a finite field, then the estimate (1) is false. For instance, if {P} is the entire plane {F^2}, and {L} is the set of all lines in {F^2}, then {|P|, |L|} are both comparable to {|F|^2}, but {|I(P,L)|} is comparable to {|F|^3}, thus violating (1) when {|F|} is large. Thus any proof of the Szemerédi-Trotter theorem must use a special property of the Euclidean plane which is not enjoyed by finite field geometries. In particular, this strongly suggests that one cannot rely purely on algebra and combinatorics to prove (1); one must also use some Euclidean geometry or topology as well.

Nowadays, the slickest proof of the Szemerédi-Trotter theorem is via the crossing number inequality (as discussed in this previous post), which ultimately relies on Euler’s famous formula {|V|-|E|+|F|=2}; thus in this argument it is topology which is the feature of Euclidean space which one is exploiting, and which is not present in the finite field setting. Today, though, I would like to mention a different proof (closer in spirit to the original proof of Szemerédi-Trotter, and also a later argument of Clarkson et al.), based on the method of cell decomposition, which has proven to be a very flexible method in combinatorial incidence geometry. Here, the distinctive feature of Euclidean geometry one is exploiting is convexity, which again has no finite field analogue.

Roughly speaking, the idea is this. Using nothing more than the axiom that two points determine at most one line, one can obtain the bound

\displaystyle  |I(P,L)| \ll |P| |L|^{1/2} + |L|, \ \ \ \ \ (2)

which is inferior to (1). (On the other hand, this estimate works in both Euclidean and finite field geometries, and is sharp in the latter case, as shown by the example given earlier.) Dually, the axiom that two lines determine at most one point gives the bound

\displaystyle  |I(P,L)| \ll |L| |P|^{1/2} + |P| \ \ \ \ \ (3)

(or alternatively, one can use projective duality to interchange points and lines and deduce (3) from (2)).

An inspection of the proof of (2) shows that it is only expected to be sharp when the bushes {L_p := \{ \ell \in L: \ell \ni p \}} associated to each point {p \in P} behave like “independent” subsets of {L}, so that there is no significant correlation between the bush {L_p} of one point and the bush of another point {L_q}.

However, in Euclidean space, we have the phenomenon that the bush of a point {L_p} is influenced by the region of space that {p} lies in. Clearly, if {p} lies in a set {\Omega} (e.g. a convex polygon), then the only lines {\ell \in L} that can contribute to {L_p} are those lines which pass through {\Omega}. If {\Omega} is a small convex region of space, one expects only a fraction of the lines in {L} to actually pass through {\Omega}. As such, if {p} and {q} both lie in {\Omega}, then {L_p} and {L_q} are compressed inside a smaller subset of {L}, namely the set of lines passing through {\Omega}, and so should be more likely to intersect than if they were independent. This should lead to an improvement to (2) (and indeed, as we shall see below, ultimately leads to (1)).

More formally, the argument proceeds by applying the following lemma:

Lemma 1 (Cell decomposition) Let {L} be a finite collection of lines in {{\bf R}^2}, let {P} be a finite set of points, and let {r \geq 1}. Then it is possible to find a set {R} of {O(r)} lines in {L}, plus some additional open line segments not containing any point in {P}, which subdivide {{\bf R}^2} into {O(r^2)} convex regions (or cells), such that the interior of each such cell is incident to at most {O(|L|/r)} lines.

The deduction of (1) from (2), (3) and Lemma 1 is very quick. Firstly we may assume we are in the range

\displaystyle  |L|^{1/2} \ll |P| \ll |L|^2 \ \ \ \ \ (4)

otherwise the bound (1) follows already from either (2) or (3) and some high-school algebra.

Let {r \geq 1} be a parameter to be optimised later. We apply the cell decomposition to subdivide {{\bf R}^2} into {O(r^2)} open convex regions, plus a family {R} of {O(r)} lines. Each of the {O(r^2)} convex regions {\Omega} has only {O(|L|/r)} lines through it, and so by (2) contributes {O( |P \cap \Omega| |L|^{1/2}/r^{1/2} + |L| / r )} incidences. Meanwhile, on each of the lines {\ell} in {R} used to perform this decomposition, there are at most {|L|} transverse incidences (because each line in {L} distinct from {\ell} can intersect {\ell} at most once), plus all the incidences along {\ell} itself. Putting all this together, one obtains

\displaystyle  |I(P,L)| \leq |I(P,L \cap R)| + O( |P| |L|^{1/2}/r^{1/2} + |L| r).

We optimise this by selecting {r \sim |P|^{2/3} / |L|^{1/3}}; from (4) we can ensure that {r \leq |L|/2}, so that {|L \cap R| \leq |L|/2}. One then obtains

\displaystyle  |I(P,L)| \leq |I(P,L \cap R)| + O( |P|^{2/3} |L|^{2/3} ).

We can iterate away the {L \cap R} error (halving the number of lines each time) and sum the resulting geometric series to obtain (1).

It remains to prove (1). If one subdivides {{\bf R}^2} using {r} arbitrary lines, one creates at most {O(r^2)} cells (because each new line intersects the existing lines at most once, and so can create at most {O(r)} distinct cells), and for a similar reason, every line in {L} visits at most {r} of these regions, and so by double counting one expects {O(|L|/r)} lines per cell “on the average”. The key difficulty is then to get {O(|L|/r)} lines through every cell, not just on the average. It turns out that a probabilistic argument will almost work, but with a logarithmic loss (thus having {O( |L| \log |L| / r )} lines per cell rather than {O(|L|/r)}); but with a little more work one can then iterate away this loss also. The arguments here are loosely based on those of Clarkson et al.; a related (deterministic) decomposition also appears in the original paper of Szemerédi and Trotter. But I wish to focus here on the probabilistic approach.)

It is also worth noting that the original (somewhat complicated) argument of Szemerédi-Trotter has been adapted to establish the analogue of (1) in the complex plane {{\bf C}^2} by Toth, while the other known proofs of Szemerédi-Trotter, so far, have not been able to be extended to this setting (the Euler characteristic argument clearly breaks down, as does any proof based on using lines to divide planes into half-spaces). So all three proofs have their advantages and disadvantages.

Read the rest of this entry »

In the theory of discrete random matrices (e.g. matrices whose entries are random signs {\pm 1}), one often encounters the problem of understanding the distribution of the random variable {\hbox{dist}(X,V)}, where {X = (x_1,\ldots,x_n) \in \{-1,+1\}^n} is an {n}-dimensional random sign vector (so {X} is uniformly distributed in the discrete cube {\{-1,+1\}^n}), and {V} is some {d}-dimensional subspace of {{\bf R}^n} for some {0 \leq d \leq n}.

It is not hard to compute the second moment of this random variable. Indeed, if {P = (p_{ij})_{1 \leq i,j \leq n}} denotes the orthogonal projection matrix from {{\bf R}^n} to the orthogonal complement {V^\perp} of {V}, then one observes that

\displaystyle  \hbox{dist}(X,V)^2 = X \cdot P X = \sum_{i=1}^n \sum_{j=1}^n x_i x_j p_{ij}

and so upon taking expectations we see that

\displaystyle  {\Bbb E} \hbox{dist}(X,V)^2 = \sum_{i=1}^n p_{ii} = \hbox{tr} P = n-d \ \ \ \ \ (1)

since {P} is a rank {n-d} orthogonal projection. So we expect {\hbox{dist}(X,V)} to be about {\sqrt{n-d}} on the average.

In fact, one has sharp concentration around this value, in the sense that {\hbox{dist}(X,V) = \sqrt{n-d}+O(1)} with high probability. More precisely, we have

Proposition 1 (Large deviation inequality) For any {t>0}, one has

\displaystyle  {\Bbb P}( |\hbox{dist}(X,V) - \sqrt{n-d}| \geq t ) \leq C \exp(- c t^2 )

for some absolute constants {C, c > 0}.

In fact the constants {C, c} are very civilised; for large {t} one can basically take {C=4} and {c=1/16}, for instance. This type of concentration, particularly for subspaces {V} of moderately large codimension {n-d}, is fundamental to much of my work on random matrices with Van Vu, starting with our first paper (in which this proposition first appears). (For subspaces of small codimension (such as hyperplanes) one has to use other tools to get good results, such as inverse Littlewood-Offord theory or the Berry-Esséen central limit theorem, but that is another story.)

Proposition 1 is an easy consequence of the second moment computation and Talagrand’s inequality, which among other things provides a sharp concentration result for convex Lipschitz functions on the cube {\{-1,+1\}^n}; since {\hbox{dist}(x,V)} is indeed a convex Lipschitz function, this inequality can be applied immediately. The proof of Talagrand’s inequality is short and can be found in several textbooks (e.g. Alon and Spencer), but I thought I would reproduce the argument here (specialised to the convex case), mostly to force myself to learn the proof properly. Note the concentration of {O(1)} obtained by Talagrand’s inequality is much stronger than what one would get from more elementary tools such as Azuma’s inequality or McDiarmid’s inequality, which would only give concentration of about {O(\sqrt{n})} or so (which is in fact trivial, since the cube {\{-1,+1\}^n} has diameter {2\sqrt{n}}); the point is that Talagrand’s inequality is very effective at exploiting the convexity of the problem, as well as the Lipschitz nature of the function in all directions, whereas Azuma’s inequality can only easily take advantage of the Lipschitz nature of the function in coordinate directions. On the other hand, Azuma’s inequality works just as well if the {\ell^2} metric is replaced with the larger {\ell^1} metric, and one can conclude that the {\ell^1} distance between {X} and {V} concentrates around its median to a width {O(\sqrt{n})}, which is a more non-trivial fact than the {\ell^2} concentration bound given by that inequality. (The computation of the median of the {\ell^1} distance is more complicated than for the {\ell^2} distance, though, and depends on the orientation of {V}.)

Remark 1 If one makes the coordinates of {X} iid Gaussian variables {x_i \equiv N(0,1)} rather than random signs, then Proposition 1 is much easier to prove; the probability distribution of a Gaussian vector is rotation-invariant, so one can rotate {V} to be, say, {{\bf R}^d}, at which point {\hbox{dist}(X,V)^2} is clearly the sum of {n-d} independent squares of Gaussians (i.e. a chi-square distribution), and the claim follows from direct computation (or one can use the Chernoff inequality). The gaussian counterpart of Talagrand’s inequality is more classical, being essentially due to Lévy, and will also be discussed later in this post.

Read the rest of this entry »