
I have uploaded to the arXiv my paper “Exploring the toolkit of Jean Bourgain“. This is one of a collection of papers to be published in the Bulletin of the American Mathematical Society describing aspects of the work of Jean Bourgain; other contributors to this collection include Keith Ball, Ciprian Demeter, and Carlos Kenig. Because the other contributors will be covering specific areas of Jean’s work in some detail, I decided to take a non-overlapping tack, and focus instead on some basic tools of Jean that he frequently used across many of the fields he contributed to. Jean had a surprising number of these “basic tools” that he wielded with great dexterity, and in this paper I focus on just a few of them:

  • Reducing qualitative analysis results (e.g., convergence theorems or dimension bounds) to quantitative analysis estimates (e.g., variational inequalities or maximal function estimates).
  • Using dyadic pigeonholing to locate good scales to work in or to apply truncations.
  • Using random translations to amplify small sets (low density) into large sets (positive density).
  • Combining large deviation inequalities with metric entropy bounds to control suprema of various random processes.

Each of these techniques is individually not too difficult to explain, and was certainly employed on occasion by various mathematicians prior to Bourgain’s work; but Jean had internalized them to the point where he would instinctively use them as soon as they became relevant to a given problem at hand. I illustrate this at the end of the paper with an exposition of one particular result of Jean, on the Erdős similarity problem, in which his main result (that any sum {S = S_1+S_2+S_3} of three infinite sets of reals has the property that there exists a positive measure set {E} that does not contain any homothetic copy {x+tS} of {S}) is basically proven by a sequential application of these tools (except for dyadic pigeonholing, which turns out not to be needed here).

I had initially intended to also cover some other basic tools in Jean’s toolkit, such as the uncertainty principle and the use of probabilistic decoupling, but was having trouble keeping the paper coherent with such a broad focus (certainly I could not identify a single paper of Jean’s that employed all of these tools at once). I hope though that the examples given in the paper give some reasonable impression of Jean’s research style.

Given any finite collection of elements {(f_i)_{i \in I}} in some Banach space {X}, the triangle inequality tells us that

\displaystyle \| \sum_{i \in I} f_i \|_X \leq \sum_{i \in I} \|f_i\|_X.

However, when the {f_i} all “oscillate in different ways”, one expects to improve substantially upon the triangle inequality. For instance, if {X} is a Hilbert space and the {f_i} are mutually orthogonal, we have the Pythagorean theorem

\displaystyle \| \sum_{i \in I} f_i \|_X = (\sum_{i \in I} \|f_i\|_X^2)^{1/2}.

For sake of comparison, from the triangle inequality and Cauchy-Schwarz one has the general inequality

\displaystyle \| \sum_{i \in I} f_i \|_X \leq (\# I)^{1/2} (\sum_{i \in I} \|f_i\|_X^2)^{1/2} \ \ \ \ \ (1)


for any finite collection {(f_i)_{i \in I}} in any Banach space {X}, where {\# I} denotes the cardinality of {I}. Thus orthogonality in a Hilbert space yields “square root cancellation”, saving a factor of {(\# I)^{1/2}} or so over the trivial bound coming from the triangle inequality.
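Indeed, (1) follows by writing {\|f_i\|_X = 1 \cdot \|f_i\|_X} and applying the triangle inequality followed by the Cauchy-Schwarz inequality:

\displaystyle \| \sum_{i \in I} f_i \|_X \leq \sum_{i \in I} 1 \cdot \|f_i\|_X \leq (\sum_{i \in I} 1^2)^{1/2} (\sum_{i \in I} \|f_i\|_X^2)^{1/2} = (\# I)^{1/2} (\sum_{i \in I} \|f_i\|_X^2)^{1/2}.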

More generally, let us somewhat informally say that a collection {(f_i)_{i \in I}} exhibits decoupling in {X} if one has the Pythagorean-like inequality

\displaystyle \| \sum_{i \in I} f_i \|_X \ll_\varepsilon (\# I)^\varepsilon (\sum_{i \in I} \|f_i\|_X^2)^{1/2}

for any {\varepsilon>0}, thus one obtains almost the full square root cancellation in the {X} norm. The theory of almost orthogonality can then be viewed as the theory of decoupling in Hilbert spaces such as {L^2({\bf R}^n)}. In {L^p} spaces for {p < 2} one usually does not expect this sort of decoupling; for instance, if the {f_i} are disjointly supported one has

\displaystyle \| \sum_{i \in I} f_i \|_{L^p} = (\sum_{i \in I} \|f_i\|_{L^p}^p)^{1/p}

and the right-hand side can be much larger than {(\sum_{i \in I} \|f_i\|_{L^p}^2)^{1/2}} when {p < 2} (for instance, if all the norms {\|f_i\|_{L^p}} are equal to one, the former quantity is {(\# I)^{1/p}} while the latter is only {(\# I)^{1/2}}). At the opposite extreme, one usually does not expect to get decoupling in {L^\infty}, since one could conceivably align the {f_i} to all attain a maximum magnitude at the same location with the same phase, at which point the triangle inequality in {L^\infty} becomes sharp.

However, in some cases one can get decoupling for certain {2 < p < \infty}. For instance, suppose we are in {L^4}, and that {f_1,\dots,f_N} are bi-orthogonal in the sense that the products {f_i f_j} for {1 \leq i < j \leq N} are pairwise orthogonal in {L^2}. Then we have

\displaystyle \| \sum_{i = 1}^N f_i \|_{L^4}^2 = \| (\sum_{i=1}^N f_i)^2 \|_{L^2}

\displaystyle = \| \sum_{1 \leq i,j \leq N} f_i f_j \|_{L^2}

\displaystyle \ll (\sum_{1 \leq i,j \leq N} \|f_i f_j \|_{L^2}^2)^{1/2}

\displaystyle = \| (\sum_{1 \leq i,j \leq N} |f_i f_j|^2)^{1/2} \|_{L^2}

\displaystyle = \| \sum_{i=1}^N |f_i|^2 \|_{L^2}

\displaystyle \leq \sum_{i=1}^N \| |f_i|^2 \|_{L^2}

\displaystyle = \sum_{i=1}^N \|f_i\|_{L^4}^2

giving decoupling in {L^4}. (Similarly if each of the {f_i f_j} is orthogonal to all but {O_\varepsilon( N^\varepsilon )} of the other {f_{i'} f_{j'}}.) A similar argument also gives {L^6} decoupling when one has tri-orthogonality (with the {f_i f_j f_k} mostly orthogonal to each other), and so forth. As a slight variant, Khintchine’s inequality also indicates that decoupling should occur for any fixed {2 < p < \infty} if one multiplies each of the {f_i} by an independent random sign {\epsilon_i \in \{-1,+1\}}.
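To sketch this last claim: for fixed {2 < p < \infty}, Khintchine’s inequality gives the pointwise moment bound {{\bf E} |\sum_{i=1}^N \epsilon_i f_i(x)|^p \ll_p (\sum_{i=1}^N |f_i(x)|^2)^{p/2}}; integrating in {x} and then applying the triangle inequality in {L^{p/2}} (legitimate since {p/2 \geq 1}), one obtains

\displaystyle {\bf E} \| \sum_{i=1}^N \epsilon_i f_i \|_{L^p}^p \ll_p \| \sum_{i=1}^N |f_i|^2 \|_{L^{p/2}}^{p/2} \leq (\sum_{i=1}^N \|f_i\|_{L^p}^2)^{p/2},

which is in fact slightly stronger than decoupling, as there is no {N^\varepsilon} loss.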

In recent years, Bourgain and Demeter have been establishing decoupling theorems in {L^p({\bf R}^n)} spaces for various key exponents in the range {2 < p < \infty}, in the “restriction theory” setting in which the {f_i} are Fourier transforms of measures supported on different portions of a given surface or curve; this builds upon the earlier decoupling theorems of Wolff. In a recent paper with Guth, they established the following decoupling theorem for the polynomial curve {\gamma({\bf R}) \subset {\bf R}^n} parameterised by

\displaystyle \gamma: t \mapsto (t, t^2, \dots, t^n).

For any ball {B = B(x_0,r)} in {{\bf R}^n}, let {w_B: {\bf R}^n \rightarrow {\bf R}^+} denote the weight

\displaystyle w_B(x) := \frac{1}{(1 + \frac{|x-x_0|}{r})^{100n}},

which should be viewed as a smoothed out version of the indicator function {1_B} of {B}. In particular, the space {L^p(w_B) = L^p({\bf R}^n, w_B(x)\ dx)} can be viewed as a smoothed out version of the space {L^p(B)}. For future reference we observe a fundamental self-similarity of the curve {\gamma({\bf R})}: any arc {\gamma(I)} in this curve, with {I} a compact interval, is affinely equivalent to the standard arc {\gamma([0,1])}.
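Explicitly, if {I = [a,a+h]} with {h > 0}, then the binomial theorem gives

\displaystyle (a+ht)^k = a^k + \sum_{j=1}^k \binom{k}{j} a^{k-j} h^j t^j,

so that {\gamma(a+ht) = A \gamma(t) + b}, where {b = \gamma(a)} and {A} is the triangular matrix with entries {A_{kj} = \binom{k}{j} a^{k-j} h^j} for {1 \leq j \leq k \leq n}, which is invertible since its diagonal entries {h^k} are non-zero.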

Theorem 1 (Decoupling theorem) Let {n \geq 1}. Subdivide the unit interval {[0,1]} into {N} equal subintervals {I_i} of length {1/N}, and for each such {I_i}, let {f_i: {\bf R}^n \rightarrow {\bf C}} be the Fourier transform

\displaystyle f_i(x) = \int_{\gamma(I_i)} e(x \cdot \xi)\ d\mu_i(\xi)

of a finite Borel measure {\mu_i} on the arc {\gamma(I_i)}, where {e(\theta) := e^{2\pi i \theta}}. Then the {f_i} exhibit decoupling in {L^{n(n+1)}(w_B)} for any ball {B} of radius {N^n}.

Orthogonality gives the {n=1} case of this theorem. The bi-orthogonality type arguments sketched earlier only give decoupling in {L^p} up to the range {2 \leq p \leq 2n}; the point here is that we can now get the much larger exponent {p = n(n+1)}. The {n=2} case of this theorem was previously established by Bourgain and Demeter (who obtained in fact an analogous theorem for any curved hypersurface). The exponent {n(n+1)} (and the radius {N^n}) is best possible, as can be seen by the following basic example. If

\displaystyle f_i(x) := \int_{I_i} e(x \cdot \gamma(\xi)) g_i(\xi)\ d\xi

where {g_i} is a bump function adapted to {I_i}, then standard Fourier-analytic computations show that {f_i} will be comparable to {1/N} on a rectangular box of dimensions {N \times N^2 \times \dots \times N^n} (and thus volume {N^{n(n+1)/2}}) centred at the origin, and exhibit decay away from this box, with {\|f_i\|_{L^{n(n+1)}(w_B)}} comparable to

\displaystyle 1/N \times (N^{n(n+1)/2})^{1/(n(n+1))} = 1/\sqrt{N}.

On the other hand, {\sum_{i=1}^N f_i} is comparable to {1} on a ball of radius comparable to {1} centred at the origin, so {\|\sum_{i=1}^N f_i\|_{L^{n(n+1)}(w_B)}} is {\gg 1}, which is just barely consistent with decoupling. This calculation shows that decoupling will fail if {n(n+1)} is replaced by any larger exponent, and also if the radius of the ball {B} is reduced to be significantly smaller than {N^n}.
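To compare this example against the decoupling inequality: with the above normalisations one has

\displaystyle (\sum_{i=1}^N \|f_i\|_{L^{n(n+1)}(w_B)}^2)^{1/2} \asymp (N \times (1/\sqrt{N})^2)^{1/2} = 1,

so decoupling predicts {\| \sum_{i=1}^N f_i \|_{L^{n(n+1)}(w_B)} \ll_\varepsilon N^\varepsilon}, which is consistent with the lower bound of {\gg 1}, but with essentially no room to spare.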

This theorem has the following consequence of importance in analytic number theory:

Corollary 2 (Vinogradov main conjecture) Let {s, n, N \geq 1} be integers, and let {\varepsilon > 0}. Then

\displaystyle \int_{[0,1]^n} |\sum_{j=1}^N e( j x_1 + j^2 x_2 + \dots + j^n x_n)|^{2s}\ dx_1 \dots dx_n \ll_{\varepsilon,s,n} N^{s+\varepsilon} + N^{2s - \frac{n(n+1)}{2}+\varepsilon}.

Proof: By the Hölder inequality (and the trivial bound of {N} for the exponential sum), it suffices to treat the critical case {s = n(n+1)/2}, that is to say to show that

\displaystyle \int_{[0,1]^n} |\sum_{j=1}^N e( j x_1 + j^2 x_2 + \dots + j^n x_n)|^{n(n+1)}\ dx_1 \dots dx_n \ll_{\varepsilon,n} N^{\frac{n(n+1)}{2}+\varepsilon}.

We can rescale this as

\displaystyle \int_{[0,N] \times [0,N^2] \times \dots \times [0,N^n]} |\sum_{j=1}^N e( x \cdot \gamma(j/N) )|^{n(n+1)}\ dx \ll_{\varepsilon,n} N^{n(n+1)+\varepsilon}.

As the integrand is periodic along the lattice {N{\bf Z} \times N^2 {\bf Z} \times \dots \times N^n {\bf Z}}, this is equivalent to

\displaystyle \int_{[0,N^n]^n} |\sum_{j=1}^N e( x \cdot \gamma(j/N) )|^{n(n+1)}\ dx \ll_{\varepsilon,n} N^{\frac{n(n+1)}{2}+n^2+\varepsilon}.

The left-hand side may be bounded by {\ll \| \sum_{j=1}^N f_j \|_{L^{n(n+1)}(w_B)}^{n(n+1)}}, where {B := B(0,N^n)} and {f_j(x) := e(x \cdot \gamma(j/N))}. Since

\displaystyle \| f_j \|_{L^{n(n+1)}(w_B)} \ll (N^{n^2})^{\frac{1}{n(n+1)}},

the claim now follows from the decoupling theorem and a brief calculation. \Box
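To spell out that brief calculation: the decoupling theorem and the preceding bound give

\displaystyle \| \sum_{j=1}^N f_j \|_{L^{n(n+1)}(w_B)} \ll_\varepsilon N^\varepsilon (\sum_{j=1}^N \| f_j \|_{L^{n(n+1)}(w_B)}^2)^{1/2} \ll_\varepsilon N^{\frac{1}{2}+\varepsilon} (N^{n^2})^{\frac{1}{n(n+1)}},

and raising this to the power {n(n+1)} gives the required bound of {\ll_\varepsilon N^{\frac{n(n+1)}{2} + n^2 + n(n+1)\varepsilon}} (after adjusting {\varepsilon} as necessary).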

Using the Plancherel formula, one may equivalently (when {s} is an integer) write the Vinogradov main conjecture in terms of solutions {j_1,\dots,j_s,k_1,\dots,k_s \in \{1,\dots,N\}} to the system of equations

\displaystyle j_1^i + \dots + j_s^i = k_1^i + \dots + k_s^i \hbox{ for all } i=1,\dots,n,

but we will not use this formulation here.
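(The equivalence comes from expanding out the {2s}-th power and using the orthogonality relation {\int_0^1 e(mx)\ dx = 1_{m=0}} for integers {m}: the left-hand side of the Vinogradov inequality is then precisely the number of solutions

\displaystyle \# \{ (j_1,\dots,j_s,k_1,\dots,k_s) \in \{1,\dots,N\}^{2s}: j_1^i + \dots + j_s^i = k_1^i + \dots + k_s^i \hbox{ for all } i=1,\dots,n \}

to the above system.)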

A history of the Vinogradov main conjecture may be found in this survey of Wooley; prior to the Bourgain-Demeter-Guth theorem, the conjecture was solved completely for {n \leq 3}, or for {n > 3} and {s} either below {n(n+1)/2 - n/3 + O(n^{2/3})} or above {n(n-1)}, with the bulk of recent progress coming from the efficient congruencing technique of Wooley. It has numerous applications to exponential sums, Waring’s problem, and the zeta function; to give just one application, the main conjecture implies the predicted asymptotic for the number of ways to express a large number as the sum of {23} fifth powers (the previous best result required {28} fifth powers). The Bourgain-Demeter-Guth approach to the Vinogradov main conjecture, based on decoupling, is ostensibly very different from the efficient congruencing technique, which relies heavily on the arithmetic structure of the problem, but it appears (as I have been told from second-hand sources) that the two methods are actually closely related, with the former being a sort of “Archimedean” version of the latter (with the intervals {I_i} in the decoupling theorem being analogous to congruence classes in the efficient congruencing method); hopefully there will be some future work making this connection more precise. One advantage of the decoupling approach is that it generalises to non-arithmetic settings in which the set {\{1,\dots,N\}} that {j} is drawn from is replaced by some other similarly separated set of real numbers. (A random thought – could this allow the Vinogradov-Korobov bounds on the zeta function to extend to Beurling zeta functions?)

Below the fold we sketch the Bourgain-Demeter-Guth argument proving Theorem 1.

I thank Jean Bourgain and Andrew Granville for helpful discussions.


One of the basic problems in analytic number theory is to estimate sums of the form

\displaystyle  \sum_{p<x} f(p)

as {x \rightarrow \infty}, where {p} ranges over primes and {f} is some explicit function of interest (e.g. a linear phase function {f(p) = e^{2\pi i \alpha p}} for some real number {\alpha}). This is essentially the same task as obtaining estimates on the sum

\displaystyle  \sum_{n<x} \Lambda(n) f(n)

where {\Lambda} is the von Mangoldt function. If {f} is bounded, {f(n)=O(1)}, then from the prime number theorem one has the trivial bound

\displaystyle  \sum_{n<x} \Lambda(n) f(n) = O(x)

but often (when {f} is somehow “oscillatory” in nature) one is seeking the refinement

\displaystyle  \sum_{n<x} \Lambda(n) f(n) = o(x) \ \ \ \ \ (1)

or equivalently

\displaystyle  \sum_{p<x} f(p) = o(\frac{x}{\log x}). \ \ \ \ \ (2)

Thanks to identities such as

\displaystyle  \Lambda(n) = \sum_{d|n} \mu(d) \log(\frac{n}{d}), \ \ \ \ \ (3)

where {\mu} is the Möbius function, refinements such as (1) are similar in spirit to estimates of the form

\displaystyle  \sum_{n<x} \mu(n) f(n) = o(x). \ \ \ \ \ (4)

Unfortunately, the connection between (1) and (4) is not particularly tight; roughly speaking, one needs to improve the bounds in (4) (and variants thereof) by about two factors of {\log x} before one can use identities such as (3) to recover (1). Still, one generally thinks of (1) and (4) as being “morally” equivalent, even if they are not formally equivalent.
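For instance, inserting the identity (3) into the left-hand side of (1) and writing {n = dr}, one obtains the expansion

\displaystyle \sum_{n<x} \Lambda(n) f(n) = \sum_{d < x} \mu(d) \sum_{r < x/d} \log(r) f(dr),

so that control of Möbius-type sums with {f} replaced by the dilated and logarithmically weighted functions {r \mapsto \log(r) f(dr)} yields control of (1); the logarithmic weight and the additional summation over {d} are the source of the logarithmic inefficiencies alluded to above.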

When {f} is oscillating in a sufficiently “irrational” way, one standard way to proceed is the method of Type I and Type II sums, which uses truncated versions of divisor identities such as (3) to expand out either (1) or (4) into linear (Type I) or bilinear (Type II) sums with which one can exploit the oscillation of {f}. For instance, Vaughan’s identity lets one rewrite the sum in (1) as the sum of the Type I sum

\displaystyle  \sum_{d \leq U} \mu(d) (\sum_{V/d \leq r \leq x/d} (\log r) f(rd)),

the Type I sum

\displaystyle  -\sum_{d \leq UV} a(d) \sum_{V/d \leq r \leq x/d} f(rd),

the Type II sum

\displaystyle  -\sum_{V \leq d \leq x/U} \sum_{U < m \leq x/d} \Lambda(d) b(m) f(dm),

and the error term {\sum_{n \leq V} \Lambda(n) f(n)}, whenever {1 \leq U, V \leq x} are parameters, and {a, b} are the sequences

\displaystyle  a(d) := \sum_{e \leq U, f \leq V: ef = d} \mu(e) \Lambda(f)


\displaystyle  b(m) := \sum_{d|m: d \leq U} \mu(d).

Similarly one can express (4) as the Type I sum

\displaystyle  -\sum_{d \leq UV} c(d) \sum_{UV/d \leq r \leq x/d} f(rd),

the Type II sum

\displaystyle  - \sum_{V < d \leq x/U} \sum_{U < m \leq x/d} \mu(m) b(d) f(dm)

and the error term {\sum_{n \leq UV} \mu(n) f(n)}, whenever {1 \leq U,V \leq x} with {UV \leq x}, and {c} is the sequence

\displaystyle  c(d) := \sum_{e \leq U, f \leq V: ef = d} \mu(e) \mu(f).

After eliminating troublesome sequences such as {a(), b(), c()} via Cauchy-Schwarz or the triangle inequality, one is then faced with the task of estimating Type I sums such as

\displaystyle  \sum_{r \leq y} f(rd)

or Type II sums such as

\displaystyle  \sum_{r \leq y} f(rd) \overline{f(rd')}

for various {y, d, d' \geq 1}. Here, the trivial bound is {O(y)}, but due to a number of logarithmic inefficiencies in the above method, one has to obtain bounds that are more like {O( \frac{y}{\log^C y})} for some constant {C} (e.g. {C=5}) in order to end up with an asymptotic such as (1) or (4).

However, in a recent paper of Bourgain, Sarnak, and Ziegler, it was observed that as long as one is only seeking the Möbius orthogonality (4) rather than the von Mangoldt orthogonality (1), one can avoid losing any logarithmic factors, and rely purely on qualitative equidistribution properties of {f}. A special case of their orthogonality criterion (which actually dates back to an earlier paper of Katai, as was pointed out to me by Nikos Frantzikinakis) is as follows:

Proposition 1 (Orthogonality criterion) Let {f: {\bf N} \rightarrow {\bf C}} be a bounded function such that

\displaystyle  \sum_{n \leq x} f(pn) \overline{f(qn)} = o(x) \ \ \ \ \ (5)

for any distinct primes {p, q} (where the decay rate of the error term {o(x)} may depend on {p} and {q}). Then

\displaystyle  \sum_{n \leq x} \mu(n) f(n) =o(x). \ \ \ \ \ (6)

Actually, the Bourgain-Sarnak-Ziegler paper establishes a more quantitative version of this proposition, in which {\mu} can be replaced by an arbitrary bounded multiplicative function, but we will content ourselves with the above weaker special case. (See also these notes of Harper, which use the Katai argument to give a slightly weaker quantitative bound in the same spirit.) This criterion can be viewed as a multiplicative variant of the classical van der Corput lemma, which in our notation asserts that {\sum_{n \leq x} f(n) = o(x)} if one has {\sum_{n \leq x} f(n+h) \overline{f(n)} = o(x)} for each fixed non-zero {h}.

As a sample application, Proposition 1 easily gives a proof of the asymptotic

\displaystyle  \sum_{n \leq x} \mu(n) e^{2\pi i \alpha n} = o(x)

for any irrational {\alpha}. (For rational {\alpha}, this is a little trickier, as it is basically equivalent to the prime number theorem in arithmetic progressions.) The paper of Bourgain, Sarnak, and Ziegler also applies this criterion to nilsequences (obtaining a quick proof of a qualitative version of a result of Ben Green and myself, see these notes of Ziegler for details) and to horocycle flows (for which no Möbius orthogonality result was previously known).

Informally, the connection between (5) and (6) comes from the multiplicative nature of the Möbius function. If (6) failed, then {\mu(n)} would exhibit strong correlation with {f(n)}; by change of variables, we would then expect {\mu(pn)} to correlate with {f(pn)} and {\mu(qn)} to correlate with {f(qn)}, for “typical” {p,q} at least. On the other hand, since {\mu} is multiplicative, {\mu(pn)} exhibits strong correlation with {\mu(qn)}. Putting all this together (and pretending correlation is transitive), this would give the claim (in the contrapositive). Of course, correlation is not quite transitive, but it turns out that one can use the Cauchy-Schwarz inequality as a substitute for transitivity of correlation in this case.

I will give a proof of Proposition 1 below the fold (which is not quite based on the argument in the above mentioned paper, but on a variant of that argument communicated to me by Tamar Ziegler, and also independently discovered by Adam Harper). The main idea is to exploit the following observation: if {P} is a “large” but finite set of primes (in the sense that the sum {A := \sum_{p \in P} \frac{1}{p}} is large), then for a typical large number {n} (much larger than the elements of {P}), the number of primes in {P} that divide {n} is pretty close to {A = \sum_{p \in P} \frac{1}{p}}:

\displaystyle  \sum_{p \in P: p|n} 1 \approx A. \ \ \ \ \ (7)

A more precise formalisation of this heuristic is provided by the Turán-Kubilius inequality, which is proven by a simple application of the second moment method.
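In the notation above, the relevant form of this inequality (assuming, say, that the elements of {P} are all at most {x}) is the second moment bound

\displaystyle \sum_{n \leq x} |\sum_{p \in P: p|n} 1 - A|^2 \ll A x,

so that by Chebyshev’s inequality, (7) holds with accuracy {O(\sqrt{A})} for all but a small fraction of the integers {n \leq x}.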

In particular, one can sum (7) against {\mu(n) f(n)} and obtain an approximation

\displaystyle  \sum_{n \leq x} \mu(n) f(n) \approx \frac{1}{A} \sum_{p \in P} \sum_{n \leq x: p|n} \mu(n) f(n)

that approximates a sum of {\mu(n) f(n)} by a bunch of sparser sums of {\mu(n) f(n)}. Since

\displaystyle  x = \frac{1}{A} \sum_{p \in P} \frac{x}{p},

we see (heuristically, at least) that in order to establish (4), it would suffice to establish the sparser estimates

\displaystyle  \sum_{n \leq x: p|n} \mu(n) f(n) = o(\frac{x}{p})

for all {p \in P} (or at least for “most” {p \in P}).

Now we make the change of variables {n = pm}. As the Möbius function is multiplicative, we usually have {\mu(n) = \mu(p) \mu(m) = - \mu(m)}. (There is an exception when {n} is divisible by {p^2}, but this will be a rare event and we will be able to ignore it.) So it should suffice to show that

\displaystyle  \sum_{m \leq x/p} \mu(m) f(pm) = o( x/p )

for most {p \in P}. However, by the hypothesis (5), the sequences {m \mapsto f(pm)} are asymptotically orthogonal as {p} varies, and this claim will then follow from a Cauchy-Schwarz argument.
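Schematically, ignoring the weights {\frac{1}{A}, \frac{1}{p}} and the constraint {pm \leq x}, the point is that after one application of the Cauchy-Schwarz inequality (in the {m} variable), the expression to be bounded is controlled by

\displaystyle x^{1/2} (\sum_{m \leq x} |\sum_{p \in P} f(pm)|^2)^{1/2} = x^{1/2} (\sum_{p,q \in P} \sum_{m \leq x} f(pm) \overline{f(qm)})^{1/2};

the hypothesis (5) makes the off-diagonal terms {p \neq q} negligible, while the diagonal terms contribute {O( \# P \cdot x )}, giving a final bound of {O( x (\# P)^{1/2} )}, which improves upon the trivial bound of {O( x \# P )} by a factor of {(\# P)^{1/2}}.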


One of my favourite unsolved problems in harmonic analysis is the restriction problem. This problem, first posed explicitly by Elias Stein, can take many equivalent forms, but one of them is this: one starts with a smooth compact hypersurface {S} (possibly with boundary) in {{\bf R}^d}, such as the unit sphere {S = S^2} in {{\bf R}^3}, and equips it with surface measure {d\sigma}. One then takes a bounded measurable function {f \in L^\infty(S,d\sigma)} on this surface, and then computes the (inverse) Fourier transform

\displaystyle  \widehat{fd\sigma}(x) = \int_S e^{2\pi i x \cdot \omega} f(\omega) d\sigma(\omega)

of the measure {fd\sigma}. As {f} is bounded and {d\sigma} is a finite measure, this is a bounded function on {{\bf R}^d}; from the dominated convergence theorem, it is also continuous. The restriction problem asks whether this Fourier transform also decays in space, and specifically whether {\widehat{fd\sigma}} lies in {L^q({\bf R}^d)} for some {q < \infty}. (This is a natural space to control decay because it is translation invariant, which is compatible on the frequency space side with the modulation invariance of {L^\infty(S,d\sigma)}.) By the closed graph theorem, this is the case if and only if there is an estimate of the form

\displaystyle  \| \widehat{f d\sigma} \|_{L^q({\bf R}^d)} \leq C_{q,d,S} \|f\|_{L^\infty(S,d\sigma)} \ \ \ \ \ (1)

for some constant {C_{q,d,S}} that can depend on {q,d,S} but not on {f}. By a limiting argument, to provide such an estimate, it suffices to prove such an estimate under the additional assumption that {f} is smooth.

Strictly speaking, the above problem should be called the extension problem, but it is dual to the original formulation of the restriction problem, which asks to find those exponents {1 \leq q' \leq \infty} for which the Fourier transform of an {L^{q'}({\bf R}^d)} function {g} can be meaningfully restricted to a hypersurface {S}, in the sense that the map {g \mapsto \hat g|_{S}} can be continuously defined from {L^{q'}({\bf R}^d)} to, say, {L^1(S,d\sigma)}. A duality argument shows that the exponents {q'} for which the restriction property holds are the dual exponents to the exponents {q} for which the extension problem holds.

There are several motivations for studying the restriction problem. The problem is connected to the classical question of determining the nature of the convergence of various Fourier summation methods (and specifically, Bochner-Riesz summation); very roughly speaking, if one wishes to perform a partial Fourier transform by restricting the frequencies (possibly using a well-chosen weight) to some region {B} (such as a ball), then one expects this operation to be well behaved if the boundary {\partial B} of this region has good restriction (or extension) properties. More generally, the restriction problem for a surface {S} is connected to the behaviour of Fourier multipliers whose symbols are singular at {S}. The problem is also connected to the analysis of various linear PDE such as the Helmholtz equation, Schrödinger equation, wave equation, and the (linearised) Korteweg-de Vries equation, because solutions to such equations can be expressed via the Fourier transform in the form {\widehat{fd\sigma}} for various surfaces {S} (the sphere, paraboloid, light cone, and cubic for the Helmholtz, Schrödinger, wave, and linearised Korteweg-de Vries equation respectively). A particular family of restriction-type theorems for such surfaces, known as Strichartz estimates, plays a foundational role in the nonlinear perturbations of these linear equations (e.g. the nonlinear Schrödinger equation, the nonlinear wave equation, and the Korteweg-de Vries equation). Last, but not least, there is a fundamental connection between the restriction problem and the Kakeya problem, which roughly speaking concerns how tubes that point in different directions can overlap. Indeed, by superimposing special functions of the type {\widehat{fd\sigma}}, known as wave packets, which are concentrated on tubes in various directions, one can “encode” the Kakeya problem inside the restriction problem; in particular, the conjectured solution to the restriction problem implies the conjectured solution to the Kakeya problem. Finally, the restriction problem serves as a simplified toy model for studying discrete exponential sums whose coefficients do not have a well controlled phase; this perspective was, for instance, used by Ben Green when he established Roth’s theorem in the primes by Fourier-analytic methods, which was in turn one of the main inspirations for our later work establishing arbitrarily long progressions in the primes, although we ended up using ergodic-theoretic arguments instead of Fourier-analytic ones and so did not directly use restriction theory in that paper.

The estimate (1) is trivial for {q=\infty} and becomes harder for smaller {q}. The geometry, and more precisely the curvature, of the surface {S}, plays a key role: if {S} contains a portion which is completely flat, then it is not difficult to concoct an {f} for which {\widehat{f d\sigma}} fails to decay in the normal direction to this flat portion, and so there are no restriction estimates for any finite {q}. Conversely, if {S} is not infinitely flat at any point, then from the method of stationary phase, the Fourier transform {\widehat{d\sigma}} can be shown to decay at a power rate at infinity, and this together with a standard method known as the {TT^*} argument can be used to give non-trivial restriction estimates for finite {q}. However, these arguments fall somewhat short of obtaining the best possible exponents {q}. For instance, in the case of the sphere {S = S^{d-1} \subset {\bf R}^d}, the Fourier transform {\widehat{d\sigma}(x)} is known to decay at the rate {O(|x|^{-(d-1)/2})} and no better as {|x| \rightarrow \infty}, which shows that the condition {q > \frac{2d}{d-1}} is necessary in order for (1) to hold for this surface. The restriction conjecture for {S^{d-1}} asserts that this necessary condition is also sufficient. However, the {TT^*}-based argument gives only the Tomas-Stein theorem, which in this context gives (1) in the weaker range {q \geq \frac{2(d+1)}{d-1}}. (On the other hand, by the nature of the {TT^*} method, the Tomas-Stein theorem does allow the {L^\infty(S,d\sigma)} norm on the right-hand side to be relaxed to {L^2(S,d\sigma)}, at which point the Tomas-Stein exponent {\frac{2(d+1)}{d-1}} becomes best possible. The fact that the Tomas-Stein theorem has an {L^2} norm on the right-hand side is particularly valuable for applications to PDE, leading in particular to the Strichartz estimates mentioned earlier.)
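Regarding the necessity of the condition {q > \frac{2d}{d-1}} mentioned above: roughly speaking, the decay rate {|\widehat{d\sigma}(x)| \sim |x|^{-(d-1)/2}}, which is attained on a significant portion of space, gives

\displaystyle \| \widehat{d\sigma} \|_{L^q({\bf R}^d)}^q \gg \int_{|x| \geq 1} |x|^{-q(d-1)/2}\ dx,

and the right-hand side is finite only when {q \frac{d-1}{2} > d}, which rearranges to {q > \frac{2d}{d-1}}.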

Over the last two decades, there has been a fair amount of work in pushing past the Tomas-Stein barrier. For sake of concreteness let us work just with the restriction problem for the unit sphere {S^2} in {{\bf R}^3}. Here, the restriction conjecture asserts that (1) holds for all {q > 3}, while the Tomas-Stein theorem gives only {q \geq 4}. By combining a multiscale analysis approach with some new progress on the Kakeya conjecture, Bourgain was able to obtain the first improvement on this range, establishing the restriction conjecture for {q > 4 - \frac{2}{15}}. The methods were steadily refined over the years; until recently, the best result (due to myself) was that the conjecture held for all {q > 3 \frac{1}{3}}, which proceeded by analysing a “bilinear {L^2}” variant of the problem studied previously by Bourgain and by Wolff. This is essentially the limit of that method; the relevant bilinear {L^2} estimate fails for {q < 3 + \frac{1}{3}}. (This estimate was recently established at the endpoint {q=3+\frac{1}{3}} by Jungjin Lee (personal communication), though this does not quite improve the range of exponents in (1) due to a logarithmic inefficiency in converting the bilinear estimate to a linear one.)

On the other hand, the full range {q>3} of exponents in (1) was obtained by Bennett, Carbery, and myself (with an alternate proof later given by Guth), but only under the additional assumption of non-coplanar interactions. In three dimensions, this assumption was enforced by replacing (1) with the weaker trilinear (and localised) variant

\displaystyle  \| \widehat{f_1 d\sigma_1} \widehat{f_2 d\sigma_2} \widehat{f_3 d\sigma_3} \|_{L^{q/3}(B(0,R))} \leq C_{q,d,S_1,S_2,S_3,\epsilon} R^\epsilon \|f_1\|_{L^\infty(S_1,d\sigma_1)} \|f_2\|_{L^\infty(S_2,d\sigma_2)} \|f_3\|_{L^\infty(S_3,d\sigma_3)} \ \ \ \ \ (2)

where {\epsilon>0} and {R \geq 1} are arbitrary, {B(0,R)} is the ball of radius {R} in {{\bf R}^3}, and {S_1,S_2,S_3} are compact portions of {S} whose unit normals {n_1(),n_2(),n_3()} are never coplanar, thus there is a uniform lower bound

\displaystyle  |n_1(\omega_1) \wedge n_2(\omega_2) \wedge n_3(\omega_3)| \geq c

for some {c>0} and all {\omega_1 \in S_1, \omega_2 \in S_2, \omega_3 \in S_3}. If it were not for this non-coplanarity restriction, (2) would be equivalent to (1) (by setting {S_1=S_2=S_3} and {f_1=f_2=f_3}, with the converse implication coming from Hölder’s inequality; the {R^\epsilon} loss can be removed by a lemma from a paper of mine). At the time we wrote this paper, we tried fairly hard to remove this non-coplanarity restriction in order to recover progress on the original restriction conjecture, but without much success.

A few weeks ago, though, Bourgain and Guth found a new way to use multiscale analysis to “interpolate” between the result of Bennett, Carbery and myself (which has optimal exponents, but requires non-coplanar interactions) and a more classical square function estimate of Córdoba that handles the coplanar case. A direct application of this interpolation method already ties with the previous best known result in three dimensions (i.e. that (1) holds for {q > 3 \frac{1}{3}}). But it also allows for the insertion of additional input, such as the best Kakeya estimate currently known in three dimensions, due to Wolff. This enlarges the range slightly to {q > 3.3}. The method also can extend to variable-coefficient settings, and in some of these cases (where there is so much “compression” going on that no additional Kakeya estimates are available) the estimates are best possible.

As is often the case in this field, there is a lot of technical book-keeping and juggling of parameters in the formal arguments of Bourgain and Guth, but the main ideas and “numerology” can be expressed fairly readily. (In mathematics, numerology refers to the empirically observed relationships between various key exponents and other numerical parameters; in many cases, one can use shortcuts such as dimensional analysis or informal heuristics to compute these exponents long before the formal argument is completely in place.) Below the fold, I would like to record this numerology for the simplest of the Bourgain-Guth arguments, namely a reproof of (1) for {q > 3 \frac{1}{3}}. This is primarily for my own benefit, but may be of interest to other experts in this particular topic. (See also my 2003 lecture notes on the restriction conjecture.)

In order to focus on the ideas in the paper (rather than on the technical details), I will adopt an informal, heuristic approach, for instance by interpreting the uncertainty principle and the pigeonhole principle rather liberally, and by focusing on main terms in a decomposition and ignoring secondary terms. I will also be somewhat vague with regard to asymptotic notation such as {\ll}. Making the arguments rigorous requires a certain amount of standard but tedious effort (and is one of the main reasons why the Bourgain-Guth paper is as long as it is), which I will not focus on here.


In this final lecture in the Marker lecture series, I discuss the recent work of Bourgain, Gamburd, and Sarnak on how arithmetic combinatorics and expander graphs were used to sieve for almost primes in various algebraic sets.
