You are currently browsing the tag archive for the ‘almost orthogonality’ tag.

One of the basic problems in analytic number theory is to estimate sums of the form

\displaystyle  \sum_{p<x} f(p)

as {x \rightarrow \infty}, where {p} ranges over primes and {f} is some explicit function of interest (e.g. a linear phase function {f(p) = e^{2\pi i \alpha p}} for some real number {\alpha}). This is essentially the same task as obtaining estimates on the sum

\displaystyle  \sum_{n<x} \Lambda(n) f(n)

where {\Lambda} is the von Mangoldt function. If {f} is bounded, {f(n)=O(1)}, then from the prime number theorem one has the trivial bound

\displaystyle  \sum_{n<x} \Lambda(n) f(n) = O(x)

but often (when {f} is somehow “oscillatory” in nature) one is seeking the refinement

\displaystyle  \sum_{n<x} \Lambda(n) f(n) = o(x) \ \ \ \ \ (1)

or equivalently

\displaystyle  \sum_{p<x} f(p) = o(\frac{x}{\log x}). \ \ \ \ \ (2)

Thanks to identities such as

\displaystyle  \Lambda(n) = \sum_{d|n} \mu(d) \log(\frac{n}{d}), \ \ \ \ \ (3)

where {\mu} is the Möbius function, refinements such as (1) are similar in spirit to estimates of the form

\displaystyle  \sum_{n<x} \mu(n) f(n) = o(x). \ \ \ \ \ (4)

Unfortunately, the connection between (1) and (4) is not particularly tight; roughly speaking, one needs to improve the bounds in (4) (and variants thereof) by about two factors of {\log x} before one can use identities such as (3) to recover (1). Still, one generally thinks of (1) and (4) as being “morally” equivalent, even if they are not formally equivalent.

When {f} is oscillating in a sufficiently “irrational” way, then one standard way to proceed is the method of Type I and Type II sums, which uses truncated versions of divisor identities such as (3) to expand out either (1) or (4) into linear (Type I) or bilinear sums (Type II) with which one can exploit the oscillation of {f}. For instance, Vaughan’s identity lets one rewrite the sum in (1) as the sum of the Type I sum

\displaystyle  \sum_{d \leq U} \mu(d) (\sum_{V/d \leq r \leq x/d} (\log r) f(rd)),

the Type I sum

\displaystyle  -\sum_{d \leq UV} a(d) \sum_{V/d \leq r \leq x/d} f(rd),

the Type II sum

\displaystyle  -\sum_{V \leq d \leq x/U} \sum_{U < m \leq x/V} \Lambda(d) b(m) f(dm),

and the error term {\sum_{d \leq V} \Lambda(n) f(n)}, whenever {1 \leq U, V \leq x} are parameters, and {a, b} are the sequences

\displaystyle  a(d) := \sum_{e \leq U, f \leq V: ef = d} \Lambda(d) \mu(e)


\displaystyle  b(m) := \sum_{d|m: d \leq U} \mu(d).

Similarly one can express (4) as the Type I sum

\displaystyle  -\sum_{d \leq UV} c(d) \sum_{UV/d \leq r \leq x/d} f(rd),

the Type II sum

\displaystyle  - \sum_{V < d \leq x/U} \sum_{U < m \leq x/d} \mu(m) b(d) f(dm)

and the error term {\sum_{d \leq UV} \mu(n) f(N)}, whenever {1 \leq U,V \leq x} with {UV \leq x}, and {c} is the sequence

\displaystyle  c(d) := \sum_{e \leq U, f \leq V: ef = d} \mu(d) \mu(e).

After eliminating troublesome sequences such as {a(), b(), c()} via Cauchy-Schwarz or the triangle inequality, one is then faced with the task of estimating Type I sums such as

\displaystyle  \sum_{r \leq y} f(rd)

or Type II sums such as

\displaystyle  \sum_{r \leq y} f(rd) \overline{f(rd')}

for various {y, d, d' \geq 1}. Here, the trivial bound is {O(y)}, but due to a number of logarithmic inefficiencies in the above method, one has to obtain bounds that are more like {O( \frac{y}{\log^C y})} for some constant {C} (e.g. {C=5}) in order to end up with an asymptotic such as (1) or (4).

However, in a recent paper of Bourgain, Sarnak, and Ziegler, it was observed that as long as one is only seeking the Mobius orthogonality (4) rather than the von Mangoldt orthogonality (1), one can avoid losing any logarithmic factors, and rely purely on qualitative equidistribution properties of {f}. A special case of their orthogonality criterion (which actually dates back to an earlier paper of Katai, as was pointed out to me by Nikos Frantzikinakis) is as follows:

Proposition 1 (Orthogonality criterion) Let {f: {\bf N} \rightarrow {\bf C}} be a bounded function such that

\displaystyle  \sum_{n \leq x} f(pn) \overline{f(qn)} = o(x) \ \ \ \ \ (5)

for any distinct primes {p, q} (where the decay rate of the error term {o(x)} may depend on {p} and {q}). Then

\displaystyle  \sum_{n \leq x} \mu(n) f(n) =o(x). \ \ \ \ \ (6)

Actually, the Bourgain-Sarnak-Ziegler paper establishes a more quantitative version of this proposition, in which {\mu} can be replaced by an arbitrary bounded multiplicative function, but we will content ourselves with the above weaker special case. (See also these notes of Harper, which uses the Katai argument to give a slightly weaker quantitative bound in the same spirit.) This criterion can be viewed as a multiplicative variant of the classical van der Corput lemma, which in our notation asserts that {\sum_{n \leq x} f(n) = o(x)} if one has {\sum_{n \leq x} f(n+h) \overline{f(n)} = o(x)} for each fixed non-zero {h}.

As a sample application, Proposition 1 easily gives a proof of the asymptotic

\displaystyle  \sum_{n \leq x} \mu(n) e^{2\pi i \alpha n} = o(x)

for any irrational {\alpha}. (For rational {\alpha}, this is a little trickier, as it is basically equivalent to the prime number theorem in arithmetic progressions.) The paper of Bourgain, Sarnak, and Ziegler also apply this criterion to nilsequences (obtaining a quick proof of a qualitative version of a result of Ben Green and myself, see these notes of Ziegler for details) and to horocycle flows (for which no Möbius orthogonality result was previously known).

Informally, the connection between (5) and (6) comes from the multiplicative nature of the Möbius function. If (6) failed, then {\mu(n)} exhibits strong correlation with {f(n)}; by change of variables, we then expect {\mu(pn)} to correlate with {f(pn)} and {\mu(pm)} to correlate with {f(qn)}, for “typical” {p,q} at least. On the other hand, since {\mu} is multiplicative, {\mu(pn)} exhibits strong correlation with {\mu(qn)}. Putting all this together (and pretending correlation is transitive), this would give the claim (in the contrapositive). Of course, correlation is not quite transitive, but it turns out that one can use the Cauchy-Schwarz inequality as a substitute for transitivity of correlation in this case.

I will give a proof of Proposition 1 below the fold (which is not quite based on the argument in the above mentioned paper, but on a variant of that argument communicated to me by Tamar Ziegler, and also independently discovered by Adam Harper). The main idea is to exploit the following observation: if {P} is a “large” but finite set of primes (in the sense that the sum {A := \sum_{p \in P} \frac{1}{p}} is large), then for a typical large number {n} (much larger than the elements of {P}), the number of primes in {P} that divide {n} is pretty close to {A = \sum_{p \in P} \frac{1}{p}}:

\displaystyle  \sum_{p \in P: p|n} 1 \approx A. \ \ \ \ \ (7)

A more precise formalisation of this heuristic is provided by the Turan-Kubilius inequality, which is proven by a simple application of the second moment method.

In particular, one can sum (7) against {\mu(n) f(n)} and obtain an approximation

\displaystyle  \sum_{n \leq x} \mu(n) f(n) \approx \frac{1}{A} \sum_{p \in P} \sum_{n \leq x: p|n} \mu(n) f(n)

that approximates a sum of {\mu(n) f(n)} by a bunch of sparser sums of {\mu(n) f(n)}. Since

\displaystyle  x = \frac{1}{A} \sum_{p \in P} \frac{x}{p},

we see (heuristically, at least) that in order to establish (4), it would suffice to establish the sparser estimates

\displaystyle  \sum_{n \leq x: p|n} \mu(n) f(n) = o(\frac{x}{p})

for all {p \in P} (or at least for “most” {p \in P}).

Now we make the change of variables {n = pm}. As the Möbius function is multiplicative, we usually have {\mu(n) = \mu(p) \mu(m) = - \mu(m)}. (There is an exception when {n} is divisible by {p^2}, but this will be a rare event and we will be able to ignore it.) So it should suffice to show that

\displaystyle  \sum_{m \leq x/p} \mu(m) f(pm) = o( x/p )

for most {p \in P}. However, by the hypothesis (5), the sequences {m \mapsto f(pm)} are asymptotically orthogonal as {p} varies, and this claim will then follow from a Cauchy-Schwarz argument.

Read the rest of this entry »

A basic problem in harmonic analysis (as well as in linear algebra, random matrix theory, and high-dimensional geometry) is to estimate the operator norm {\|T\|_{op}} of a linear map {T: H \rightarrow H'} between two Hilbert spaces, which we will take to be complex for sake of discussion. Even the finite-dimensional case {T: {\bf C}^m \rightarrow {\bf C}^n} is of interest, as this operator norm is the same as the largest singular value {\sigma_1(A)} of the {n \times m} matrix {A} associated to {T}.

In general, this operator norm is hard to compute precisely, except in special cases. One such special case is that of a diagonal operator, such as that associated to an {n \times n} diagonal matrix {D = \hbox{diag}(\lambda_1,\ldots,\lambda_n)}. In this case, the operator norm is simply the supremum norm of the diagonal coefficients:

\displaystyle  \|D\|_{op} = \sup_{1 \leq i \leq n} |\lambda_i|. \ \ \ \ \ (1)

A variant of (1) is Schur’s test, which for simplicity we will phrase in the setting of finite-dimensional operators {T: {\bf C}^m \rightarrow {\bf C}^n} given by a matrix {A = (a_{ij})_{1 \leq i \leq n; 1 \leq j \leq m}} via the usual formula

\displaystyle  T (x_j)_{j=1}^m := ( \sum_{j=1}^m a_{ij} x_j )_{i=1}^n.

A simple version of this test is as follows: if all the absolute row sums and columns sums of {A} are bounded by some constant {M}, thus

\displaystyle  \sum_{j=1}^m |a_{ij}| \leq M \ \ \ \ \ (2)

for all {1 \leq i \leq n} and

\displaystyle  \sum_{i=1}^n |a_{ij}| \leq M \ \ \ \ \ (3)

for all {1 \leq j \leq m}, then

\displaystyle  \|T\|_{op} = \|A\|_{op} \leq M \ \ \ \ \ (4)

(note that this generalises (the upper bound in) (1).) Indeed, to see (4), it suffices by duality and homogeneity to show that

\displaystyle  |\sum_{i=1}^n (\sum_{j=1}^m a_{ij} x_j) y_i| \leq M

whenever {(x_j)_{j=1}^m} and {(y_i)_{i=1}^n} are sequences with {\sum_{j=1}^m |x_j|^2 = \sum_{i=1}^n |y_i|^2 = 1}; but this easily follows from the arithmetic mean-geometric mean inequality

\displaystyle  |a_{ij} x_j) y_i| \leq \frac{1}{2} |a_{ij}| |x_i|^2 + \frac{1}{2} |a_{ij}| |y_j|^2

and (2), (3).

Schur’s test (4) (and its many generalisations to weighted situations, or to Lebesgue or Lorentz spaces) is particularly useful for controlling operators in which the role of oscillation (as reflected in the phase of the coefficients {a_{ij}}, as opposed to just their magnitudes {|a_{ij}|}) is not decisive. However, it is of limited use in situations that involve a lot of cancellation. For this, a different test, known as the Cotlar-Stein lemma, is much more flexible and powerful. It can be viewed in a sense as a non-commutative variant of Schur’s test (4) (or of (1)), in which the scalar coefficients {\lambda_i} or {a_{ij}} are replaced by operators instead.

To illustrate the basic flavour of the result, let us return to the bound (1), and now consider instead a block-diagonal matrix

\displaystyle  A = \begin{pmatrix} \Lambda_1 & 0 & \ldots & 0 \\ 0 & \Lambda_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \Lambda_n \end{pmatrix} \ \ \ \ \ (5)

where each {\Lambda_i} is now a {m_i \times m_i} matrix, and so {A} is an {m \times m} matrix with {m := m_1 + \ldots +m_n}. Then we have

\displaystyle  \|A\|_{op} = \sup_{1 \leq i \leq n} \|\Lambda_i\|_{op}. \ \ \ \ \ (6)

Indeed, the lower bound is trivial (as can be seen by testing {A} on vectors which are supported on the {i^{th}} block of coordinates), while to establish the upper bound, one can make use of the orthogonal decomposition

\displaystyle  {\bf C}^m \equiv \bigoplus_{i=1}^m {\bf C}^{m_i} \ \ \ \ \ (7)

to decompose an arbitrary vector {x \in {\bf C}^m} as

\displaystyle  x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}

with {x_i \in {\bf C}^{m_i}}, in which case we have

\displaystyle  Ax = \begin{pmatrix} \Lambda_1 x_1 \\ \Lambda_2 x_2 \\ \vdots \\ \Lambda_n x_n \end{pmatrix}

and the upper bound in (6) then follows from a simple computation.

The operator {T} associated to the matrix {A} in (5) can be viewed as a sum {T = \sum_{i=1}^n T_i}, where each {T_i} corresponds to the {\Lambda_i} block of {A}, in which case (6) can also be written as

\displaystyle  \|T\|_{op} = \sup_{1 \leq i \leq n} \|T_i\|_{op}. \ \ \ \ \ (8)

When {n} is large, this is a significant improvement over the triangle inequality, which merely gives

\displaystyle  \|T\|_{op} \leq \sum_{1 \leq i \leq n} \|T_i\|_{op}.

The reason for this gain can ultimately be traced back to the “orthogonality” of the {T_i}; that they “occupy different columns” and “different rows” of the range and domain of {T}. This is obvious when viewed in the matrix formalism, but can also be described in the more abstract Hilbert space operator formalism via the identities

\displaystyle  T_i^* T_j = 0 \ \ \ \ \ (9)


\displaystyle  T_i T^* j = 0 \ \ \ \ \ (10)

whenever {i \neq j}. (The first identity asserts that the ranges of the {T_i} are orthogonal to each other, and the second asserts that the coranges of the {T_i} (the ranges of the adjoints {T_i^*}) are orthogonal to each other.) By replacing (7) with a more abstract orthogonal decomposition into these ranges and coranges, one can in fact deduce (8) directly from (9) and (10).

The Cotlar-Stein lemma is an extension of this observation to the case where the {T_i} are merely almost orthogonal rather than orthogonal, in a manner somewhat analogous to how Schur’s test (partially) extends (1) to the non-diagonal case. Specifically, we have

Lemma 1 (Cotlar-Stein lemma) Let {T_1,\ldots,T_n: H \rightarrow H'} be a finite sequence of bounded linear operators from one Hilbert space {H} to another {H'}, obeying the bounds

\displaystyle  \sum_{j=1}^n \| T_i T_j^* \|_{op}^{1/2} \leq M \ \ \ \ \ (11)


\displaystyle  \sum_{j=1}^n \| T_i^* T_j \|_{op}^{1/2} \leq M \ \ \ \ \ (12)

for all {i=1,\ldots,n} and some {M > 0} (compare with (2), (3)). Then one has

\displaystyle  \| \sum_{i=1}^n T_i \|_{op} \leq M. \ \ \ \ \ (13)

Note from the basic {TT^*} identity

\displaystyle  \|T\|_{op} = \|TT^* \|_{op}^{1/2} = \|T^* T\|_{op}^{1/2} \ \ \ \ \ (14)

that the hypothesis (11) (or (12)) already gives the bound

\displaystyle  \|T_i\|_{op} \leq M \ \ \ \ \ (15)

on each component {T_i} of {T}, which by the triangle inequality gives the inferior bound

\displaystyle  \| \sum_{i=1}^n T_i \|_{op} \leq nM;

the point of the Cotlar-Stein lemma is that the dependence on {n} in this bound is eliminated in (13), which in particular makes the bound suitable for extension to the limit {n \rightarrow \infty} (see Remark 1 below).

The Cotlar-Stein lemma was first established by Cotlar in the special case of commuting self-adjoint operators, and then independently by Cotlar and Stein in full generality, with the proof appearing in a subsequent paper of Knapp and Stein.

The Cotlar-Stein lemma is often useful in controlling operators such as singular integral operators or pseudo-differential operators {T} which “do not mix scales together too much”, in that operators {T} map functions “that oscillate at a given scale {2^{-i}}” to functions that still mostly oscillate at the same scale {2^{-i}}. In that case, one can often split {T} into components {T_i} which essentically capture the scale {2^{-i}} behaviour, and understanding {L^2} boundedness properties of {T} then reduces to establishing the boundedness of the simpler operators {T_i} (and of establishing a sufficient decay in products such as {T_i^* T_j} or {T_i T_j^*} when {i} and {j} are separated from each other). In some cases, one can use Fourier-analytic tools such as Littlewood-Paley projections to generate the {T_i}, but the true power of the Cotlar-Stein lemma comes from situations in which the Fourier transform is not suitable, such as when one has a complicated domain (e.g. a manifold or a non-abelian Lie group), or very rough coefficients (which would then have badly behaved Fourier behaviour). One can then select the decomposition {T = \sum_i T_i} in a fashion that is tailored to the particular operator {T}, and is not necessarily dictated by Fourier-analytic considerations.

Once one is in the almost orthogonal setting, as opposed to the genuinely orthogonal setting, the previous arguments based on orthogonal projection seem to fail completely. Instead, the proof of the Cotlar-Stein lemma proceeds via an elegant application of the tensor power trick (or perhaps more accurately, the power method), in which the operator norm of {T} is understood through the operator norm of a large power of {T} (or more precisely, of its self-adjoint square {TT^*} or {T^* T}). Indeed, from an iteration of (14) we see that for any natural number {N}, one has

\displaystyle  \|T\|_{op}^{2N} = \| (TT^*)^N \|_{op}. \ \ \ \ \ (16)

To estimate the right-hand side, we expand out the right-hand side and apply the triangle inequality to bound it by

\displaystyle  \sum_{i_1,j_1,\ldots,i_N,j_N \in \{1,\ldots,n\}} \| T_{i_1} T_{j_1}^* T_{i_2} T_{j_2}^* \ldots T_{i_N} T_{j_N}^* \|_{op}. \ \ \ \ \ (17)

Recall that when we applied the triangle inequality directly to {T}, we lost a factor of {n} in the final estimate; it will turn out that we will lose a similar factor here, but this factor will eventually be attenuated into nothingness by the tensor power trick.

To bound (17), we use the basic inequality {\|ST\|_{op} \leq \|S\|_{op} \|T\|_{op}} in two different ways. If we group the product {T_{i_1} T_{j_1}^* T_{i_2} T_{j_2}^* \ldots T_{i_N} T_{j_N}^*} in pairs, we can bound the summand of (17) by

\displaystyle  \| T_{i_1} T_{j_1}^* \|_{op} \ldots \| T_{i_N} T_{j_N}^* \|_{op}.

On the other hand, we can group the product by pairs in another way, to obtain the bound of

\displaystyle  \| T_{i_1} \|_{op} \| T_{j_1}^* T_{i_2} \|_{op} \ldots \| T_{j_{N-1}}^* T_{i_N}\|_{op} \| T_{j_N}^* \|_{op}.

We bound {\| T_{i_1} \|_{op}} and {\| T_{j_N}^* \|_{op}} crudely by {M} using (15). Taking the geometric mean of the above bounds, we can thus bound (17) by

\displaystyle  M \sum_{i_1,j_1,\ldots,i_N,j_N \in \{1,\ldots,n\}} \| T_{i_1} T_{j_1}^* \|_{op}^{1/2} \| T_{j_1}^* T_{i_2} \|_{op}^{1/2} \ldots \| T_{j_{N-1}}^* T_{i_N}\|_{op}^{1/2} \| T_{i_N} T_{j_N}^* \|_{op}^{1/2}.

If we then sum this series first in {j_N}, then in {i_N}, then moving back all the way to {i_1}, using (11) and (12) alternately, we obtain a final bound of

\displaystyle  n M^{2N}

for (16). Taking {N^{th}} roots, we obtain

\displaystyle  \|T\|_{op} \leq n^{1/2N} M.

Sending {N \rightarrow \infty}, we obtain the claim.

Remark 1 As observed in a number of places (see e.g. page 318 of Stein’s book, or this paper of Comech, the Cotlar-Stein lemma can be extended to infinite sums {\sum_{i=1}^\infty T_i} (with the obvious changes to the hypotheses (11), (12)). Indeed, one can show that for any {f \in H}, the sum {\sum_{i=1}^\infty T_i f} is unconditionally convergent in {H'} (and furthermore has bounded {2}-variation), and the resulting operator {\sum_{i=1}^\infty T_i} is a bounded linear operator with an operator norm bound on {M}.

Remark 2 If we specialise to the case where all the {T_i} are equal, we see that the bound in the Cotlar-Stein lemma is sharp, at least in this case. Thus we see how the tensor power trick can convert an inefficient argument, such as that obtained using the triangle inequality or crude bounds such as (15), into an efficient one.

Remark 3 One can prove Schur’s test by a similar method. Indeed, starting from the inequality

\displaystyle  \|A\|_{op}^{2N} \leq \hbox{tr}( (AA^*)^N )

(which follows easily from the singular value decomposition), we can bound {\|A\|_{op}^{2N}} by

\displaystyle  \sum_{i_1,\ldots,j_N \in \{1,\ldots,n\}} a_{i_1,j_1} \overline{a_{j_1,i_2}} \ldots a_{i_N,j_N} \overline{a_{j_N,i_1}}.

Estimating the other two terms in the summand by {M}, and then repeatedly summing the indices one at a time as before, we obtain

\displaystyle  \|A\|_{op}^{2N} \leq n M^{2N}

and the claim follows from the tensor power trick as before. On the other hand, in the converse direction, I do not know of any way to prove the Cotlar-Stein lemma that does not basically go through the tensor power argument.

The first Distinguished Lecture Series at UCLA for this academic year is given by Elias Stein (who, incidentally, was my graduate student advisor), who is lecturing on “Singular Integrals and Several Complex Variables: Some New Perspectives“.  The first lecture was a historical (and non-technical) survey of modern harmonic analysis (which, amazingly, was compressed into half an hour), followed by an introduction as to how this theory is currently in the process of being adapted to handle the basic analytical issues in several complex variables, a topic which in many ways is still only now being developed.  The second and third lectures will focus on these issues in greater depth.

As usual, any errors here are due to my transcription and interpretation of the lecture.

[Update, Oct 27: The slides from the talk are now available here.]

Read the rest of this entry »

As many readers may already know, my good friend and fellow mathematical blogger Tim Gowers, having wrapped up work on the Princeton Companion to Mathematics (which I believe is now in press), has begun another mathematical initiative, namely a “Tricks Wiki” to act as a repository for mathematical tricks and techniques.    Tim has already started the ball rolling with several seed articles on his own blog, and asked me to also contribute some articles.  (As I understand it, these articles will be migrated to the Wiki in a few months, once it is fully set up, and then they will evolve with edits and contributions by anyone who wishes to pitch in, in the spirit of Wikipedia; in particular, articles are not intended to be permanently authored or signed by any single contributor.)

So today I’d like to start by extracting some material from an old post of mine on “Amplification, arbitrage, and the tensor power trick” (as well as from some of the comments), and converting it to the Tricks Wiki format, while also taking the opportunity to add a few more examples.

Title: The tensor power trick

Quick description: If one wants to prove an inequality X \leq Y for some non-negative quantities X, Y, but can only see how to prove a quasi-inequality X \leq CY that loses a multiplicative constant C, then try to replace all objects involved in the problem by “tensor powers” of themselves and apply the quasi-inequality to those powers.  If all goes well, one can show that X^M \leq C Y^M for all M \geq 1, with a constant C which is independent of M, which implies that X \leq Y as desired by taking M^{th} roots and then taking limits as M \to \infty.

Read the rest of this entry »


RSS Google+ feed

  • An error has occurred; the feed is probably down. Try again later.

Get every new post delivered to your Inbox.

Join 3,314 other followers