You are currently browsing the tag archive for the ‘Peter Sarnak’ tag.

One of the basic general problems in analytic number theory is to understand as much as possible the fluctuations of the Möbius function {\mu(n)}, defined as {(-1)^k} when {n} is the product of {k} distinct primes, and zero otherwise. For instance, as {\mu} takes values in {\{-1,0,1\}}, we have the trivial bound

\displaystyle  |\sum_{n \leq x} \mu(n)| \leq x

and the seemingly slight improvement

\displaystyle  \sum_{n \leq x} \mu(n) = o(x) \ \ \ \ \ (1)

is already equivalent to the prime number theorem, as observed by Landau (see e.g. this previous blog post for a proof), while the much stronger (and still open) improvement

\displaystyle  \sum_{n \leq x} \mu(n) = O(x^{1/2+o(1)})

is equivalent to the notorious Riemann hypothesis.

There is a general Möbius pseudorandomness heuristic that suggests that the sign pattern {\mu} behaves so randomly (or pseudorandomly) that one should expect a substantial amount of cancellation in sums that involve the sign fluctuation of the Möbius function in a nontrivial fashion, with the amount of cancellation present comparable to the amount that an analogous random sum would provide; cf. the probabilistic heuristic discussed in this recent blog post. There are a number of ways to make this heuristic precise. As already mentioned, the Riemann hypothesis can be considered one such manifestation of the heuristic. Another manifestation is the following old conjecture of Chowla:

Conjecture 1 (Chowla’s conjecture) For any fixed integer {m} and exponents {a_1,a_2,\ldots,a_m \geq 0}, with at least one of the {a_i} odd (so as not to completely destroy the sign cancellation), we have

\displaystyle  \sum_{n \leq x} \mu(n+1)^{a_1} \ldots \mu(n+m)^{a_m} = o_{x \rightarrow \infty;m}(x).

Note that as {\mu^a = \mu^{a+2}} for any {a \geq 1}, we can reduce to the case when the {a_i} take values in {0,1,2} here. When only one of the {a_i} are odd, this is essentially the prime number theorem in arithmetic progressions (after some elementary sieving), but with two or more of the {a_i} are odd, the problem becomes completely open. For instance, the estimate

\displaystyle  \sum_{n \leq x} \mu(n) \mu(n+2) = o(x)

is morally very close to the conjectured asymptotic

\displaystyle  \sum_{n \leq x} \Lambda(n) \Lambda(n+2) = 2\Pi_2 x + o(x)

for the von Mangoldt function {\Lambda}, where {\Pi_2 := \prod_{p > 2} (1 - \frac{1}{(p-1)^2}) = 0.66016\ldots} is the twin prime constant; this asymptotic in turn implies the twin prime conjecture. (To formally deduce estimates for von Mangoldt from estimates for Möbius, though, typically requires some better control on the error terms than {o()}, in particular gains of some power of {\log x} are usually needed. See this previous blog post for more discussion.)

Remark 1 The Chowla conjecture resembles an assertion that, for {n} chosen randomly and uniformly from {1} to {x}, the random variables {\mu(n+1),\ldots,\mu(n+k)} become asymptotically independent of each other (in the probabilistic sense) as {x \rightarrow \infty}. However, this is not quite accurate, because some moments (namely those with all exponents {a_i} even) have the “wrong” asymptotic value, leading to some unwanted correlation between the two variables. For instance, the events {\mu(n)=0} and {\mu(n+4)=0} have a strong correlation with each other, basically because they are both strongly correlated with the event of {n} being divisible by {4}. A more accurate interpretation of the Chowla conjecture is that the random variables {\mu(n+1),\ldots,\mu(n+k)} are asymptotically conditionally independent of each other, after conditioning on the zero pattern {\mu(n+1)^2,\ldots,\mu(n+k)^2}; thus, it is the sign of the Möbius function that fluctuates like random noise, rather than the zero pattern. (The situation is a bit cleaner if one works instead with the Liouville function {\lambda} instead of the Möbius function {\mu}, as this function never vanishes, but we will stick to the traditional Möbius function formalism here.)

A more recent formulation of the Möbius randomness heuristic is the following conjecture of Sarnak. Given a bounded sequence {f: {\bf N} \rightarrow {\bf C}}, define the topological entropy of the sequence to be the least exponent {\sigma} with the property that for any fixed {\epsilon > 0}, and for {m} going to infinity the set {\{ (f(n+1),\ldots,f(n+m)): n \in {\bf N} \} \subset {\bf C}^m} of {f} can be covered by {O( \exp( \sigma m + o(m) ) )} balls of radius {\epsilon}. (If {f} arises from a minimal topological dynamical system {(X,T)} by {f(n) := f(T^n x)}, the above notion is equivalent to the usual notion of the topological entropy of a dynamical system.) For instance, if the sequence is a bit sequence (i.e. it takes values in {\{0,1\}}), then there are only {\exp(\sigma m + o(m))} {m}-bit patterns that can appear as blocks of {m} consecutive bits in this sequence. As a special case, a Turing machine with bounded memory that had access to a random number generator at the rate of one random bit produced every {T} units of time, but otherwise evolved deterministically, would have an output sequence that had a topological entropy of at most {\frac{1}{T} \log 2}. A bounded sequence is said to be deterministic if its topological entropy is zero. A typical example is a polynomial sequence such as {f(n) := e^{2\pi i \alpha n^2}} for some fixed {\sigma}; the {m}-blocks of such polynomials sequence have covering numbers that only grow polynomially in {m}, rather than exponentially, thus yielding the zero entropy. Unipotent flows, such as the horocycle flow on a compact hyperbolic surface, are another good source of deterministic sequences.

Conjecture 2 (Sarnak’s conjecture) Let {f: {\bf N} \rightarrow {\bf C}} be a deterministic bounded sequence. Then

\displaystyle  \sum_{n \leq x} \mu(n) f(n) = o_{x \rightarrow \infty;f}(x).

This conjecture in general is still quite far from being solved. However, special cases are known:

  • For constant sequences, this is essentially the prime number theorem (1).
  • For periodic sequences, this is essentially the prime number theorem in arithmetic progressions.
  • For quasiperiodic sequences such as {f(n) = F(\alpha n \hbox{ mod } 1)} for some continuous {F}, this follows from the work of Davenport.
  • For nilsequences, this is a result of Ben Green and myself.
  • For horocycle flows, this is a result of Bourgain, Sarnak, and Ziegler.
  • For the Thue-Morse sequence, this is a result of Dartyge-Tenenbaum (with a stronger error term obtained by Maduit-Rivat). A subsequent result of Bourgain handles all bounded rank one sequences (though the Thue-Morse sequence is actually of rank two), and a related result of Green establishes asymptotic orthogonality of the Möbius function to bounded depth circuits, although such functions are not necessarily deterministic in nature.
  • For the Rudin-Shapiro sequence, I sketched out an argument at this MathOverflow post.
  • The Möbius function is known to itself be non-deterministic, because its square {\mu^2(n)} (i.e. the indicator of the square-free functions) is known to be non-deterministic (indeed, its topological entropy is {\frac{6}{\pi^2}\log 2}). (The corresponding question for the Liouville function {\lambda(n)}, however, remains open, as the square {\lambda^2(n)=1} has zero entropy.)
  • In the converse direction, it is easy to construct sequences of arbitrarily small positive entropy that correlate with the Möbius function (a rather silly example is {\mu(n) 1_{k|n}} for some fixed large (squarefree) {k}, which has topological entropy at most {\log 2/k} but clearly correlates with {\mu}).

See this survey of Sarnak for further discussion of this and related topics.

In this post I wanted to give a very nice argument of Sarnak that links the above two conjectures:

Proposition 3 The Chowla conjecture implies the Sarnak conjecture.

The argument does not use any number-theoretic properties of the Möbius function; one could replace {\mu} in both conjectures by any other function from the natural numbers to {\{-1,0,+1\}} and obtain the same implication. The argument consists of the following ingredients:

  1. To show that {\sum_{n<x} \mu(n) f(n) = o(x)}, it suffices to show that the expectation of the random variable {\frac{1}{m} (\mu(n+1)f(n+1)+\ldots+\mu(n+m)f(n+m))}, where {n} is drawn uniformly at random from {1} to {x}, can be made arbitrary small by making {m} large (and {n} even larger).
  2. By the union bound and the zero topological entropy of {f}, it suffices to show that for any bounded deterministic coefficients {c_1,\ldots,c_m}, the random variable {\frac{1}{m}(c_1 \mu(n+1) + \ldots + c_m \mu(n+m))} concentrates with exponentially high probability.
  3. Finally, this exponentially high concentration can be achieved by the moment method, using a slight variant of the moment method proof of the large deviation estimates such as the Chernoff inequality or Hoeffding inequality (as discussed in this blog post).

As is often the case, though, while the “top-down” order of steps presented above is perhaps the clearest way to think conceptually about the argument, in order to present the argument formally it is more convenient to present the arguments in the reverse (or “bottom-up”) order. This is the approach taken below the fold.

Read the rest of this entry »

One of the basic problems in analytic number theory is to estimate sums of the form

\displaystyle  \sum_{p<x} f(p)

as {x \rightarrow \infty}, where {p} ranges over primes and {f} is some explicit function of interest (e.g. a linear phase function {f(p) = e^{2\pi i \alpha p}} for some real number {\alpha}). This is essentially the same task as obtaining estimates on the sum

\displaystyle  \sum_{n<x} \Lambda(n) f(n)

where {\Lambda} is the von Mangoldt function. If {f} is bounded, {f(n)=O(1)}, then from the prime number theorem one has the trivial bound

\displaystyle  \sum_{n<x} \Lambda(n) f(n) = O(x)

but often (when {f} is somehow “oscillatory” in nature) one is seeking the refinement

\displaystyle  \sum_{n<x} \Lambda(n) f(n) = o(x) \ \ \ \ \ (1)

or equivalently

\displaystyle  \sum_{p<x} f(p) = o(\frac{x}{\log x}). \ \ \ \ \ (2)

Thanks to identities such as

\displaystyle  \Lambda(n) = \sum_{d|n} \mu(d) \log(\frac{n}{d}), \ \ \ \ \ (3)

where {\mu} is the Möbius function, refinements such as (1) are similar in spirit to estimates of the form

\displaystyle  \sum_{n<x} \mu(n) f(n) = o(x). \ \ \ \ \ (4)

Unfortunately, the connection between (1) and (4) is not particularly tight; roughly speaking, one needs to improve the bounds in (4) (and variants thereof) by about two factors of {\log x} before one can use identities such as (3) to recover (1). Still, one generally thinks of (1) and (4) as being “morally” equivalent, even if they are not formally equivalent.

When {f} is oscillating in a sufficiently “irrational” way, then one standard way to proceed is the method of Type I and Type II sums, which uses truncated versions of divisor identities such as (3) to expand out either (1) or (4) into linear (Type I) or bilinear sums (Type II) with which one can exploit the oscillation of {f}. For instance, Vaughan’s identity lets one rewrite the sum in (1) as the sum of the Type I sum

\displaystyle  \sum_{d \leq U} \mu(d) (\sum_{V/d \leq r \leq x/d} (\log r) f(rd)),

the Type I sum

\displaystyle  -\sum_{d \leq UV} a(d) \sum_{V/d \leq r \leq x/d} f(rd),

the Type II sum

\displaystyle  -\sum_{V \leq d \leq x/U} \sum_{U < m \leq x/V} \Lambda(d) b(m) f(dm),

and the error term {\sum_{d \leq V} \Lambda(n) f(n)}, whenever {1 \leq U, V \leq x} are parameters, and {a, b} are the sequences

\displaystyle  a(d) := \sum_{e \leq U, f \leq V: ef = d} \Lambda(d) \mu(e)

and

\displaystyle  b(m) := \sum_{d|m: d \leq U} \mu(d).

Similarly one can express (4) as the Type I sum

\displaystyle  -\sum_{d \leq UV} c(d) \sum_{UV/d \leq r \leq x/d} f(rd),

the Type II sum

\displaystyle  - \sum_{V < d \leq x/U} \sum_{U < m \leq x/d} \mu(m) b(d) f(dm)

and the error term {\sum_{d \leq UV} \mu(n) f(N)}, whenever {1 \leq U,V \leq x} with {UV \leq x}, and {c} is the sequence

\displaystyle  c(d) := \sum_{e \leq U, f \leq V: ef = d} \mu(d) \mu(e).

After eliminating troublesome sequences such as {a(), b(), c()} via Cauchy-Schwarz or the triangle inequality, one is then faced with the task of estimating Type I sums such as

\displaystyle  \sum_{r \leq y} f(rd)

or Type II sums such as

\displaystyle  \sum_{r \leq y} f(rd) \overline{f(rd')}

for various {y, d, d' \geq 1}. Here, the trivial bound is {O(y)}, but due to a number of logarithmic inefficiencies in the above method, one has to obtain bounds that are more like {O( \frac{y}{\log^C y})} for some constant {C} (e.g. {C=5}) in order to end up with an asymptotic such as (1) or (4).

However, in a recent paper of Bourgain, Sarnak, and Ziegler, it was observed that as long as one is only seeking the Mobius orthogonality (4) rather than the von Mangoldt orthogonality (1), one can avoid losing any logarithmic factors, and rely purely on qualitative equidistribution properties of {f}. A special case of their orthogonality criterion (which actually dates back to an earlier paper of Katai, as was pointed out to me by Nikos Frantzikinakis) is as follows:

Proposition 1 (Orthogonality criterion) Let {f: {\bf N} \rightarrow {\bf C}} be a bounded function such that

\displaystyle  \sum_{n \leq x} f(pn) \overline{f(qn)} = o(x) \ \ \ \ \ (5)

for any distinct primes {p, q} (where the decay rate of the error term {o(x)} may depend on {p} and {q}). Then

\displaystyle  \sum_{n \leq x} \mu(n) f(n) =o(x). \ \ \ \ \ (6)

Actually, the Bourgain-Sarnak-Ziegler paper establishes a more quantitative version of this proposition, in which {\mu} can be replaced by an arbitrary bounded multiplicative function, but we will content ourselves with the above weaker special case. (See also these notes of Harper, which uses the Katai argument to give a slightly weaker quantitative bound in the same spirit.) This criterion can be viewed as a multiplicative variant of the classical van der Corput lemma, which in our notation asserts that {\sum_{n \leq x} f(n) = o(x)} if one has {\sum_{n \leq x} f(n+h) \overline{f(n)} = o(x)} for each fixed non-zero {h}.

As a sample application, Proposition 1 easily gives a proof of the asymptotic

\displaystyle  \sum_{n \leq x} \mu(n) e^{2\pi i \alpha n} = o(x)

for any irrational {\alpha}. (For rational {\alpha}, this is a little trickier, as it is basically equivalent to the prime number theorem in arithmetic progressions.) The paper of Bourgain, Sarnak, and Ziegler also apply this criterion to nilsequences (obtaining a quick proof of a qualitative version of a result of Ben Green and myself, see these notes of Ziegler for details) and to horocycle flows (for which no Möbius orthogonality result was previously known).

Informally, the connection between (5) and (6) comes from the multiplicative nature of the Möbius function. If (6) failed, then {\mu(n)} exhibits strong correlation with {f(n)}; by change of variables, we then expect {\mu(pn)} to correlate with {f(pn)} and {\mu(pm)} to correlate with {f(qn)}, for “typical” {p,q} at least. On the other hand, since {\mu} is multiplicative, {\mu(pn)} exhibits strong correlation with {\mu(qn)}. Putting all this together (and pretending correlation is transitive), this would give the claim (in the contrapositive). Of course, correlation is not quite transitive, but it turns out that one can use the Cauchy-Schwarz inequality as a substitute for transitivity of correlation in this case.

I will give a proof of Proposition 1 below the fold (which is not quite based on the argument in the above mentioned paper, but on a variant of that argument communicated to me by Tamar Ziegler, and also independently discovered by Adam Harper). The main idea is to exploit the following observation: if {P} is a “large” but finite set of primes (in the sense that the sum {A := \sum_{p \in P} \frac{1}{p}} is large), then for a typical large number {n} (much larger than the elements of {P}), the number of primes in {P} that divide {n} is pretty close to {A = \sum_{p \in P} \frac{1}{p}}:

\displaystyle  \sum_{p \in P: p|n} 1 \approx A. \ \ \ \ \ (7)

A more precise formalisation of this heuristic is provided by the Turan-Kubilius inequality, which is proven by a simple application of the second moment method.

In particular, one can sum (7) against {\mu(n) f(n)} and obtain an approximation

\displaystyle  \sum_{n \leq x} \mu(n) f(n) \approx \frac{1}{A} \sum_{p \in P} \sum_{n \leq x: p|n} \mu(n) f(n)

that approximates a sum of {\mu(n) f(n)} by a bunch of sparser sums of {\mu(n) f(n)}. Since

\displaystyle  x = \frac{1}{A} \sum_{p \in P} \frac{x}{p},

we see (heuristically, at least) that in order to establish (4), it would suffice to establish the sparser estimates

\displaystyle  \sum_{n \leq x: p|n} \mu(n) f(n) = o(\frac{x}{p})

for all {p \in P} (or at least for “most” {p \in P}).

Now we make the change of variables {n = pm}. As the Möbius function is multiplicative, we usually have {\mu(n) = \mu(p) \mu(m) = - \mu(m)}. (There is an exception when {n} is divisible by {p^2}, but this will be a rare event and we will be able to ignore it.) So it should suffice to show that

\displaystyle  \sum_{m \leq x/p} \mu(m) f(pm) = o( x/p )

for most {p \in P}. However, by the hypothesis (5), the sequences {m \mapsto f(pm)} are asymptotically orthogonal as {p} varies, and this claim will then follow from a Cauchy-Schwarz argument.

Read the rest of this entry »

In this final lecture in the Marker lecture series, I discuss the recent work of Bourgain, Gamburd, and Sarnak on how arithmetic combinatorics and expander graphs were used to sieve for almost primes in various algebraic sets.

Read the rest of this entry »

Archives

RSS Google+ feed

  • An error has occurred; the feed is probably down. Try again later.
Follow

Get every new post delivered to your Inbox.

Join 4,042 other followers