You are currently browsing the tag archive for the ‘Cauchy-Schwarz’ tag.

This is the final continuation of the online reading seminar of Zhang’s paper for the polymath8 project. (There are two other continuations; this previous post, which deals with the combinatorial aspects of the second part of Zhang’s paper, and this previous post, that covers the Type I and Type II sums.) The main purpose of this post is to present (and hopefully, to improve upon) the treatment of the final and most innovative of the key estimates in Zhang’s paper, namely the Type III estimate.

The main estimate was already stated as Theorem 17 in the previous post, but we quickly recall the relevant definitions here. As in other posts, we always take ${x}$ to be a parameter going off to infinity, with the usual asymptotic notation ${O(), o(), \ll}$ associated to this parameter.

Definition 1 (Coefficient sequences) A coefficient sequence is a finitely supported sequence ${\alpha: {\bf N} \rightarrow {\bf R}}$ that obeys the bounds

$\displaystyle |\alpha(n)| \ll \tau^{O(1)}(n) \log^{O(1)}(x) \ \ \ \ \ (1)$

for all ${n}$, where ${\tau}$ is the divisor function.

• (i) If ${\alpha}$ is a coefficient sequence and ${a\ (q) = a \hbox{ mod } q}$ is a primitive residue class, the (signed) discrepancy ${\Delta(\alpha; a\ (q))}$ of ${\alpha}$ in the sequence is defined to be the quantity

$\displaystyle \Delta(\alpha; a \ (q)) := \sum_{n: n = a\ (q)} \alpha(n) - \frac{1}{\phi(q)} \sum_{n: (n,q)=1} \alpha(n). \ \ \ \ \ (2)$

• (ii) A coefficient sequence ${\alpha}$ is said to be at scale ${N}$ for some ${N \geq 1}$ if it is supported on an interval of the form ${[(1-O(\log^{-A_0} x)) N, (1+O(\log^{-A_0} x)) N]}$.
• (iii) A coefficient sequence ${\alpha}$ at scale ${N}$ is said to be smooth if it takes the form ${\alpha(n) = \psi(n/N)}$ for some smooth function ${\psi: {\bf R} \rightarrow {\bf C}}$ supported on ${[1-O(\log^{-A_0} x), 1+O(\log^{-A_0} x)]}$ obeying the derivative bounds

$\displaystyle \psi^{(j)}(t) = O( \log^{j A_0} x ) \ \ \ \ \ (3)$

for all fixed ${j \geq 0}$ (note that the implied constant in the ${O()}$ notation may depend on ${j}$).

For any ${I \subset {\bf R}}$, let ${{\mathcal S}_I}$ denote the square-free numbers whose prime factors lie in ${I}$. The main result of this post is then the following result of Zhang:

Theorem 2 (Type III estimate) Let ${\varpi, \delta > 0}$ be fixed quantities, and let ${M, N_1, N_2, N_3 \gg 1}$ be quantities such that

$\displaystyle x \ll M N_1 N_2 N_3 \ll x$

and

$\displaystyle N_1 \gg N_2, N_3$

and

$\displaystyle N_1^4 N_2^4 N_3^5 \gg x^{4+16\varpi+\delta+c}$

for some fixed ${c>0}$. Let ${\alpha, \psi_1, \psi_2, \psi_3}$ be coefficient sequences at scale ${M,N_1,N_2,N_3}$ respectively with ${\psi_1,\psi_2,\psi_3}$ smooth. Then for any ${I \subset [1,x^\delta]}$ we have

$\displaystyle \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\Delta(\alpha \ast \beta; a)| \ll x \log^{-A} x.$

In fact we have the stronger “pointwise” estimate

$\displaystyle |\Delta(\alpha \ast \psi_1 \ast \psi_2 \ast \psi_3; a)| \ll x^{-\epsilon} \frac{x}{q} \ \ \ \ \ (4)$

for all ${q \in {\mathcal S}_I}$ with ${q < x^{1/2+2\varpi}}$ and all ${a \in ({\bf Z}/q{\bf Z})^\times}$, and some fixed ${\epsilon>0}$.

(This is very slightly stronger than previously claimed, in that the condition ${N_2 \gg N_3}$ has been dropped.)

It turns out that Zhang does not exploit any averaging of the ${\alpha}$ factor, and matters reduce to the following:

Theorem 3 (Type III estimate without ${\alpha}$) Let ${\delta > 0}$ be fixed, and let ${1 \ll N_1, N_2, N_3, d \ll x^{O(1)}}$ be quantities such that

$\displaystyle N_1 \gg N_2, N_3$

and

$\displaystyle d \in {\mathcal S}_{[1,x^\delta]}$

and

$\displaystyle N_1^4 N_2^4 N_3^5 \gg d^8 x^{\delta+c}$

for some fixed ${c>0}$. Let ${\psi_1,\psi_2,\psi_3}$ be smooth coefficient sequences at scales ${N_1,N_2,N_3}$ respectively. Then we have

$\displaystyle |\Delta(\psi_1 \ast \psi_2 \ast \psi_3; a)| \ll x^{-\epsilon} \frac{N_1 N_2 N_3}{d}$

for all ${a \in ({\bf Z}/d{\bf Z})^\times}$ and some fixed ${\epsilon>0}$.

Let us quickly see how Theorem 3 implies Theorem 2. To show (4), it suffices to establish the bound

$\displaystyle \sum_{n = a\ (q)} \alpha \ast \psi_1 \ast \psi_2 \ast \psi_3(n) = X + O( x^{-\epsilon} \frac{x}{q} )$

for all ${a \in ({\bf Z}/q{\bf Z})^\times}$, where ${X}$ denotes a quantity that is independent of ${a}$ (but can depend on other quantities such as ${\alpha,\psi_1,\psi_2,\psi_3,q}$). The left-hand side can be rewritten as

$\displaystyle \sum_{b \in ({\bf Z}/q{\bf Z})^\times} \sum_{m = b\ (q)} \alpha(m) \sum_{n = a/b\ (q)} \psi_1 \ast \psi_2 \ast \psi_3(n).$

From Theorem 3 we have

$\displaystyle \sum_{n = a/b\ (q)} \psi_1 \ast \psi_2 \ast \psi_3(n) = Y + O( x^{-\epsilon} \frac{N_1 N_2 N_3}{q} )$

where the quantity ${Y}$ does not depend on ${a}$ or ${b}$. Inserting this asymptotic and using crude bounds on ${\alpha}$ (see Lemma 8 of this previous post) we conclude (4) as required (after modifying ${\epsilon}$ slightly).

It remains to establish Theorem 3. This is done by a set of tools similar to that used to control the Type I and Type II sums:

• (i) completion of sums;
• (ii) the Weil conjectures and bounds on Ramanujan sums;
• (iii) factorisation of smooth moduli ${q \in {\mathcal S}_I}$;
• (iv) the Cauchy-Schwarz and triangle inequalities (Weyl differencing).

The specifics are slightly different though. For the Type I and Type II sums, it was the classical Weil bound on Kloosterman sums that were the key source of power saving; Ramanujan sums only played a minor role, controlling a secondary error term. For the Type III sums, one needs a significantly deeper consequence of the Weil conjectures, namely the estimate of Bombieri and Birch on a three-dimensional variant of a Kloosterman sum. Furthermore, the Ramanujan sums – which are a rare example of sums that actually exhibit better than square root cancellation, thus going beyond even what the Weil conjectures can offer – make a crucial appearance, when combined with the factorisation of the smooth modulus ${q}$ (this new argument is arguably the most original and interesting contribution of Zhang).

The following result is due independently to Furstenberg and to Sarkozy:

Theorem 1 (Furstenberg-Sarkozy theorem) Let ${\delta > 0}$, and suppose that ${N}$ is sufficiently large depending on ${\delta}$. Then every subset ${A}$ of ${[N] := \{1,\ldots,N\}}$ of density ${|A|/N}$ at least ${\delta}$ contains a pair ${n, n+r^2}$ for some natural numbers ${n, r}$ with ${r \neq 0}$.

This theorem is of course similar in spirit to results such as Roth’s theorem or Szemerédi’s theorem, in which the pattern ${n,n+r^2}$ is replaced by ${n,n+r,n+2r}$ or ${n,n+r,\ldots,n+(k-1)r}$ for some fixed ${k}$ respectively. There are by now many proofs of this theorem (see this recent paper of Lyall for a survey), but most proofs involve some form of Fourier analysis (or spectral theory). This may be compared with the standard proof of Roth’s theorem, which combines some Fourier analysis with what is now known as the density increment argument.

A few years ago, Ben Green, Tamar Ziegler, and myself observed that it is possible to prove the Furstenberg-Sarkozy theorem by just using the Cauchy-Schwarz inequality (or van der Corput lemma) and the density increment argument, removing all invocations of Fourier analysis, and instead relying on Cauchy-Schwarz to linearise the quadratic shift ${r^2}$. As such, this theorem can be considered as even more elementary than Roth’s theorem (and its proof can be viewed as a toy model for the proof of Roth’s theorem). We ended up not doing too much with this observation, so decided to share it here.

The first step is to use the density increment argument that goes back to Roth. For any ${\delta > 0}$, let ${P(\delta)}$ denote the assertion that for ${N}$ sufficiently large, all sets ${A \subset [N]}$ of density at least ${\delta}$ contain a pair ${n,n+r^2}$ with ${r}$ non-zero. Note that ${P(\delta)}$ is vacuously true for ${\delta > 1}$. We will show that for any ${0 < \delta_0 \leq 1}$, one has the implication

$\displaystyle P(\delta_0 + c \delta_0^3) \implies P(\delta_0) \ \ \ \ \ (1)$

for some absolute constant ${c>0}$. This implies that ${P(\delta)}$ is true for any ${\delta>0}$ (as can be seen by considering the infimum of all ${\delta>0}$ for which ${P(\delta)}$ holds), which gives Theorem 1.

It remains to establish the implication (1). Suppose for sake of contradiction that we can find ${0 < \delta_0 \leq 1}$ for which ${P(\delta_0+c\delta^3_0)}$ holds (for some sufficiently small absolute constant ${c>0}$), but ${P(\delta_0)}$ fails. Thus, we can find arbitrarily large ${N}$, and subsets ${A}$ of ${[N]}$ of density at least ${\delta_0}$, such that ${A}$ contains no patterns of the form ${n,n+r^2}$ with ${r}$ non-zero. In particular, we have

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} 1_A(n) 1_A(n+(r+h)^2) = 0.$

(The exact ranges of ${r}$ and ${h}$ are not too important here, and could be replaced by various other small powers of ${N}$ if desired.)

Let ${\delta := |A|/N}$ be the density of ${A}$, so that ${\delta_0 \leq \delta \leq 1}$. Observe that

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} 1_A(n) \delta 1_{[N]}(n+(r+h)^2) = \delta^2 + O(N^{-1/3})$

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} \delta 1_{[N]}(n) \delta 1_{[N]}(n+(r+h)^2) = \delta^2 + O(N^{-1/3})$

and

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} \delta 1_{[N]}(n) 1_A(n+(r+h)^2) = \delta^2 + O( N^{-1/3} ).$

If we thus set ${f := 1_A - \delta 1_{[N]}}$, then

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} f(n) f(n+(r+h)^2) = -\delta^2 + O( N^{-1/3} ).$

In particular, for ${N}$ large enough,

$\displaystyle \mathop{\bf E}_{n \in [N]} |f(n)| \mathop{\bf E}_{r \in [N^{1/3}]} |\mathop{\bf E}_{h \in [N^{1/100}]} f(n+(r+h)^2)| \gg \delta^2.$

On the other hand, one easily sees that

$\displaystyle \mathop{\bf E}_{n \in [N]} |f(n)|^2 = O(\delta)$

and hence by the Cauchy-Schwarz inequality

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} |\mathop{\bf E}_{h \in [N^{1/100}]} f(n+(r+h)^2)|^2 \gg \delta^3$

which we can rearrange as

$\displaystyle |\mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h,h' \in [N^{1/100}]} \mathop{\bf E}_{n \in [N]} f(n+(r+h)^2) f(n+(r+h')^2)| \gg \delta^3.$

Shifting ${n}$ by ${(r+h)^2}$ we obtain (again for ${N}$ large enough)

$\displaystyle |\mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h,h' \in [N^{1/100}]} \mathop{\bf E}_{n \in [N]} f(n) f(n+(h'-h)(2r+h'+h))| \gg \delta^3.$

In particular, by the pigeonhole principle (and deleting the diagonal case ${h=h'}$, which we can do for ${N}$ large enough) we can find distinct ${h,h' \in [N^{1/100}]}$ such that

$\displaystyle |\mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{n \in [N]} f(n) f(n+(h'-h)(2r+h'+h))| \gg \delta^3,$

so in particular

$\displaystyle \mathop{\bf E}_{n \in [N]} |\mathop{\bf E}_{r \in [N^{1/3}]} f(n+(h'-h)(2r+h'+h))| \gg \delta^3.$

If we set ${d := 2(h'-h)}$ and shift ${n}$ by ${(h'-h) (h'+h)}$, we can simplify this (again for ${N}$ large enough) as

$\displaystyle \mathop{\bf E}_{n \in [N]} |\mathop{\bf E}_{r \in [N^{1/3}]} f(n+dr)| \gg \delta^3. \ \ \ \ \ (2)$

On the other hand, since

$\displaystyle \mathop{\bf E}_{n \in [N]} f(n) = 0$

we have

$\displaystyle \mathop{\bf E}_{n \in [N]} f(n+dr) = O( N^{-2/3+1/100})$

for any ${r \in [N^{1/3}]}$, and thus

$\displaystyle \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} f(n+dr) = O( N^{-2/3+1/100}).$

Averaging this with (2) we conclude that

$\displaystyle \mathop{\bf E}_{n \in [N]} \max( \mathop{\bf E}_{r \in [N^{1/3}]} f(n+dr), 0 ) \gg \delta^3.$

In particular, by the pigeonhole principle we can find ${n \in [N]}$ such that

$\displaystyle \mathop{\bf E}_{r \in [N^{1/3}]} f(n+dr) \gg \delta^3,$

or equivalently ${A}$ has density at least ${\delta+c'\delta^3}$ on the arithmetic progression ${\{ n+dr: r \in [N^{1/3}]\}}$, which has length ${\lfloor N^{1/3}\rfloor }$ and spacing ${d}$, for some absolute constant ${c'>0}$. By partitioning this progression into subprogressions of spacing ${d^2}$ and length ${\lfloor N^{1/4}\rfloor}$ (plus an error set of size ${O(N^{1/4})}$, we see from the pigeonhole principle that we can find a progression ${\{ n' + d^2 r': r' \in [N^{1/4}]\}}$ of length ${\lfloor N^{1/4}\rfloor}$ and spacing ${d^2}$ on which ${A}$ has density at least ${\delta + c\delta^3}$ (and hence at least ${\delta_0+c\delta_0^3}$) for some absolute constant ${c>0}$. If we then apply the induction hypothesis to the set

$\displaystyle A' := \{ r' \in [N^{1/4}]: n' + d^2 r' \in A \}$

we conclude (for ${N}$ large enough) that ${A'}$ contains a pair ${m, m+s^2}$ for some natural numbers ${m,s}$ with ${s}$ non-zero. This implies that ${(n'+d^2 m), (n'+d^2 m) + (|d|s)^2}$ lie in ${A}$, a contradiction, establishing the implication (1).

A more careful analysis of the above argument reveals a more quantitative version of Theorem 1: for ${N \geq 100}$ (say), any subset of ${[N]}$ of density at least ${C/(\log\log N)^{1/2}}$ for some sufficiently large absolute constant ${C}$ contains a pair ${n,n+r^2}$ with ${r}$ non-zero. This is not the best bound known; a (difficult) result of Pintz, Steiger, and Szemeredi allows the density to be as low as ${C / (\log N)^{\frac{1}{4} \log\log\log\log N}}$. On the other hand, this already improves on the (simpler) Fourier-analytic argument of Green that works for densities at least ${C/(\log\log N)^{1/11}}$ (although the original argument of Sarkozy, which is a little more intricate, works up to ${C (\log\log N)^{2/3}/(\log N)^{1/3}}$). In the other direction, a construction of Rusza gives a set of density ${\frac{1}{65} N^{-0.267}}$ without any pairs ${n,n+r^2}$.

Remark 1 A similar argument also applies with ${n,n+r^2}$ replaced by ${n,n+r^k}$ for fixed ${k}$, because this sort of pattern is preserved by affine dilations ${r' \mapsto n'+d^k r'}$ into arithmetic progressions whose spacing ${d^k}$ is a ${k^{th}}$ power. By re-introducing Fourier analysis, one can also perform an argument of this type for ${n,n+d,n+2d}$ where ${d}$ is the sum of two squares; see the above-mentioned paper of Green for details. However there seems to be some technical difficulty in extending it to patterns of the form ${n,n+P(r)}$ for polynomials ${P}$ that consist of more than a single monomial (and with the normalisation ${P(0)=0}$, to avoid local obstructions), because one no longer has this preservation property.

I’ve just uploaded to the arXiv my paper “Mixing for progressions in non-abelian groups“, submitted to Forum of Mathematics, Sigma (which, along with sister publication Forum of Mathematics, Pi, has just opened up its online submission system). This paper is loosely related in subject topic to my two previous papers on polynomial expansion and on recurrence in quasirandom groups (with Vitaly Bergelson), although the methods here are rather different from those in those two papers. The starting motivation for this paper was a question posed in this foundational paper of Tim Gowers on quasirandom groups. In that paper, Gowers showed (among other things) that if ${G}$ was a quasirandom group, patterns such as ${(x,xg,xh, xgh)}$ were mixing in the sense that, for any four sets ${A,B,C,D \subset G}$, the number of such quadruples ${(x,xg,xh, xgh)}$ in ${A \times B \times C \times D}$ was equal to ${(\mu(A) \mu(B) \mu(C) \mu(D) + o(1)) |G|^3}$, where ${\mu(A) := |A| / |G|}$, and ${o(1)}$ denotes a quantity that goes to zero as the quasirandomness of the group goes to infinity. In my recent paper with Vitaly, we also considered mixing properties of some other patterns, namely ${(x,xg,gx)}$ and ${(g,x,xg,gx)}$. This paper is concerned instead with the pattern ${(x,xg,xg^2)}$, that is to say a geometric progression of length three. As observed by Gowers, by applying (a suitably quantitative version of) Roth’s theorem in (cosets of) a cyclic group, one can obtain a recurrence theorem for this pattern without much effort: if ${G}$ is an arbitrary finite group, and ${A}$ is a subset of ${G}$ with ${\mu(A) \geq \delta}$, then there are at least ${c(\delta) |G|^2}$ pairs ${(x,g) \in G}$ such that ${x, xg, xg^2 \in A}$, where ${c(\delta)>0}$ is a quantity depending only on ${\delta}$. However, this argument does not settle the question of whether there is a stronger mixing property, in that the number of pairs ${(x,g) \in G^2}$ such that ${(x,xg,xg^2) \in A \times B \times C}$ should be ${(\mu(A)\mu(B)\mu(C)+o(1)) |G|^2}$ for any ${A,B,C \subset G}$. Informally, this would assert that for ${x, g}$ chosen uniformly at random from ${G}$, the triplet ${(x, xg, xg^2)}$ should resemble a uniformly selected element of ${G^3}$ in some weak sense.

For non-quasirandom groups, such mixing properties can certainly fail. For instance, if ${G}$ is the cyclic group ${G = ({\bf Z}/N{\bf Z},+)}$ (which is abelian and thus highly non-quasirandom) with the additive group operation, and ${A = \{1,\ldots,\lfloor \delta N\rfloor\}}$ for some small but fixed ${\delta > 0}$, then ${\mu(A) = \delta + o(1)}$ in the limit ${N \rightarrow \infty}$, but the number of pairs ${(x,g) \in G^2}$ with ${x, x+g, x+2g \in A}$ is ${(\delta^2/2 + o(1)) |G|^2}$ rather than ${(\delta^3+o(1)) |G|^2}$. The problem here is that the identity ${(x+2g) = 2(x+g) - x}$ ensures that if ${x}$ and ${x+g}$ both lie in ${A}$, then ${x+2g}$ has a highly elevated likelihood of also falling in ${A}$. One can view ${A}$ as the preimage of a small ball under the one-dimensional representation ${\rho: G \rightarrow U(1)}$ defined by ${\rho(n) := e^{2\pi i n/N}}$; similar obstructions to mixing can also be constructed from other low-dimensional representations.

However, by definition, quasirandom groups do not have low-dimensional representations, and Gowers asked whether mixing for ${(x,xg,xg^2)}$ could hold for quasirandom groups. I do not know if this is the case for arbitrary quasirandom groups, but I was able to settle the question for a specific class of quasirandom groups, namely the special linear groups ${G := SL_d(F)}$ over a finite field ${F}$ in the regime where the dimension ${d}$ is bounded (but is at least two) and ${F}$ is large. Indeed, for such groups I can obtain a count of ${(\mu(A) \mu(B) \mu(C) + O( |F|^{-\min(d-1,2)/8} )) |G|^2}$ for the number of pairs ${(x,g) \in G^2}$ with ${(x, xg, xg^2) \in A \times B \times C}$. In fact, I have the somewhat stronger statement that there are ${(\mu(A) \mu(B) \mu(C) \mu(D) + O( |F|^{-\min(d-1,2)/8} )) |G|^2}$ pairs ${(x,g) \in G^2}$ with ${(x,xg,xg^2,g) \in A \times B \times C \times D}$ for any ${A,B,C,D \subset G}$.

I was also able to obtain a partial result for the length four progression ${(x,xg,xg^2, xg^3)}$ in the simpler two-dimensional case ${G = SL_2(F)}$, but I had to make the unusual restriction that the group element ${g \in G}$ was hyperbolic in the sense that it was diagonalisable over the finite field ${F}$ (as opposed to diagonalisable over the algebraic closure ${\overline{F}}$ of that field); this amounts to the discriminant of the matrix being a quadratic residue, and this holds for approximately half of the elements of ${G}$. The result is then that for any ${A,B,C,D \subset G}$, one has ${(\frac{1}{2} \mu(A) \mu(B) \mu(C) \mu(D) + o(1)) |G|^2}$ pairs ${(x,g) \in G}$ with ${g}$ hyperbolic and ${(x,xg,xg^2,xg^3) \subset A \times B \times C \times D}$. (Again, I actually show a slightly stronger statement in which ${g}$ is restricted to an arbitrary subset ${E}$ of hyperbolic elements.)

For the length three argument, the main tools used are the Cauchy-Schwarz inequality, the quasirandomness of ${G}$, and some algebraic geometry to ensure that a certain family of probability measures on ${G}$ that are defined algebraically are approximately uniformly distributed. The length four argument is significantly more difficult and relies on a rather ad hoc argument involving, among other things, expander properties related to the work of Bourgain and Gamburd, and also a “twisted” version of an argument of Gowers that is used (among other things) to establish an inverse theorem for the ${U^3}$ norm.

I give some details of these arguments below the fold.

Title: Use basic examples to calibrate exponents

Motivation: In the more quantitative areas of mathematics, such as analysis and combinatorics, one has to frequently keep track of a large number of exponents in one’s identities, inequalities, and estimates.  For instance, if one is studying a set of N elements, then many expressions that one is faced with will often involve some power $N^p$ of N; if one is instead studying a function f on a measure space X, then perhaps it is an $L^p$ norm $\|f\|_{L^p(X)}$ which will appear instead.  The exponent $p$ involved will typically evolve slowly over the course of the argument, as various algebraic or analytic manipulations are applied.  In some cases, the exact value of this exponent is immaterial, but at other times it is crucial to have the correct value of $p$ at hand.   One can (and should) of course carefully go through one’s arguments line by line to work out the exponents correctly, but it is all too easy to make a sign error or other mis-step at one of the lines, causing all the exponents on subsequent lines to be incorrect.  However, one can guard against this (and avoid some tedious line-by-line exponent checking) by continually calibrating these exponents at key junctures of the arguments by using basic examples of the object of study (sets, functions, graphs, etc.) as test cases.  This is a simple trick, but it lets one avoid many unforced errors with exponents, and also lets one compute more rapidly.

Quick description: When trying to quickly work out what an exponent p in an estimate, identity, or inequality should be without deriving that statement line-by-line, test that statement with a simple example which has non-trivial behaviour with respect to that exponent p, but trivial behaviour with respect to as many other components of that statement as one is able to manage.   The “non-trivial” behaviour should be parametrised by some very large or very small parameter.  By matching the dependence on this parameter on both sides of the estimate, identity, or inequality, one should recover p (or at least a good prediction as to what p should be).

General discussion: The test examples should be as basic as possible; ideally they should have trivial behaviour in all aspects except for one feature that relates to the exponent p that one is trying to calibrate, thus being only “barely” non-trivial.   When the object of study is a function, then (appropriately rescaled, or otherwise modified) bump functions are very typical test objects, as are Dirac masses, constant functions, Gaussians, or other functions that are simple and easy to compute with.  In additive combinatorics, when the object of study is a subset of a group, then subgroups, arithmetic progressions, or random sets are typical test objects.  In graph theory, typical examples of test objects include complete graphs, complete bipartite graphs, and random graphs. And so forth.

This trick is closely related to that of using dimensional analysis to recover exponents; indeed, one can view dimensional analysis as the special case of exponent calibration when using test objects which are non-trivial in one dimensional aspect (e.g. they exist at a single very large or very small length scale) but are otherwise of a trivial or “featureless” nature.   But the calibration trick is more general, as it can involve parameters (such as probabilities, angles, or eccentricities) which are not commonly associated with the physical concept of a dimension.  And personally, I find example-based calibration to be a much more satisfying (and convincing) explanation of an exponent than a calibration arising from formal dimensional analysis.

When one is trying to calibrate an inequality or estimate, one should try to pick a basic example which one expects to saturate that inequality or estimate, i.e. an example for which the inequality is close to being an equality.  Otherwise, one would only expect to obtain some partial information on the desired exponent p (e.g. a lower bound or an upper bound only).  Knowing the examples that saturate an estimate that one is trying to prove is also useful for several other reasons – for instance, it strongly suggests that any technique which is not efficient when applied to the saturating example, is unlikely to be strong enough to prove the estimate in general, thus eliminating fruitless approaches to a problem and (hopefully) refocusing one’s attention on those strategies which actually have a chance of working.

Calibration is best used for the type of quick-and-dirty calculations one uses when trying to rapidly map out an argument that one has roughly worked out already, but without precise details; in particular, I find it particularly useful when writing up a rapid prototype.  When the time comes to write out the paper in full detail, then of course one should instead carefully work things out line by line, but if all goes well, the exponents obtained in that process should match up with the preliminary guesses for those exponents obtained by calibration, which adds confidence that there are no exponent errors have been committed.

This week I was in Columbus, Ohio, attending a conference on equidistribution on manifolds. I talked about my recent paper with Ben Green on the quantitative behaviour of polynomial sequences in nilmanifolds, which I have blogged about previously. During my talk (and inspired by the immediately preceding talk of Vitaly Bergelson), I stated explicitly for the first time a generalisation of the van der Corput trick which morally underlies our paper, though it is somewhat buried there as we specialised it to our application at hand (and also had to deal with various quantitative issues that made the presentation more complicated). After the talk, several people asked me for a more precise statement of this trick, so I am presenting it here, and as an application reproving an old theorem of Leon Green that gives a necessary and sufficient condition as to whether a linear sequence $(g^n x)_{n=1}^\infty$ on a nilmanifold $G/\Gamma$ is equidistributed, which generalises the famous theorem of Weyl on equidistribution of polynomials.
We shall see that for weakly mixing systems, averages such as $\frac{1}{N} \sum_{n=0}^{N-1} T^n f \ldots T^{(k-1)n} f$ can be computed very explicitly (in fact, this average converges to the constant $(\int_X f\ d\mu)^{k-1}$). More generally, we shall see that weakly mixing components of a system tend to average themselves out and thus become irrelevant when studying many types of ergodic averages. Our main tool here will be the humble Cauchy-Schwarz inequality, and in particular a certain consequence of it, known as the van der Corput lemma.