You are currently browsing the monthly archive for April 2010.

A friend of mine recently asked me for some suggestions for games or other activities for children that would help promote quantitative reasoning or mathematical skills, while remaining fun to play (i.e. more than just homework-type questions poorly disguised in game form).    The initial question was focused on computer games (and specifically, on iPhone apps), but I think the broader question would also be of interest.

I myself have not seriously played these sorts of games for years, so I could only come up with a few examples immediately: the game “Planarity“, and the game “Factory Balls” (and two sequels).   (Edit: Rubik’s cube and its countless cousins presumably qualify also, due to their implicit use of group theory.)  I am hopeful though that readers may be able to come up with more suggestions.

There is of course no shortage of “educational” games, computer-based or otherwise, available, but I think what I (and my friend) would be looking for here are games with production values comparable to other, less educational games, and for which the need for mathematical thinking arises naturally in the gameplay rather than being artificially inserted by fiat (e.g. “solve this equation to proceed”).  (Here I interpret “mathematical thinking” loosely, to include not just numerical or algebraic thinking, but also geometric, abstract, logical, probabilistic, etc.)

[Question for MathOverflow experts: would this type of question be suitable for crossposting there?   The requirement that such questions be “research-level” seems to suggest not.]

In the previous lecture notes, we used (linear) Fourier analysis to control the number of three-term arithmetic progressions ${a, a+r, a+2r}$ in a given set ${A}$. The power of the Fourier transform for this problem ultimately stemmed from the identity

$\displaystyle \mathop{\bf E}_{n,r \in {\bf Z}/N'{\bf Z}} 1_A(n) 1_A(n+r) 1_A(n+2r)$

$\displaystyle = \sum_{\alpha \in \frac{1}{N'}{\bf Z} / {\bf Z}} \hat 1_A(\alpha) \hat 1_A(-2\alpha) \hat 1_A(\alpha) \ \ \ \ \ (1)$

for any cyclic group ${{\bf Z}/N'{\bf Z}}$ and any subset ${A}$ of that group (analogues of this identity also exist for other finite abelian groups, and to a lesser extent to non-abelian groups also, although that is not the focus of my current discussion). As it turns out, linear Fourier analysis is not able to discern higher order patterns, such as arithmetic progressions of length four; we give some demonstrations of this below the fold, taking advantage of the polynomial recurrence theory from Notes 1.

The main objective of this course is to introduce the (still nascent) theory of higher order Fourier analysis, which is capable of studying higher order patterns. The full theory is still rather complicated (at least, at our present level of understanding). However, one aspect of the theory is relatively simple, namely that we can largely reduce the study of arbitrary additive patterns to the study of a single type of additive pattern, namely the parallelopipeds

$\displaystyle ( x + \omega_1 h_1 + \ldots + \omega_d h_d )_{\omega_1,\ldots,\omega_d \in \{0,1\}}. \ \ \ \ \ (2)$

Thus for instance, for ${d=1}$ one has the line segments

$\displaystyle x, x+h_1 \ \ \ \ \ (3)$

for ${d=2}$ one has the parallelograms

$\displaystyle x, x+h_1, x+h_2, x+h_1+h_2, \ \ \ \ \ (4)$

for ${d=3}$ one has the parallelopipeds

$\displaystyle x, x+h_1, x+h_2, x+h_3, x+h_1+h_2, x+h_1+h_3, x+h_2+h_3, x+h_1+h_2+h_3. \ \ \ \ \ (5)$

These patterns are particularly pleasant to handle, thanks to the large number of symmetries available on the discrete cube ${\{0,1\}^d}$. For instance, whereas establishing the presence of arbitrarily long arithmetic progressions in dense sets is quite difficult (Szemerédi’s theorem), establishing arbitrarily high-dimensional parallelopipeds is much easier:

Exercise 1 Let ${A \subset [N]}$ be such that ${|A| > \delta N}$ for some ${0 < \delta \leq 1}$. If ${N}$ is sufficiently large depending on ${\delta}$, show that there exists an integer ${1 \leq h \ll 1/\delta}$ such that ${|A \cap (A+h)| \gg \delta^2 N}$. (Hint: obtain upper and lower bounds on the set ${\{ (x,y) \in A \times A: x < y \leq x + 10/\delta \}}$.)

Exercise 2 (Hilbert cube lemma) Let ${A \subset [N]}$ be such that ${|A| > \delta N}$ for some ${0 < \delta \leq 1}$, and let ${d \geq 1}$ be an integer. Show that if ${N}$ is sufficiently large depending on ${\delta,d}$, then ${A}$ contains a parallelopiped of the form (2), with ${1 \leq h_1,\ldots,h_d \ll_\delta 1}$ positive integers. (Hint: use the previous exercise and induction.) Conclude that if ${A \subset {\bf Z}}$ has positive upper density, then it contains infinitely many such parallelopipeds for each ${d}$.

Exercise 3 Show that if ${q \geq 1}$ is an integer, and ${d}$ is sufficiently large depending on ${q}$, then for any parallelopiped (2) in the integers ${{\bf Z}}$, there exists ${\omega_1,\ldots,\omega_d \in \{0,1\}}$, not all zero, such that ${x + h_1 \omega_1 + \ldots + h_d \omega_d = x \hbox{ mod } q}$. (Hint: pigeonhole the ${h_i}$ in the residue classes modulo ${q}$.) Use this to conclude that if ${A}$ is the set of all integers ${n}$ such that ${|n-km!| \geq m}$ for all integers ${k, m \geq 1}$, then ${A}$ is a set of positive upper density (and also positive lower density) which does not contain any infinite parallelopipeds (thus one cannot take ${d=\infty}$ in the Hilbert cube lemma).

The standard way to control the parallelogram patterns (and thus, all other (finite complexity) linear patterns) are the Gowers uniformity norms

$\displaystyle \| f\|_{U^d(G)} := {\bf E}_{x,h_1,\ldots,h_d \in G} \prod_{\omega_1,\ldots,\omega_d \in \{0,1\}^d} {\mathcal C}^{\omega_1+\ldots+\omega_d} f(x+\omega_1 h_1 + \ldots + \omega_d h_d) \ \ \ \ \ (6)$

with ${f: G \rightarrow {\bf C}}$ a function on a finite abelian group ${G}$, and ${{\mathcal C}: z \mapsto \overline{z}}$ is the complex conjugation operator; analogues of this norm also exist for group-like objects such as the progression ${[N]}$, and also for measure-preserving systems (where they are known as the Gowers-Host-Kra uniformity seminorms, see this paper of Host-Kra for more discussion). In this set of notes we will focus on the basic properties of these norms; the deepest fact about them, known as the inverse conjecture for these norms, will be discussed in later notes.

Ordinarily, I only mention my research papers on this blog when they are first submitted, or if a major update is required.  With the paper arising from the DHJ Polymath “Low Dimensions” project, though, the situation is a little different as the collaboration to produce the paper took place on this blog.

Anyway, the good news is that the paper has been accepted for the Szemerédi birthday conference proceedings.  The referee was positive and recommended only some minor changes (I include the list of changes below the fold).  I have incorporated these changes, and the new version of the paper can be found here.  Within a few days I need to return the paper to the editor, so this is the last chance to propose any further corrections or changes (though at this stage any major changes are unlikely to be feasible).

The editor asked a good question: should we have a list of participants for this project somewhere?  If so, it seems to make more sense to have this list as a link on the wiki, rather than in the paper itself. But making a list opens the can of worms of deciding what level of participation should be the threshold for inclusion in the list – should someone who only contributed, say, one tangential comment to one of the blog posts be listed alongside a substantially more active participant?

One possibility is that of self-reporting; we could set up a page for participants on the wiki and let anyone who felt like they contributed add their name, and rely on the honour code to keep it accurate.    This might be feasible as long as the page is kept unofficial (so, in particular, it will not be used as the formal list of authors for the paper).

A related question is whether to add an explicit link to the timeline for progress on this project and on the sister “New proof” project.  If so, this should also be kept unofficial (there was no formal guidelines as to what was included in the timeline and what was not).

These decisions do not necessarily have to be taken quickly; one can simply point to the wiki in the paper (as is already done in the current version), and update the wiki later if we decide to add these sorts of acknowledgments on that site.

Incidentally, if we have another successful Polymath project to write up, I would now strongly recommend using version control software (such as Subversion or git) to organise the writing process, both at the informal notes stage and also at the drafting stage.  It is certainly far superior to our improvised solution of putting the raw TeX files on a wiki…

It’s been a while since I’ve added to my career advice and writing pages on this blog, but I recently took the time to write up another such page on a topic I had not previously covered, entitled “Write in your own voice“.  The main point here is that while every piece of mathematical research inevitably builds upon the previous literature, one should not mimic the style and text of that literature slavishly, but instead develop one’s own individual style, while also updating and adapting the results and insights of previous authors.

In topology, a non-empty set ${E}$ is said to be connected if cannot be decomposed into two nontrivial subsets that are both closed and open relative to ${E}$, and path connected if any two points ${x,y}$ in ${E}$ can be connected by a path (i.e. there exists a continuous map ${\gamma: [0,1] \rightarrow E}$ with ${\gamma(0)=x}$ and ${\gamma(1)=y}$).

Path-connected sets are always connected, but the converse is not true, even in the model case of compact subsets of a Euclidean space. The classic counterexample is the set

$\displaystyle E := \{ (0,y): -1 \leq y \leq 1 \} \cup \{ (x, \sin(1/x)): 0 < x \leq 1 \}, \ \ \ \ \ (1)$

which is connected but not path-connected (there is no continuous path from ${(0,1)}$ to ${(1,\sin(1))}$).

Looking at the definitions of the two concepts, one notices a difference: the notion of path-connectedness is somehow a “positive” one, in the sense that a path-connected set can produce the existence of something (a path connecting two points ${x}$ and ${y}$) for a given type of input (in this case, a pair of points ${x, y}$). On the other hand, the notion of connectedness is a “negative” one, in that it asserts the non-existence of something (a non-trivial partition into clopen sets). To put it another way, it is relative easy to convince someone that a set is path-connected (by providing a connecting path for every pair of points) or is disconnected (by providing a non-trivial partition into clopen sets) but if a set is not path-connected, or is connected, how can one easily convince someone of this fact? To put it yet another way: is there a reasonable certificate for connectedness (or for path-disconnectedness)?

In the case of connectedness for compact subsets ${E}$ of Euclidean space, there is an answer as follows. If ${\epsilon > 0}$, let us call two points ${x, y}$ in ${E}$ ${\epsilon}$-connected if one can find a finite sequence ${x_0 = x, x_1, \ldots, x_N = y}$ of points in ${E}$, such that ${|x_{i+1}-x_i| < \epsilon}$ for all ${0 \leq i < N}$; informally, one can jump from ${x}$ to ${y}$ in ${E}$ using jumps of length at most ${\epsilon}$. Let us call ${x_0,\ldots,x_N}$ an ${\epsilon}$-discrete path.

Proposition 1 (Connectedness certificate for compact subsets of Euclidean space) Let ${E \subset {\bf R}^d}$ be compact and non-empty. Then ${E}$ is connected if and only if every pair of points in ${E}$ is ${\epsilon}$-connected for every ${\epsilon > 0}$.

Proof: Suppose first that ${E}$ is disconnected, then ${E}$ can be partitioned into two non-empty closed subsets ${F, G}$. Since ${E}$ is compact, ${F, G}$ are compact also, and so they are separated by some non-zero distance ${\epsilon > 0}$. But then it is clear that points in ${F}$ cannot be ${\epsilon}$-connected to points in ${G}$, and the claim follows.

Conversely, suppose that there is a pair of points ${x,y}$ in ${E}$ and an ${\epsilon > 0}$ such that ${x,y}$ are not ${\epsilon}$-connected. Let ${F}$ be the set of all points in ${E}$ that are ${\epsilon}$-connected to ${x}$. It is easy to check that ${F}$ is open, closed, and a proper subset of ${E}$; thus ${E}$ is disconnected. $\Box$

We remark that the above proposition in fact works for any compact metric space. It is instructive to see how the points ${(1,\sin(1))}$ and ${(0,1)}$ are ${\epsilon}$-connected in the set (1); the ${\epsilon}$-discrete path follows the graph of ${\sin(1/x)}$ backwards until one gets sufficiently close to the ${y}$-axis, at which point one “jumps” across to the ${y}$-axis to eventually reach ${(0,1)}$.

It is also interesting to contrast the above proposition with path connectedness. Clearly, if two points ${x, y}$ are connected by a path, then they are ${\epsilon}$-connected for every ${\epsilon > 0}$ (because every continuous map ${\gamma: [0,1] \rightarrow E}$ is uniformly continuous); but from the analysis of the example (1) we see that the converse is not true. Roughly speaking, the various ${\epsilon}$-discrete paths from ${x}$ to ${y}$ have to be “compatible” with each other in some sense in order to synthesise a continuous path from ${x}$ to ${y}$ in the limit (we will not make this precise here).

But this leaves two (somewhat imprecise) questions, which I do not know how to satisfactorily answer:

Question 1: Is there a good certificate for path disconnectedness, say for compact subsets ${E}$ of ${{\bf R}^d}$? One can construct lousy certificates, for instance one could look at all continuous paths in ${{\bf R}^d}$ joining two particular points ${x, y}$ in ${E}$, and verify that each one of them leaves ${E}$ at some point. But this is an “uncountable” certificate – it requires one to check an uncountable number of paths. In contrast, the certificate in Proposition 1 is basically a countable one (if one describes a compact set ${E}$ by describing a family of ${\epsilon}$-nets for a countable sequence of ${\epsilon}$ tending to zero). (Very roughly speaking, I would like a certificate that can somehow be “verified in countable time” in a suitable oracle model, as discussed in my previous post, though I have not tried to make this vague specification more rigorous.)

It is tempting to look at the equivalence classes of ${E}$ given by the relation of being connected by a path, but these classes need not be closed (as one can see with the example (1)) and it is not obvious to me how to certify that two such classes are not path-connected to each other.

Question 2: Is there a good certificate for connectedness for closed but unbounded closed subsets of ${{\bf R}^d}$? Proposition 1 fails in this case; consider for instance the set

$\displaystyle E := \{ (x,0): x \in {\bf R} \} \cup \{ (x,\frac{1}{x}): x > 0 \}. \ \ \ \ \ (2)$

Any pair of points ${x,y \in E}$ is ${\epsilon}$-connected for every ${\epsilon > 0}$, and yet this set is disconnected.

The problem here is that as ${\epsilon}$ gets smaller, the ${\epsilon}$-discrete paths connecting a pair of points such as ${(1,0)}$ and ${(1,1)}$ have diameter going to infinity. One natural guess is then to require a uniform bound on the diameter, i.e. that for any pair of points ${x, y}$, there exists an ${R>0}$ such that there is an ${\epsilon}$-discrete path from ${x}$ to ${y}$ of diameter at most ${R}$ for every ${\epsilon > 0}$. This does indeed force connectedness, but unfortunately not all connected sets have this property. Consider for instance the set

$\displaystyle E := \{ (x,y,0): x \in {\bf R}, y \in \pm 1 \} \cup \bigcup_{n=1}^\infty (E_n \cup F_n) \ \ \ \ \ (3)$

in ${{\bf R}^3}$, where

$\displaystyle E_n := \{ (x,y,0): \frac{x^2}{n^2} + \frac{y^2}{(1-1/n)^2} = 1\}$

is a rectangular ellipse centered at the origin with minor diameter endpoints ${(0,1/n-1), (0,1-1/n)}$ and major diameter endpoints ${(-n,0), (n,0)}$, and

$\displaystyle F_n := \{ (n,y,z): (y-1/2)^2+z^2=1/4 \}$

is a circle that connects the ${(n,0)}$ endpoint of ${E_n}$ to the point ${(n,1)}$ in ${E}$. One can check that ${E}$ is a closed connected set, but the ${\epsilon}$-discrete paths connecting ${(0,-1)}$ with ${(0,+1)}$ have unbounded diameter as ${\epsilon \rightarrow 0}$.

Currently, I do not have any real progress on Question 1. For Question 2, I can only obtain the following strange “second-order” criterion for connectedness, that involves an unspecified gauge function ${\delta}$:

Proposition 2 (Second-order connectedness certificate) Let ${E}$ be a closed non-empty subset of ${{\bf R}^d}$. Then the following are equivalent:

• ${E}$ is connected.
• For every monotone decreasing, strictly positive function ${\delta: {\bf R}^+ \rightarrow {\bf R}^+}$ and every ${x,y \in E}$, there exists a discrete path ${x_0=x,x_1,\ldots,x_N=y}$ in ${E}$ such that ${|x_{i+1}-x_i| < \delta(|x_i|)}$.

Proof: This is proven in almost the same way as Proposition 1. If ${E}$ can be disconnected into two non-trivial sets ${F, G}$, then one can find a monotone decreasing gauge function ${\delta: {\bf R}^+ \rightarrow {\bf R}^+}$ such that for each ball ${B_R := \{ x \in {\bf R}^d: |x| \leq R \}}$, ${F \cap B_R}$ and ${G}$ are separated by at least ${\delta(R)}$, and then there is no discrete path from ${F}$ to ${G}$ in ${E}$ obeying the condition ${|x_{i+1}-x_i| < \delta(|x_i|)}$.

Conversely, if there exists a gauge function ${\delta}$ and two points ${x,y \in E}$ which cannot be connected by a discrete path in ${E}$ that obeys the condition ${|x_{i+1}-x_i| < \delta(|x_i|)}$, then if one sets ${F}$ to be all the points that can be reached from ${x}$ in this manner, one easily verifies that ${F}$ and ${E \backslash F}$ disconnect ${E}$. $\Box$

It may be that this is somehow the “best” one can do, but I am not sure how to quantify this formally.

Anyway, I was curious if any of the readers here (particularly those with expertise in point-set topology or descriptive set theory) might be able to shed more light on these questions. (I also considered crossposting this to Math Overflow, but I think the question may be a bit too long (and vague) for that.)

(The original motivation for this question, by the way, stems from an attempt to use methods of topological group theory to attack questions in additive combinatorics, in the spirit of the paper of Hrushovski studied previously on this blog. The connection is rather indirect, though; I may discuss this more in a future post.)

The Riemann zeta function ${\zeta(s)}$ is defined in the region ${\hbox{Re}(s)>1}$ by the absolutely convergent series

$\displaystyle \zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s} = 1 + \frac{1}{2^s} + \frac{1}{3^s} + \ldots. \ \ \ \ \ (1)$

Thus, for instance, it is known that ${\zeta(2)=\pi^2/6}$, and thus

$\displaystyle \sum_{n=1}^\infty \frac{1}{n^2} = 1 + \frac{1}{4} + \frac{1}{9} + \ldots = \frac{\pi^2}{6}. \ \ \ \ \ (2)$

For ${\hbox{Re}(s) \leq 1}$, the series on the right-hand side of (1) is no longer absolutely convergent, or even conditionally convergent. Nevertheless, the ${\zeta}$ function can be extended to this region (with a pole at ${s=1}$) by analytic continuation. For instance, it can be shown that after analytic continuation, one has ${\zeta(0) = -1/2}$, ${\zeta(-1) = -1/12}$, and ${\zeta(-2)=0}$, and more generally

$\displaystyle \zeta(-s) = - \frac{B_{s+1}}{s+1} \ \ \ \ \ (3)$

for ${s=1,2,\ldots}$, where ${B_n}$ are the Bernoulli numbers. If one formally applies (1) at these values of ${s}$, one obtains the somewhat bizarre formulae

$\displaystyle \sum_{n=1}^\infty 1 = 1 + 1 + 1 + \ldots = -1/2 \ \ \ \ \ (4)$

$\displaystyle \sum_{n=1}^\infty n = 1 + 2 + 3 + \ldots = -1/12 \ \ \ \ \ (5)$

$\displaystyle \sum_{n=1}^\infty n^2 = 1 + 4 + 9 + \ldots = 0 \ \ \ \ \ (6)$

and

$\displaystyle \sum_{n=1}^\infty n^s = 1 + 2^s + 3^s + \ldots = -\frac{B_{s+1}}{s+1}. \ \ \ \ \ (7)$

Clearly, these formulae do not make sense if one stays within the traditional way to evaluate infinite series, and so it seems that one is forced to use the somewhat unintuitive analytic continuation interpretation of such sums to make these formulae rigorous. But as it stands, the formulae look “wrong” for several reasons. Most obviously, the summands on the left are all positive, but the right-hand sides can be zero or negative. A little more subtly, the identities do not appear to be consistent with each other. For instance, if one adds (4) to (5), one obtains

$\displaystyle \sum_{n=1}^\infty (n+1) = 2 + 3 + 4 + \ldots = -7/12 \ \ \ \ \ (8)$

whereas if one subtracts ${1}$ from (5) one obtains instead

$\displaystyle \sum_{n=2}^\infty n = 0 + 2 + 3 + 4 + \ldots = -13/12 \ \ \ \ \ (9)$

and the two equations seem inconsistent with each other.

However, it is possible to interpret (4), (5), (6) by purely real-variable methods, without recourse to complex analysis methods such as analytic continuation, thus giving an “elementary” interpretation of these sums that only requires undergraduate calculus; we will later also explain how this interpretation deals with the apparent inconsistencies pointed out above.

To see this, let us first consider a convergent sum such as (2). The classical interpretation of this formula is the assertion that the partial sums

$\displaystyle \sum_{n=1}^N \frac{1}{n^2} = 1 + \frac{1}{4} + \frac{1}{9} + \ldots + \frac{1}{N^2}$

converge to ${\pi^2/6}$ as ${N \rightarrow \infty}$, or in other words that

$\displaystyle \sum_{n=1}^N \frac{1}{n^2} = \frac{\pi^2}{6} + o(1)$

where ${o(1)}$ denotes a quantity that goes to zero as ${N \rightarrow \infty}$. Actually, by using the integral test estimate

$\displaystyle \sum_{n=N+1}^\infty \frac{1}{n^2} \leq \int_N^\infty \frac{dx}{x^2} = \frac{1}{N}$

we have the sharper result

$\displaystyle \sum_{n=1}^N \frac{1}{n^2} = \frac{\pi^2}{6} + O(\frac{1}{N}).$

Thus we can view ${\frac{\pi^2}{6}}$ as the leading coefficient of the asymptotic expansion of the partial sums of ${\sum_{n=1}^\infty 1/n^2}$.

One can then try to inspect the partial sums of the expressions in (4), (5), (6), but the coefficients bear no obvious relationship to the right-hand sides:

$\displaystyle \sum_{n=1}^N 1 = N$

$\displaystyle \sum_{n=1}^N n = \frac{1}{2} N^2 + \frac{1}{2} N$

$\displaystyle \sum_{n=1}^N n^2 = \frac{1}{3} N^3 + \frac{1}{2} N^2 + \frac{1}{6} N.$

For (7), the classical Faulhaber formula (or Bernoulli formula) gives

$\displaystyle \sum_{n=1}^N n^s = \frac{1}{s+1} \sum_{j=0}^s \binom{s+1}{j} B_j N^{s+1-j} \ \ \ \ \ (10)$

$\displaystyle = \frac{1}{s+1} N^{s+1} + \frac{1}{2} N^s + \frac{s}{12} N^{s-1} + \ldots + B_s N$

for ${s \geq 2}$, which has a vague resemblance to (7), but again the connection is not particularly clear.

The problem here is the discrete nature of the partial sum

$\displaystyle \sum_{n=1}^N n^s = \sum_{n \leq N} n^s,$

which (if ${N}$ is viewed as a real number) has jump discontinuities at each positive integer value of ${N}$. These discontinuities yield various artefacts when trying to approximate this sum by a polynomial in ${N}$. (These artefacts also occur in (2), but happen in that case to be obscured in the error term ${O(1/N)}$; but for the divergent sums (4), (5), (6), (7), they are large enough to cause real trouble.)

However, these issues can be resolved by replacing the abruptly truncated partial sums ${\sum_{n=1}^N n^s}$ with smoothed sums ${\sum_{n=1}^\infty \eta(n/N) n^s}$, where ${\eta: {\bf R}^+ \rightarrow {\bf R}}$ is a cutoff function, or more precisely a compactly supported bounded function that equals ${1}$ at ${0}$. The case when ${\eta}$ is the indicator function ${1_{[0,1]}}$ then corresponds to the traditional partial sums, with all the attendant discretisation artefacts; but if one chooses a smoother cutoff, then these artefacts begin to disappear (or at least become lower order), and the true asymptotic expansion becomes more manifest.

Note that smoothing does not affect the asymptotic value of sums that were already absolutely convergent, thanks to the dominated convergence theorem. For instance, we have

$\displaystyle \sum_{n=1}^\infty \eta(n/N) \frac{1}{n^2} = \frac{\pi^2}{6} + o(1)$

whenever ${\eta}$ is a cutoff function (since ${\eta(n/N) \rightarrow 1}$ pointwise as ${N \rightarrow \infty}$ and is uniformly bounded). If ${\eta}$ is equal to ${1}$ on a neighbourhood of the origin, then the integral test argument then recovers the ${O(1/N)}$ decay rate:

$\displaystyle \sum_{n=1}^\infty \eta(n/N) \frac{1}{n^2} = \frac{\pi^2}{6} + O(\frac{1}{N}).$

However, smoothing can greatly improve the convergence properties of a divergent sum. The simplest example is Grandi’s series

$\displaystyle \sum_{n=1}^\infty (-1)^{n-1} = 1 - 1 + 1 - \ldots.$

The partial sums

$\displaystyle \sum_{n=1}^N (-1)^{n-1} = \frac{1}{2} + \frac{1}{2} (-1)^{N-1}$

oscillate between ${1}$ and ${0}$, and so this series is not conditionally convergent (and certainly not absolutely convergent). However, if one performs analytic continuation on the series

$\displaystyle \sum_{n=1}^\infty \frac{(-1)^{n-1}}{n^s} = 1 - \frac{1}{2^s} + \frac{1}{3^s} - \ldots$

and sets ${s = 0}$, one obtains a formal value of ${1/2}$ for this series. This value can also be obtained by smooth summation. Indeed, for any cutoff function ${\eta}$, we can regroup

$\displaystyle \sum_{n=1}^\infty (-1)^{n-1} \eta(n/N) =$

$\displaystyle \frac{\eta(1/N)}{2} + \sum_{m=1}^\infty \frac{\eta((2m-1)/N) - 2\eta(2m/N) + \eta((2m+1)/N)}{2}.$

If ${\eta}$ is twice continuously differentiable (i.e. ${\eta \in C^2}$), then from Taylor expansion we see that the summand has size ${O(1/N^2)}$, and also (from the compact support of ${\eta}$) is only non-zero when ${m=O(N)}$. This leads to the asymptotic

$\displaystyle \sum_{n=1}^\infty (-1)^{n-1} \eta(n/N) = \frac{1}{2} + O( \frac{1}{N} )$

and so we recover the value of ${1/2}$ as the leading term of the asymptotic expansion.

Exercise 1 Show that if ${\eta}$ is merely once continuously differentiable (i.e. ${\eta \in C^1}$), then we have a similar asymptotic, but with an error term of ${o(1)}$ instead of ${O(1/N)}$. This is an instance of a more general principle that smoother cutoffs lead to better error terms, though the improvement sometimes stops after some degree of regularity.

Remark 1 The most famous instance of smoothed summation is Cesáro summation, which corresponds to the cutoff function ${\eta(x) := (1-x)_+}$. Unsurprisingly, when Cesáro summation is applied to Grandi’s series, one again recovers the value of ${1/2}$.

If we now revisit the divergent series (4), (5), (6), (7) with smooth summation in mind, we finally begin to see the origin of the right-hand sides. Indeed, for any fixed smooth cutoff function ${\eta}$, we will shortly show that

$\displaystyle \sum_{n=1}^\infty \eta(n/N) = -\frac{1}{2} + C_{\eta,0} N + O(\frac{1}{N}) \ \ \ \ \ (11)$

$\displaystyle \sum_{n=1}^\infty n \eta(n/N) = -\frac{1}{12} + C_{\eta,1} N^2 + O(\frac{1}{N}) \ \ \ \ \ (12)$

$\displaystyle \sum_{n=1}^\infty n^2 \eta(n/N) = C_{\eta,2} N^3 + O(\frac{1}{N}) \ \ \ \ \ (13)$

and more generally

$\displaystyle \sum_{n=1}^\infty n^s \eta(n/N) = -\frac{B_{s+1}}{s+1} + C_{\eta,s} N^{s+1} + O(\frac{1}{N}) \ \ \ \ \ (14)$

for any fixed ${s=1,2,3,\ldots}$ where ${C_{\eta,s}}$ is the Archimedean factor

$\displaystyle C_{\eta,s} := \int_0^\infty x^s \eta(x)\ dx \ \ \ \ \ (15)$

(which is also essentially the Mellin transform of ${\eta}$). Thus we see that the values (4), (5), (6), (7) obtained by analytic continuation are nothing more than the constant terms of the asymptotic expansion of the smoothed partial sums. This is not a coincidence; we will explain the equivalence of these two interpretations of such sums (in the model case when the analytic continuation has only finitely many poles and does not grow too fast at infinity) below the fold.

This interpretation clears up the apparent inconsistencies alluded to earlier. For instance, the sum ${\sum_{n=1}^\infty n = 1 + 2 + 3 + \ldots}$ consists only of non-negative terms, as does its smoothed partial sums ${\sum_{n=1}^\infty n \eta(n/N)}$ (if ${\eta}$ is non-negative). Comparing this with (12), we see that this forces the highest-order term ${C_{\eta,1} N^2}$ to be non-negative (as indeed it is), but does not prohibit the lower-order constant term ${-\frac{1}{12}}$ from being negative (which of course it is).

Similarly, if we add together (12) and (11) we obtain

$\displaystyle \sum_{n=1}^\infty (n+1) \eta(n/N) = -\frac{7}{12} + C_{\eta,1} N^2 + C_{\eta,0} N + O(\frac{1}{N}) \ \ \ \ \ (16)$

while if we subtract ${1}$ from (12) we obtain

$\displaystyle \sum_{n=2}^\infty n \eta(n/N) = -\frac{13}{12} + C_{\eta,1} N^2 + O(\frac{1}{N}). \ \ \ \ \ (17)$

These two asymptotics are not inconsistent with each other; indeed, if we shift the index of summation in (17), we can write

$\displaystyle \sum_{n=2}^\infty n \eta(n/N) = \sum_{n=1}^\infty (n+1) \eta((n+1)/N) \ \ \ \ \ (18)$

and so we now see that the discrepancy between the two sums in (8), (9) come from the shifting of the cutoff ${\eta(n/N)}$, which is invisible in the formal expressions in (8), (9) but become manifestly present in the smoothed sum formulation.

Exercise 2 By Taylor expanding ${\eta(n+1/N)}$ and using (11), (18) show that (16) and (17) are indeed consistent with each other, and in particular one can deduce the latter from the former.

We now give a basic application of Fourier analysis to the problem of counting additive patterns in sets, namely the following famous theorem of Roth:

Theorem 1 (Roth’s theorem) Let ${A}$ be a subset of the integers ${{\bf Z}}$ whose upper density

$\displaystyle \overline{\delta}(A) := \limsup_{N \rightarrow \infty} \frac{|A \cap [-N,N]|}{2N+1}$

is positive. Then ${A}$ contains infinitely many arithmetic progressions ${a, a+r, a+2r}$ of length three, with ${a \in {\bf Z}}$ and ${r>0}$.

This is the first non-trivial case of Szemerédi’s theorem, which is the same assertion but with length three arithmetic progressions replaced by progressions of length ${k}$ for any ${k}$.

As it turns out, one can prove Roth’s theorem by an application of linear Fourier analysis – by comparing the set ${A}$ (or more precisely, the indicator function ${1_A}$ of that set, or of pieces of that set) against linear characters ${n \mapsto e(\alpha n)}$ for various frequencies ${\alpha \in {\bf R}/{\bf Z}}$. There are two extreme cases to consider (which are model examples of a more general dichotomy between structure and randomness). One is when ${A}$ is aligned up almost completely with one of these linear characters, for instance by being a Bohr set of the form

$\displaystyle \{ n \in {\bf Z}: \| \alpha n - \theta \|_{{\bf R}/{\bf Z}} < \epsilon \}$

or more generally of the form

$\displaystyle \{ n \in {\bf Z}: \alpha n \in U \}$

for some multi-dimensional frequency ${\alpha \in {\bf T}^d}$ and some open set ${U}$. In this case, arithmetic progressions can be located using the equidistribution theory of the previous set of notes. At the other extreme, one has Fourier-uniform or Fourier-pseudorandom sets, whose correlation with any linear character is negligible. In this case, arithmetic progressions can be produced in abundance via a Fourier-analytic calculation.

To handle the general case, one must somehow synthesise together the argument that deals with the structured case with the argument that deals with the random case. There are several known ways to do this, but they can be basically classified into two general methods, namely the density increment argument (or ${L^\infty}$ increment argument) and the energy increment argument (or ${L^2}$ increment argument).

The idea behind the density increment argument is to introduce a dichotomy: either the object ${A}$ being studied is pseudorandom (in which case one is done), or else one can use the theory of the structured objects to locate a sub-object of significantly higher “density” than the original object. As the density cannot exceed one, one should thus be done after a finite number of iterations of this dichotomy. This argument was introduced by Roth in his original proof of the above theorem.

The idea behind the energy increment argument is instead to decompose the original object ${A}$ into two pieces (and, sometimes, a small additional error term): a structured component that captures all the structured objects that have significant correlation with ${A}$, and a pseudorandom component which has no significant correlation with any structured object. This decomposition usually proceeds by trying to maximise the “energy” (or ${L^2}$ norm) of the structured component, or dually by trying to minimise the energy of the residual between the original object and the structured object. This argument appears for instance in the proof of the Szemerédi regularity lemma (which, not coincidentally, can also be used to prove Roth’s theorem), and is also implicit in the ergodic theory approach to such problems (through the machinery of conditional expectation relative to a factor, which is a type of orthogonal projection, the existence of which is usually established via an energy increment argument). However, one can also deploy the energy increment argument in the Fourier analytic setting, to give an alternate Fourier-analytic proof of Roth’s theorem that differs in some ways from the density increment proof.

In these notes we give both two Fourier-analytic proofs of Roth’s theorem, one proceeding via the density increment argument, and the other by the energy increment argument. As it turns out, both of these arguments extend to establish Szemerédi’s theorem, and more generally in counting other types of patterns, but this is non-trivial (requiring some sort of inverse conjecture for the Gowers uniformity norms in both cases); we will discuss this further in later notes.

Semilinear dispersive and wave equations, of which the defocusing nonlinear wave equation

$\displaystyle -\partial_{tt} u + \Delta u = |u|^{p-1} u \ \ \ \ \ (1)$

is a typical example (where ${p>1}$ is a fixed exponent, and ${u: {\bf R}^{1+n} \rightarrow {\bf R}}$ is a scalar field), can be viewed as a “tug of war” between a linear dispersive equation, in this case the linear wave equation

$\displaystyle -\partial_{tt} u + \Delta u = 0 \ \ \ \ \ (2)$

and a nonlinear ODE, in this case the equation

$\displaystyle -\partial_{tt} u = |u|^{p-1} u. \ \ \ \ \ (3)$

If the nonlinear term was not present, leaving only the dispersive equation (2), then as the term “dispersive” suggests, in the asymptotic limit ${t \rightarrow \infty}$, the solution ${u(t,x)}$ would spread out in space and decay in amplitude. For instance, in the model case when ${d=3}$ and the initial position ${u(0,x)}$ vanishes (leaving only the initial velocity ${u_t(0,x)}$ as non-trivial initial data), the solution ${u(t,x)}$ for ${t>0}$ is given by the formula

$\displaystyle u(t,x) = \frac{1}{4\pi t} \int_{|y-x|=t} u_t(0,y)\ d\sigma$

where ${d\sigma}$ is surface measure on the sphere ${\{ y \in {\bf R}^3: |y-x| = t \}}$. (To avoid technical issues, let us restrict attention to classical (smooth) solutions.) Thus, if the initial velocity was bounded and compactly supported, then the solution ${u(t,x)}$ would be bounded by ${O(1/t)}$ and would thus would decay uniformly to zero as ${t \rightarrow \infty}$. Similar phenomena occur for all dimensions greater than ${1}$.

Conversely, if the dispersive term was not present, leaving only the ODE (3), then one no longer expects decay; indeed, given the conserved energy ${\frac{1}{2} u_t^2 + \frac{1}{p+1} |u|^{p+1}}$ for the ODE (3), we do not expect any decay at all (and indeed, solutions are instead periodic in time for each fixed ${x}$, as can easily be seen by viewing the ODE (and the energy curves) in phase space).

Depending on the relative “size” of the dispersive term ${\Delta u}$ and the nonlinear term ${|u|^{p-1} u}$, one can heuristically describe the behaviour of a solution ${u}$ at various positions at times as either being dispersion dominated (in which ${|\Delta u| \gg |u|^p}$), nonlinearity dominated (in which ${|u|^p \gg |\Delta u|}$), or contested (in which ${|\Delta u|}$, ${|u|^p}$ are comparable in size). Very roughly speaking, when one is in the dispersion dominated regime, then perturbation theory becomes effective, and one can often show that the solution to the nonlinear equation indeed behaves like the solution to the linear counterpart, in particular exhibiting decay as ${t \rightarrow \infty}$. In principle, perturbation theory is also available in the nonlinearity dominated regime (in which the dispersion is now viewed as the perturbation, and the nonlinearity as the main term), but in practice this is often difficult to apply (due to the nonlinearity of the approximating equation and the large number of derivatives present in the perturbative term), and so one has to fall back on non-perturbative tools, such as conservation laws and monotonicity formulae. The contested regime is the most interesting, and gives rise to intermediate types of behaviour that are not present in the purely dispersive or purely nonlinear equations, such as solitary wave solutions (solitons) or solutions that blow up in finite time.

In order to analyse how solutions behave in each of these regimes rigorously, one usually works with a variety of function spaces (such as Lebesgue spaces ${L^p}$ and Sobolev spaces ${H^s}$). As such, one generally needs to first establish a number of function space estimates (e.g. Sobolev inequalities, Hölder-type inequalities, Strichartz estimates, etc.) in order to study these equations at the formal level.

Unfortunately, this emphasis on function spaces and their estimates can obscure the underlying physical intuition behind the dynamics of these equations, and the field of analysis of PDE sometimes acquires a reputation for being unduly technical as a consequence. However, as noted in a previous blog post, one can view function space norms as a way to formalise the intuitive notions of the “height” (amplitude) and “width” (wavelength) of a function (wave).

It turns out that one can similarly analyse the behaviour of nonlinear dispersive equations on a similar heuristic level, as that of understanding the dynamics as the amplitude ${A(t)}$ and wavelength ${1/N(t)}$ (or frequency ${N(t)}$) of a wave. Below the fold I give some examples of this heuristic; for sake of concreteness I restrict attention to the nonlinear wave equation (1), though one can of course extend this heuristic to many other models also. Rigorous analogues of the arguments here can be found in several places, such as the book of Shatah and Struwe, or my own book on the subject.