A friend of mine recently asked me for some suggestions for games or other activities for children that would help promote quantitative reasoning or mathematical skills, while remaining fun to play (i.e. more than just homework-type questions poorly disguised in game form). The initial question was focused on computer games (and specifically, on iPhone apps), but I think the broader question would also be of interest.
I myself have not seriously played these sorts of games for years, so I could only come up with a few examples immediately: the game “Planarity”, and the game “Factory Balls” (and two sequels). (Edit: Rubik’s cube and its countless cousins presumably qualify also, due to their implicit use of group theory.) I am hopeful though that readers may be able to come up with more suggestions.
There is of course no shortage of “educational” games, computer-based or otherwise, available, but I think what I (and my friend) would be looking for here are games with production values comparable to other, less educational games, and for which the need for mathematical thinking arises naturally in the gameplay rather than being artificially inserted by fiat (e.g. “solve this equation to proceed”). (Here I interpret “mathematical thinking” loosely, to include not just numerical or algebraic thinking, but also geometric, abstract, logical, probabilistic, etc.)
[Question for MathOverflow experts: would this type of question be suitable for crossposting there? The requirement that such questions be “research-level” seems to suggest not.]
In the previous lecture notes, we used (linear) Fourier analysis to control the number of three-term arithmetic progressions in a given set $A$. The power of the Fourier transform for this problem ultimately stemmed from the identity

$\displaystyle {\bf E}_{x,r \in {\bf Z}/N{\bf Z}} 1_A(x) 1_A(x+r) 1_A(x+2r) = \sum_{\xi \in {\bf Z}/N{\bf Z}} \hat 1_A(\xi)^2 \hat 1_A(-2\xi) \ \ \ \ \ (1)$

(with the Fourier transform normalised as $\hat f(\xi) := {\bf E}_{x \in {\bf Z}/N{\bf Z}} f(x) e^{-2\pi i x \xi/N}$) for any cyclic group ${\bf Z}/N{\bf Z}$ and any subset $A$ of that group (analogues of this identity also exist for other finite abelian groups, and to a lesser extent for non-abelian groups also, although that is not the focus of my current discussion). As it turns out, linear Fourier analysis is not able to discern higher order patterns, such as arithmetic progressions of length four; we give some demonstrations of this below the fold, taking advantage of the polynomial recurrence theory from Notes 1.
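As a quick sanity check (a small numerical sketch, not part of the notes), one can verify the identity (1) directly on a cyclic group with an arbitrary real test function, using the normalisation of $\hat f$ given above:

```python
import numpy as np

N = 101                                  # a prime modulus, so that xi -> -2*xi permutes Z/N
rng = np.random.default_rng(0)
f = rng.random(N)                        # an arbitrary real test function on Z/N

# Left-hand side: E_{x,r} f(x) f(x+r) f(x+2r)
idx = np.arange(N)
lhs = np.mean([np.mean(f * f[(idx + r) % N] * f[(idx + 2 * r) % N]) for r in idx])

# Right-hand side: sum_xi fhat(xi)^2 fhat(-2 xi), with fhat(xi) = E_x f(x) e(-x xi / N)
fhat = np.fft.fft(f) / N
rhs = np.sum(fhat**2 * fhat[(-2 * idx) % N])

print(lhs, rhs.real)                     # the two values agree up to rounding error
```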
The main objective of this course is to introduce the (still nascent) theory of higher order Fourier analysis, which is capable of studying higher order patterns. The full theory is still rather complicated (at least, at our present level of understanding). However, one aspect of the theory is relatively simple, namely that we can largely reduce the study of arbitrary additive patterns to the study of a single type of additive pattern, namely the parallelopipeds

$\displaystyle ( x + \omega_1 h_1 + \ldots + \omega_d h_d )_{\omega_1,\ldots,\omega_d \in \{0,1\}}. \ \ \ \ \ (2)$

Thus for instance, for $d=1$ one has the line segments

$\displaystyle x, x+h_1;$

for $d=2$ one has the parallelograms

$\displaystyle x, x+h_1, x+h_2, x+h_1+h_2;$

for $d=3$ one has the parallelopipeds

$\displaystyle x, x+h_1, x+h_2, x+h_3, x+h_1+h_2, x+h_1+h_3, x+h_2+h_3, x+h_1+h_2+h_3.$

These patterns are particularly pleasant to handle, thanks to the large number of symmetries available on the discrete cube $\{0,1\}^d$. For instance, whereas establishing the presence of arbitrarily long arithmetic progressions in dense sets is quite difficult (Szemerédi’s theorem), establishing arbitrarily high-dimensional parallelopipeds is much easier:
Exercise 1 Let $A \subset [N] := \{1,\ldots,N\}$ be such that $|A| \geq \delta N$ for some $\delta > 0$. If $N$ is sufficiently large depending on $\delta$, show that there exists an integer $1 \leq h \ll 1/\delta$ such that $|A \cap (A+h)| \gg \delta^2 N$. (Hint: obtain upper and lower bounds on the set $\{ (a,b) \in A \times A: a < b \leq a + O(1/\delta) \}$.)
Exercise 2 (Hilbert cube lemma) Let $A \subset [N]$ be such that $|A| \geq \delta N$ for some $\delta > 0$, and let $k \geq 1$ be an integer. Show that if $N$ is sufficiently large depending on $\delta, k$, then $A$ contains a parallelopiped of the form (2), with $h_1,\ldots,h_k$ positive integers. (Hint: use the previous exercise and induction.) Conclude that if $A \subset {\bf Z}$ has positive upper density, then it contains infinitely many such parallelopipeds for each $k$.
Exercise 3 Show that if $q \geq 1$ is an integer, and $d$ is sufficiently large depending on $q$, then for any parallelopiped (2) in the integers ${\bf Z}$, there exists $\omega_1,\ldots,\omega_d \in \{0,1\}$, not all zero, such that $x + \omega_1 h_1 + \ldots + \omega_d h_d = x \hbox{ mod } q$. (Hint: pigeonhole the $h_i$ in the residue classes modulo $q$.) Use this to conclude that there exist sets $A$ of integers of positive upper density (and also positive lower density) which do not contain any infinite parallelopipeds (thus one cannot take $k = \infty$ in the Hilbert cube lemma).
The standard way to control the parallelogram patterns (and thus, all other (finite complexity) linear patterns) is via the Gowers uniformity norms

$\displaystyle \|f\|_{U^d(G)} := \left( {\bf E}_{x, h_1,\ldots,h_d \in G} \prod_{\omega_1,\ldots,\omega_d \in \{0,1\}} {\mathcal C}^{\omega_1+\ldots+\omega_d} f(x + \omega_1 h_1 + \ldots + \omega_d h_d) \right)^{1/2^d},$

with $f: G \rightarrow {\bf C}$ a function on a finite abelian group $G$, and ${\mathcal C}: z \mapsto \overline{z}$ is the complex conjugation operator; analogues of this norm also exist for group-like objects such as the progression $\{1,\ldots,N\}$, and also for measure-preserving systems (where they are known as the Gowers-Host-Kra uniformity seminorms, see this paper of Host-Kra for more discussion). In this set of notes we will focus on the basic properties of these norms; the deepest fact about them, known as the inverse conjecture for these norms, will be discussed in later notes.
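As a concrete check in the simplest nontrivial case $d=2$ (a small numerical sketch, not part of the notes), one can compare the combinatorial definition of the $U^2$ norm on a cyclic group with the standard Fourier description $\|f\|_{U^2({\bf Z}/N{\bf Z})}^4 = \sum_\xi |\hat f(\xi)|^4$, where $\hat f(\xi) := {\bf E}_x f(x) e^{-2\pi i x \xi/N}$:

```python
import numpy as np

N = 53
rng = np.random.default_rng(1)
f = rng.random(N)                              # a real test function on Z/N

# Combinatorial definition:
#   ||f||_{U^2}^4 = E_{x,h1,h2} f(x) conj(f(x+h1)) conj(f(x+h2)) f(x+h1+h2)
# (f is real here, so the complex conjugations can be dropped)
idx = np.arange(N)
total = 0.0
for h1 in idx:
    for h2 in idx:
        total += np.mean(f * f[(idx + h1) % N] * f[(idx + h2) % N] * f[(idx + h1 + h2) % N])
u2_fourth = total / N**2

# Fourier-side formula:  ||f||_{U^2}^4 = sum_xi |fhat(xi)|^4
fhat = np.fft.fft(f) / N
print(u2_fourth, np.sum(np.abs(fhat) ** 4))    # the two quantities agree up to rounding error
```

In particular, $\|f\|_{U^2}$ is small precisely when all the Fourier coefficients of $f$ are small, which is one way to see that the $U^2$ norm captures linear (Fourier) structure.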
Ordinarily, I only mention my research papers on this blog when they are first submitted, or if a major update is required. With the paper arising from the DHJ Polymath “Low Dimensions” project, though, the situation is a little different as the collaboration to produce the paper took place on this blog.
Anyway, the good news is that the paper has been accepted for the Szemerédi birthday conference proceedings. The referee was positive and recommended only some minor changes (I include the list of changes below the fold). I have incorporated these changes, and the new version of the paper can be found here. Within a few days I need to return the paper to the editor, so this is the last chance to propose any further corrections or changes (though at this stage any major changes are unlikely to be feasible).
The editor asked a good question: should we have a list of participants for this project somewhere? If so, it seems to make more sense to have this list as a link on the wiki, rather than in the paper itself. But making a list opens the can of worms of deciding what level of participation should be the threshold for inclusion in the list – should someone who only contributed, say, one tangential comment to one of the blog posts be listed alongside a substantially more active participant?
One possibility is that of self-reporting; we could set up a page for participants on the wiki and let anyone who felt like they contributed add their name, and rely on the honour code to keep it accurate. This might be feasible as long as the page is kept unofficial (so, in particular, it will not be used as the formal list of authors for the paper).
A related question is whether to add an explicit link to the timeline for progress on this project and on the sister “New proof” project. If so, this should also be kept unofficial (there were no formal guidelines as to what was included in the timeline and what was not).
These decisions do not necessarily have to be taken quickly; one can simply point to the wiki in the paper (as is already done in the current version), and update the wiki later if we decide to add these sorts of acknowledgments on that site.
Incidentally, if we have another successful Polymath project to write up, I would now strongly recommend using version control software (such as Subversion or git) to organise the writing process, both at the informal notes stage and also at the drafting stage. It is certainly far superior to our improvised solution of putting the raw TeX files on a wiki…
It’s been a while since I’ve added to my career advice and writing pages on this blog, but I recently took the time to write up another such page on a topic I had not previously covered, entitled “Write in your own voice“. The main point here is that while every piece of mathematical research inevitably builds upon the previous literature, one should not mimic the style and text of that literature slavishly, but instead develop one’s own individual style, while also updating and adapting the results and insights of previous authors.
In topology, a non-empty set $E$ is said to be connected if it cannot be decomposed into two nontrivial subsets that are both closed and open relative to $E$, and path connected if any two points $x, y$ in $E$ can be connected by a path (i.e. there exists a continuous map $\gamma: [0,1] \rightarrow E$ with $\gamma(0) = x$ and $\gamma(1) = y$).
Path-connected sets are always connected, but the converse is not true, even in the model case of compact subsets of a Euclidean space. The classic counterexample is the set

$\displaystyle E := \{ (x, \sin \tfrac{1}{x}): 0 < x \leq 1 \} \cup \{ (0,y): -1 \leq y \leq 1 \} \ \ \ \ \ (1)$

which is connected but not path-connected (there is no continuous path from $(0,1)$ to $(1, \sin 1)$).
Looking at the definitions of the two concepts, one notices a difference: the notion of path-connectedness is somehow a “positive” one, in the sense that a path-connected set can produce the existence of something (a path connecting two points $x$ and $y$) for a given type of input (in this case, a pair of points $x, y$). On the other hand, the notion of connectedness is a “negative” one, in that it asserts the non-existence of something (a non-trivial partition into clopen sets). To put it another way, it is relatively easy to convince someone that a set is path-connected (by providing a connecting path for every pair of points) or is disconnected (by providing a non-trivial partition into clopen sets), but if a set is not path-connected, or is connected, how can one easily convince someone of this fact? To put it yet another way: is there a reasonable certificate for connectedness (or for path-disconnectedness)?
In the case of connectedness for compact subsets of Euclidean space, there is an answer as follows. If $\varepsilon > 0$, let us call two points $x, y$ in $E$ $\varepsilon$-connected if one can find a finite sequence $x_0 = x, x_1, \ldots, x_n = y$ of points in $E$, such that $|x_{i+1} - x_i| \leq \varepsilon$ for all $0 \leq i < n$; informally, one can jump from $x$ to $y$ in $E$ using jumps of length at most $\varepsilon$. Let us call $x_0, x_1, \ldots, x_n$ an $\varepsilon$-discrete path.
Proposition 1 (Connectedness certificate for compact subsets of Euclidean space) Let $E \subset {\bf R}^d$ be compact and non-empty. Then $E$ is connected if and only if every pair of points in $E$ is $\varepsilon$-connected for every $\varepsilon > 0$.
Proof: Suppose first that $E$ is disconnected, then $E$ can be partitioned into two non-empty closed subsets $E_1, E_2$. Since $E$ is compact, $E_1, E_2$ are compact also, and so they are separated by some non-zero distance $\varepsilon > 0$. But then it is clear that points in $E_1$ cannot be $\varepsilon$-connected to points in $E_2$, and the claim follows.

Conversely, suppose that there is a pair of points $x, y$ in $E$ and an $\varepsilon > 0$ such that $x, y$ are not $\varepsilon$-connected. Let $F$ be the set of all points in $E$ that are $\varepsilon$-connected to $x$. It is easy to check that $F$ is open, closed, and a proper subset of $E$; thus $E$ is disconnected. $\Box$
We remark that the above proposition in fact works for any compact metric space. It is instructive to see how the points $(0,1)$ and $(1, \sin 1)$ are $\varepsilon$-connected in the set (1); the $\varepsilon$-discrete path follows the graph of $\sin \tfrac{1}{x}$ backwards until one gets sufficiently close to the $y$-axis, at which point one “jumps” across to the $y$-axis to eventually reach $(0,1)$.
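(As an aside, here is a small computational sketch, not in the original post, of how the certificate in Proposition 1 can be tested on a finite point cloud standing in for a compact set: join points at distance at most $\varepsilon$ and check that the resulting graph has a single component. The two sample sets below — two square patches, with and without a connecting segment — are hypothetical examples chosen just to exercise the criterion.)

```python
import numpy as np
from collections import deque

def eps_connected(points, eps):
    """Return True if every pair of points in the cloud is eps-connected, i.e. the
    graph joining points at distance <= eps has a single component (checked by BFS)."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    adj = dist <= eps
    seen = np.zeros(len(points), dtype=bool)
    queue = deque([0]); seen[0] = True
    while queue:
        i = queue.popleft()
        for j in np.flatnonzero(adj[i] & ~seen):
            seen[j] = True
            queue.append(j)
    return bool(seen.all())

rng = np.random.default_rng(0)
patch = lambda cx: rng.uniform(-0.7, 0.7, size=(400, 2)) + np.array([cx, 0.0])

two_patches = np.vstack([patch(0.0), patch(3.0)])                      # disconnected: gap of width ~1.6
bridge = np.column_stack([np.linspace(0.0, 3.0, 200), np.zeros(200)])  # a segment joining the two patches

for eps in [0.5, 0.2]:
    print(eps, eps_connected(two_patches, eps),
          eps_connected(np.vstack([two_patches, bridge]), eps))
```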
It is also interesting to contrast the above proposition with path connectedness. Clearly, if two points are connected by a path, then they are $\varepsilon$-connected for every $\varepsilon > 0$ (because every continuous map $\gamma: [0,1] \rightarrow E$ is uniformly continuous); but from the analysis of the example (1) we see that the converse is not true. Roughly speaking, the various $\varepsilon$-discrete paths from $x$ to $y$ have to be “compatible” with each other in some sense in order to synthesise a continuous path from $x$ to $y$ in the limit (we will not make this precise here).
But this leaves two (somewhat imprecise) questions, which I do not know how to satisfactorily answer:
Question 1: Is there a good certificate for path disconnectedness, say for compact subsets $E$ of ${\bf R}^d$? One can construct lousy certificates, for instance one could look at all continuous paths in ${\bf R}^d$ joining two particular points $x, y$ in $E$, and verify that each one of them leaves $E$ at some point. But this is an “uncountable” certificate – it requires one to check an uncountable number of paths. In contrast, the certificate in Proposition 1 is basically a countable one (if one describes a compact set $E$ by describing a family of $\varepsilon$-nets for a countable sequence of $\varepsilon$ tending to zero). (Very roughly speaking, I would like a certificate that can somehow be “verified in countable time” in a suitable oracle model, as discussed in my previous post, though I have not tried to make this vague specification more rigorous.)
It is tempting to look at the equivalence classes of $E$ given by the relation of being connected by a path, but these classes need not be closed (as one can see with the example (1)) and it is not obvious to me how to certify that two such classes are not path-connected to each other.
Question 2: Is there a good certificate for connectedness for closed but unbounded subsets of ${\bf R}^d$? Proposition 1 fails in this case; consider for instance the set

$\displaystyle E := \{ (x,0): x \geq 0 \} \cup \{ (x, \tfrac{1}{1+x}): x \geq 0 \}.$

Any pair of points in $E$ is $\varepsilon$-connected for every $\varepsilon > 0$, and yet this set is disconnected.
The problem here is that as $\varepsilon$ gets smaller, the $\varepsilon$-discrete paths connecting a pair of points such as $(0,0)$ and $(0,1)$ have diameter going to infinity. One natural guess is then to require a uniform bound on the diameter, i.e. that for any pair of points $x, y \in E$, there exists an $R > 0$ such that there is an $\varepsilon$-discrete path from $x$ to $y$ of diameter at most $R$ for every $\varepsilon > 0$. This does indeed force connectedness, but unfortunately not all connected sets have this property.
Consider for instance the set

$\displaystyle E := \bigcup_{n=1}^\infty (E_n \cup C_n)$

in ${\bf R}^2$, where $E_n$ is a rectangular ellipse centered at the origin whose minor diameter shrinks and whose major diameter grows without bound as $n \rightarrow \infty$, and $C_n$ is a circle that connects an endpoint of $E_n$ to a point in $E_{n+1}$. One can check that (for a suitable choice of these pieces) $E$ is a closed connected set, but the $\varepsilon$-discrete paths connecting suitable pairs of points in $E$ have unbounded diameter as $\varepsilon \rightarrow 0$.
Currently, I do not have any real progress on Question 1. For Question 2, I can only obtain the following strange “second-order” criterion for connectedness, that involves an unspecified gauge function $\delta: {\bf R}^+ \rightarrow {\bf R}^+$:

Proposition 2 (Second-order connectedness certificate) Let $E$ be a closed non-empty subset of ${\bf R}^d$. Then the following are equivalent:

- $E$ is connected.
- For every monotone decreasing, strictly positive function $\delta: {\bf R}^+ \rightarrow {\bf R}^+$ and every $x, y \in E$, there exists a discrete path $x = x_0, x_1, \ldots, x_n = y$ in $E$ such that $|x_{i+1} - x_i| \leq \delta(|x_i|)$ for all $0 \leq i < n$.
Proof: This is proven in almost the same way as Proposition 1. If $E$ can be disconnected into two non-trivial sets $E_1, E_2$, then one can find a monotone decreasing gauge function $\delta: {\bf R}^+ \rightarrow {\bf R}^+$ such that for each ball $B(0,R)$, the sets $E_1 \cap B(0,R)$ and $E_2 \cap B(0,R)$ are separated by a distance of at least $\delta(R)$, and then there is no discrete path from $E_1$ to $E_2$ in $E$ obeying the condition $|x_{i+1} - x_i| \leq \delta(|x_i|)$.
Conversely, if there exists a gauge function $\delta$ and two points $x, y \in E$ which cannot be connected by a discrete path in $E$ that obeys the condition $|x_{i+1} - x_i| \leq \delta(|x_i|)$, then if one sets $F$ to be all the points that can be reached from $x$ in this manner, one easily verifies that $F$ and $E \backslash F$ disconnect $E$. $\Box$
It may be that this is somehow the “best” one can do, but I am not sure how to quantify this formally.
Anyway, I was curious if any of the readers here (particularly those with expertise in point-set topology or descriptive set theory) might be able to shed more light on these questions. (I also considered crossposting this to Math Overflow, but I think the question may be a bit too long (and vague) for that.)
(The original motivation for this question, by the way, stems from an attempt to use methods of topological group theory to attack questions in additive combinatorics, in the spirit of the paper of Hrushovski studied previously on this blog. The connection is rather indirect, though; I may discuss this more in a future post.)
The Riemann zeta function $\zeta(s)$ is defined in the region $\hbox{Re}(s) > 1$ by the absolutely convergent series

$\displaystyle \zeta(s) := \sum_{n=1}^\infty \frac{1}{n^s} = 1 + \frac{1}{2^s} + \frac{1}{3^s} + \ldots. \ \ \ \ \ (1)$

Thus, for instance, it is known that $\zeta(2) = \frac{\pi^2}{6}$, and thus

$\displaystyle \sum_{n=1}^\infty \frac{1}{n^2} = 1 + \frac{1}{4} + \frac{1}{9} + \ldots = \frac{\pi^2}{6}. \ \ \ \ \ (2)$

For $\hbox{Re}(s) \leq 1$, the series on the right-hand side of (1) is no longer absolutely convergent, or even conditionally convergent. Nevertheless, the $\zeta$ function can be extended to this region (with a pole at $s = 1$) by analytic continuation. For instance, it can be shown that after analytic continuation, one has $\zeta(0) = -\frac{1}{2}$, $\zeta(-1) = -\frac{1}{12}$, and $\zeta(-2) = 0$, and more generally

$\displaystyle \zeta(-s) = -\frac{B_{s+1}}{s+1} \ \ \ \ \ (3)$

for $s = 1, 2, 3, \ldots$, where $B_1, B_2, B_3, \ldots$ are the Bernoulli numbers. If one formally applies (1) at these values of $s$, one obtains the somewhat bizarre formulae

$\displaystyle \sum_{n=1}^\infty 1 = 1 + 1 + 1 + \ldots = -\frac{1}{2} \ \ \ \ \ (4)$

$\displaystyle \sum_{n=1}^\infty n = 1 + 2 + 3 + \ldots = -\frac{1}{12} \ \ \ \ \ (5)$

$\displaystyle \sum_{n=1}^\infty n^2 = 1 + 4 + 9 + \ldots = 0 \ \ \ \ \ (6)$

$\displaystyle \sum_{n=1}^\infty n^s = -\frac{B_{s+1}}{s+1}. \ \ \ \ \ (7)$

Clearly, these formulae do not make sense if one stays within the traditional way to evaluate infinite series, and so it seems that one is forced to use the somewhat unintuitive analytic continuation interpretation of such sums to make these formulae rigorous. But as it stands, the formulae look “wrong” for several reasons. Most obviously, the summands on the left are all positive, but the right-hand sides can be zero or negative. A little more subtly, the identities do not appear to be consistent with each other. For instance, if one adds (4) to (5), one obtains

$\displaystyle \sum_{n=1}^\infty (n+1) = 2 + 3 + 4 + \ldots = -\frac{7}{12} \ \ \ \ \ (8)$

whereas if one subtracts $1$ from (5) one obtains instead

$\displaystyle \sum_{n=2}^\infty n = 2 + 3 + 4 + \ldots = -\frac{13}{12} \ \ \ \ \ (9)$

and the two equations seem inconsistent with each other.
However, it is possible to interpret (4), (5), (6) by purely real-variable methods, without recourse to complex analysis methods such as analytic continuation, thus giving an “elementary” interpretation of these sums that only requires undergraduate calculus; we will later also explain how this interpretation deals with the apparent inconsistencies pointed out above.
To see this, let us first consider a convergent sum such as (2). The classical interpretation of this formula is the assertion that the partial sums

$\displaystyle \sum_{n=1}^N \frac{1}{n^2} = 1 + \frac{1}{4} + \frac{1}{9} + \ldots + \frac{1}{N^2}$

converge to $\frac{\pi^2}{6}$ as $N \rightarrow \infty$, or in other words that

$\displaystyle \sum_{n=1}^N \frac{1}{n^2} = \frac{\pi^2}{6} + o(1)$

where $o(1)$ denotes a quantity that goes to zero as $N \rightarrow \infty$. Actually, by using the integral test estimate

$\displaystyle \sum_{n=N+1}^\infty \frac{1}{n^2} \leq \int_N^\infty \frac{dx}{x^2} = \frac{1}{N}$

we have the sharper result

$\displaystyle \sum_{n=1}^N \frac{1}{n^2} = \frac{\pi^2}{6} + O\left(\frac{1}{N}\right).$

Thus we can view $\frac{\pi^2}{6}$ as the leading coefficient of the asymptotic expansion of the partial sums of $\sum_{n=1}^\infty \frac{1}{n^2}$.
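(As a quick numerical aside, not in the original post, one can watch this asymptotic emerge directly from the partial sums:)

```python
import math

zeta2 = math.pi**2 / 6
for N in [10, 100, 1000, 10000]:
    partial = sum(1 / n**2 for n in range(1, N + 1))
    # The deficit pi^2/6 - partial behaves like 1/N, so N times the deficit tends to 1.
    print(N, zeta2 - partial, N * (zeta2 - partial))
```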
One can then try to inspect the partial sums of the expressions in (4), (5), (6), but the coefficients bear no obvious relationship to the right-hand sides:

$\displaystyle \sum_{n=1}^N 1 = N$

$\displaystyle \sum_{n=1}^N n = \frac{1}{2} N^2 + \frac{1}{2} N$

$\displaystyle \sum_{n=1}^N n^2 = \frac{1}{3} N^3 + \frac{1}{2} N^2 + \frac{1}{6} N.$

For (7), the classical Faulhaber formula (or Bernoulli formula) gives

$\displaystyle \sum_{n=1}^N n^s = \frac{1}{s+1} N^{s+1} + \frac{1}{2} N^s + \frac{1}{s+1} \sum_{j=2}^{s} \binom{s+1}{j} B_j N^{s+1-j}$

for $s \geq 1$, which has a vague resemblance to (7), but again the connection is not particularly clear.
The problem here is the discrete nature of the partial sum

$\displaystyle \sum_{n=1}^N n^s = \sum_{1 \leq n \leq N} n^s,$

which (if $N$ is viewed as a real number) has jump discontinuities at each positive integer value of $N$. These discontinuities yield various artefacts when trying to approximate this sum by a polynomial in $N$. (These artefacts also occur in (2), but happen in that case to be obscured in the error term $O(1/N)$; but for the divergent sums (4), (5), (6), (7), they are large enough to cause real trouble.)
However, these issues can be resolved by replacing the abruptly truncated partial sums $\sum_{n=1}^N n^s$ with smoothed sums $\sum_{n=1}^\infty n^s \eta(n/N)$, where $\eta$ is a cutoff function, or more precisely a compactly supported bounded function that is right-continuous at the origin and equals $1$ at $0$. The case when $\eta$ is the indicator function $1_{[0,1]}$ then corresponds to the traditional partial sums, with all the attendant discretisation artefacts; but if one chooses a smoother cutoff, then these artefacts begin to disappear (or at least become lower order), and the true asymptotic expansion becomes more manifest.
Note that smoothing does not affect the asymptotic value of sums that were already absolutely convergent, thanks to the dominated convergence theorem. For instance, we have

$\displaystyle \lim_{N \rightarrow \infty} \sum_{n=1}^\infty \frac{1}{n^2} \eta(n/N) = \sum_{n=1}^\infty \frac{1}{n^2}$

whenever $\eta$ is a cutoff function (since $\eta(n/N) \rightarrow 1$ pointwise as $N \rightarrow \infty$ and is uniformly bounded). If $\eta$ is equal to $1$ on a neighbourhood of the origin, then the integral test argument recovers the $O(1/N)$ decay rate:

$\displaystyle \sum_{n=1}^\infty \frac{1}{n^2} \eta(n/N) = \frac{\pi^2}{6} + O\left(\frac{1}{N}\right).$
However, smoothing can greatly improve the convergence properties of a divergent sum. The simplest example is Grandi’s series

$\displaystyle 1 - 1 + 1 - \ldots = \sum_{n=1}^\infty (-1)^{n-1}. \ \ \ \ \ (10)$

The partial sums

$\displaystyle \sum_{n=1}^N (-1)^{n-1}$

oscillate between $1$ and $0$, and so this series is not conditionally convergent (and certainly not absolutely convergent). However, if one performs analytic continuation on the series

$\displaystyle \sum_{n=1}^\infty \frac{(-1)^{n-1}}{n^s}$

and sets $s = 0$, one obtains a formal value of $\frac{1}{2}$ for this series. This value can also be obtained by smooth summation. Indeed, for any cutoff function $\eta$, we can regroup

$\displaystyle \sum_{n=1}^\infty (-1)^{n-1} \eta(n/N) = \frac{\eta(1/N)}{2} + \frac{1}{2} \sum_{m=1}^\infty \left( \eta\left(\tfrac{2m-1}{N}\right) - 2\eta\left(\tfrac{2m}{N}\right) + \eta\left(\tfrac{2m+1}{N}\right) \right).$

If $\eta$ is twice continuously differentiable (i.e. $\eta \in C^2$), then from Taylor expansion we see that the summand has size $O(1/N^2)$, and also (from the compact support of $\eta$) is only non-zero when $m = O(N)$. This leads to the asymptotic

$\displaystyle \sum_{n=1}^\infty (-1)^{n-1} \eta(n/N) = \frac{1}{2} + O\left(\frac{1}{N}\right)$

and so we recover the value of $\frac{1}{2}$ as the leading term of the asymptotic expansion.
Exercise 1 Show that if $\eta$ is merely once continuously differentiable (i.e. $\eta \in C^1$), then we have a similar asymptotic, but with an error term of $o(1)$ instead of $O(\frac{1}{N})$. This is an instance of a more general principle that smoother cutoffs lead to better error terms, though the improvement sometimes stops after some degree of regularity.
Remark 2 The most famous instance of smoothed summation is Cesàro summation, which corresponds to the cutoff function $\eta(x) := (1-x)_+$. Unsurprisingly, when Cesàro summation is applied to Grandi’s series, one again recovers the value of $\frac{1}{2}$.
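To make this concrete, here is a small numerical sketch (not part of the original discussion) comparing a smooth compactly supported cutoff with the Cesàro cutoff $(1-x)_+$ on Grandi’s series; the particular bump function used is just one convenient, hypothetical choice:

```python
import math

def eta(x):
    """A smooth cutoff equal to 1 at 0 and vanishing for x >= 1 (a hypothetical choice)."""
    return math.exp(1 - 1 / (1 - x * x)) if 0 <= x < 1 else 0.0

def cesaro(x):
    """The Cesaro cutoff (1 - x)_+ from Remark 2."""
    return max(1 - x, 0.0)

def smoothed_grandi(cutoff, N):
    """Smoothed Grandi sum  sum_{n>=1} (-1)^(n-1) cutoff(n/N)."""
    return sum((-1) ** (n - 1) * cutoff(n / N) for n in range(1, N + 1))

for N in [10, 100, 1000, 10000]:
    print(N, smoothed_grandi(eta, N), smoothed_grandi(cesaro, N))
# Both columns approach the formal value 1/2 as N grows.
```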
If we now revisit the divergent series (4), (5), (6), (7) with smooth summation in mind, we finally begin to see the origin of the right-hand sides. Indeed, for any fixed smooth cutoff function $\eta$, we will shortly show that

$\displaystyle \sum_{n=1}^\infty \eta(n/N) = C_{\eta,0} N - \frac{1}{2} + O(1/N) \ \ \ \ \ (11)$

$\displaystyle \sum_{n=1}^\infty n \eta(n/N) = C_{\eta,1} N^2 - \frac{1}{12} + O(1/N) \ \ \ \ \ (12)$

$\displaystyle \sum_{n=1}^\infty n^2 \eta(n/N) = C_{\eta,2} N^3 + 0 + O(1/N) \ \ \ \ \ (13)$

$\displaystyle \sum_{n=1}^\infty n^s \eta(n/N) = C_{\eta,s} N^{s+1} + \zeta(-s) + O(1/N) \ \ \ \ \ (14)$

for any fixed $s = 0, 1, 2, \ldots$, where $C_{\eta,s}$ is the Archimedean factor

$\displaystyle C_{\eta,s} := \int_0^\infty x^s \eta(x)\ dx \ \ \ \ \ (15)$

(which is also essentially the Mellin transform of $\eta$). Thus we see that the values (4), (5), (6), (7) obtained by analytic continuation are nothing more than the constant terms of the asymptotic expansion of the smoothed partial sums. This is not a coincidence; we will explain the equivalence of these two interpretations of such sums (in the model case when the analytic continuation has only finitely many poles and does not grow too fast at infinity) below the fold.
This interpretation clears up the apparent inconsistencies alluded to earlier. For instance, the sum $\sum_{n=1}^\infty n = 1 + 2 + 3 + \ldots$ consists only of non-negative terms, as do its smoothed partial sums $\sum_{n=1}^\infty n \eta(n/N)$ (if $\eta$ is non-negative). Comparing this with (12), we see that this forces the highest-order term $C_{\eta,1} N^2$ to be non-negative (as indeed it is), but does not prohibit the lower-order constant term $-\frac{1}{12}$ from being negative (which of course it is).
Similarly, if we add together (12) and (11) we obtain

$\displaystyle \sum_{n=1}^\infty (n+1) \eta(n/N) = C_{\eta,1} N^2 + C_{\eta,0} N - \frac{7}{12} + O(1/N) \ \ \ \ \ (16)$

while if we subtract the $n=1$ term $\eta(1/N) = 1 + O(1/N)$ from (12) we obtain

$\displaystyle \sum_{n=2}^\infty n \eta(n/N) = C_{\eta,1} N^2 - \frac{13}{12} + O(1/N). \ \ \ \ \ (17)$

These two asymptotics are not inconsistent with each other; indeed, if we shift the index of summation in (17), we can write

$\displaystyle \sum_{n=2}^\infty n \eta(n/N) = \sum_{n=1}^\infty (n+1) \eta\left(\tfrac{n+1}{N}\right) \ \ \ \ \ (18)$

and so we now see that the discrepancy between the two sums in (8), (9) comes from the shifting of the cutoff from $\eta(n/N)$ to $\eta(\tfrac{n+1}{N})$, which is invisible in the formal expressions in (8), (9) but becomes manifestly present in the smoothed sum formulation.
Exercise 3 By Taylor expanding $\eta(\tfrac{n+1}{N})$ and using (11), (18) show that (16) and (17) are indeed consistent with each other, and in particular one can deduce the latter from the former.
We now give a basic application of Fourier analysis to the problem of counting additive patterns in sets, namely the following famous theorem of Roth:

Theorem 1 (Roth’s theorem) Let $A$ be a subset of the integers ${\bf Z}$ whose upper density

$\displaystyle \overline{\delta}(A) := \limsup_{N \rightarrow \infty} \frac{|A \cap [-N,N]|}{2N+1}$

is positive. Then $A$ contains infinitely many arithmetic progressions $a, a+r, a+2r$ of length three, with $a \in {\bf Z}$ and $r > 0$.

This is the first non-trivial case of Szemerédi’s theorem, which is the same assertion but with length three arithmetic progressions replaced by progressions of length $k$ for any $k \geq 3$.
As it turns out, one can prove Roth’s theorem by an application of linear Fourier analysis – by comparing the set $A$ (or more precisely, the indicator function $1_A$ of that set, or of pieces of that set) against linear characters $n \mapsto e(\alpha n) := e^{2\pi i \alpha n}$ for various frequencies $\alpha \in {\bf R}/{\bf Z}$. There are two extreme cases to consider (which are model examples of a more general dichotomy between structure and randomness). One is when $A$ is aligned almost completely with one of these linear characters, for instance by being a Bohr set of the form

$\displaystyle \{ n \in {\bf Z}: \| \alpha n \|_{{\bf R}/{\bf Z}} < \varepsilon \}$

or more generally of the form

$\displaystyle \{ n \in {\bf Z}: \alpha n \in U \}$

for some multi-dimensional frequency $\alpha \in ({\bf R}/{\bf Z})^d$ and some open set $U \subset ({\bf R}/{\bf Z})^d$. In this case, arithmetic progressions can be located using the equidistribution theory of the previous set of notes. At the other extreme, one has Fourier-uniform or Fourier-pseudorandom sets, whose correlation with any linear character is negligible. In this case, arithmetic progressions can be produced in abundance via a Fourier-analytic calculation.
To handle the general case, one must somehow synthesise together the argument that deals with the structured case with the argument that deals with the random case. There are several known ways to do this, but they can be basically classified into two general methods, namely the density increment argument (or $L^\infty$ increment argument) and the energy increment argument (or $L^2$ increment argument).
The idea behind the density increment argument is to introduce a dichotomy: either the object being studied is pseudorandom (in which case one is done), or else one can use the theory of the structured objects to locate a sub-object of significantly higher “density” than the original object. As the density cannot exceed one, one should thus be done after a finite number of iterations of this dichotomy. This argument was introduced by Roth in his original proof of the above theorem.
The idea behind the energy increment argument is instead to decompose the original object $f$ into two pieces (and, sometimes, a small additional error term): a structured component that captures all the structured objects that have significant correlation with $f$, and a pseudorandom component which has no significant correlation with any structured object. This decomposition usually proceeds by trying to maximise the “energy” (or $L^2$ norm) of the structured component, or dually by trying to minimise the energy of the residual between the original object and the structured object. This argument appears for instance in the proof of the Szemerédi regularity lemma (which, not coincidentally, can also be used to prove Roth’s theorem), and is also implicit in the ergodic theory approach to such problems (through the machinery of conditional expectation relative to a factor, which is a type of orthogonal projection, the existence of which is usually established via an energy increment argument). However, one can also deploy the energy increment argument in the Fourier analytic setting, to give an alternate Fourier-analytic proof of Roth’s theorem that differs in some ways from the density increment proof.
In these notes we give two Fourier-analytic proofs of Roth’s theorem, one proceeding via the density increment argument, and the other via the energy increment argument. As it turns out, both of these arguments extend to establish Szemerédi’s theorem, and more generally to counting other types of patterns, but this is non-trivial (requiring some sort of inverse conjecture for the Gowers uniformity norms in both cases); we will discuss this further in later notes.
Semilinear dispersive and wave equations, of which the defocusing nonlinear wave equation

$\displaystyle -\partial_{tt} u + \Delta u = |u|^{p-1} u \ \ \ \ \ (1)$

is a typical example (where $p > 1$ is a fixed exponent, and $u: {\bf R} \times {\bf R}^d \rightarrow {\bf R}$ is a scalar field), can be viewed as a “tug of war” between a linear dispersive equation, in this case the linear wave equation

$\displaystyle -\partial_{tt} u + \Delta u = 0 \ \ \ \ \ (2)$

and a nonlinear ODE, in this case the equation

$\displaystyle -\partial_{tt} u = |u|^{p-1} u. \ \ \ \ \ (3)$

If the nonlinear term was not present, leaving only the dispersive equation (2), then as the term “dispersive” suggests, in the asymptotic limit $t \rightarrow \infty$, the solution $u(t)$ would spread out in space and decay in amplitude. For instance, in the model case when $d = 3$ and the initial position $u(0,x)$ vanishes (leaving only the initial velocity $\partial_t u(0,x) = u_1(x)$ as non-trivial initial data), the solution $u(t,x)$ for $t > 0$ is given by the formula

$\displaystyle u(t,x) = \frac{1}{4\pi t} \int_{|y-x|=t} u_1(y)\ d\sigma(y)$

where $d\sigma$ is surface measure on the sphere $\{ y \in {\bf R}^3: |y-x| = t \}$. (To avoid technical issues, let us restrict attention to classical (smooth) solutions.) Thus, if the initial velocity $u_1$ was bounded and compactly supported, then the solution $u(t,x)$ would be bounded by $O(1/t)$, and would thus decay uniformly to zero as $t \rightarrow \infty$. Similar phenomena occur for all dimensions greater than $1$.
Conversely, if the dispersive term was not present, leaving only the ODE (3), then one no longer expects decay; indeed, given the conserved energy $\frac{1}{2} |\partial_t u|^2 + \frac{1}{p+1} |u|^{p+1}$ for the ODE (3), we do not expect any decay at all (and indeed, solutions are instead periodic in time for each fixed $x$, as can easily be seen by viewing the ODE (and the energy curves) in phase space).
Depending on the relative “size” of the dispersive term $\Delta u$ and the nonlinear term $|u|^{p-1} u$, one can heuristically describe the behaviour of a solution $u$ at various positions and times as either being dispersion dominated (in which $|\Delta u| \gg |u|^{p}$), nonlinearity dominated (in which $|\Delta u| \ll |u|^{p}$), or contested (in which $|\Delta u|$, $|u|^{p}$ are comparable in size). Very roughly speaking, when one is in the dispersion dominated regime, then perturbation theory becomes effective, and one can often show that the solution to the nonlinear equation indeed behaves like the solution to the linear counterpart, in particular exhibiting decay as $t \rightarrow \infty$. In principle, perturbation theory is also available in the nonlinearity dominated regime (in which the dispersion is now viewed as the perturbation, and the nonlinearity as the main term), but in practice this is often difficult to apply (due to the nonlinearity of the approximating equation and the large number of derivatives present in the perturbative term), and so one has to fall back on non-perturbative tools, such as conservation laws and monotonicity formulae. The contested regime is the most interesting, and gives rise to intermediate types of behaviour that are not present in the purely dispersive or purely nonlinear equations, such as solitary wave solutions (solitons) or solutions that blow up in finite time.
In order to analyse how solutions behave in each of these regimes rigorously, one usually works with a variety of function spaces (such as Lebesgue spaces $L^p$ and Sobolev spaces $H^s$). As such, one generally needs to first establish a number of function space estimates (e.g. Sobolev inequalities, Hölder-type inequalities, Strichartz estimates, etc.) in order to study these equations at the formal level.
Unfortunately, this emphasis on function spaces and their estimates can obscure the underlying physical intuition behind the dynamics of these equations, and the field of analysis of PDE sometimes acquires a reputation for being unduly technical as a consequence. However, as noted in a previous blog post, one can view function space norms as a way to formalise the intuitive notions of the “height” (amplitude) and “width” (wavelength) of a function (wave).
It turns out that one can analyse the behaviour of nonlinear dispersive equations on a similar heuristic level, namely that of understanding the dynamics of the amplitude $A(t)$ and wavelength $1/N(t)$ (or frequency $N(t)$) of a wave. Below the fold I give some examples of this heuristic; for sake of concreteness I restrict attention to the nonlinear wave equation (1), though one can of course extend this heuristic to many other models also. Rigorous analogues of the arguments here can be found in several places, such as the book of Shatah and Struwe, or my own book on the subject.