This post is in some ways an antithesis of my previous postings on hard and soft analysis. In those posts, the emphasis was on taking a result in soft analysis and converting it into a hard analysis statement (making it more “quantitative” or “effective”); here we shall be focusing on the reverse procedure, in which one harnesses the power of infinitary mathematics – in particular, ultrafilters and nonstandard analysis – to facilitate the proof of finitary statements.

Arguments in hard analysis are notorious for their profusion of “epsilons and deltas”. In the more sophisticated arguments of this type, one can end up having an entire army of epsilons that one needs to manage, in particular choosing each epsilon carefully to be sufficiently small compared to other parameters (including other epsilons), while of course avoiding an impossibly circular situation in which a parameter is ultimately required to be small with respect to itself, which is absurd. This art of *epsilon management*, once mastered, is not terribly difficult – it basically requires one to mentally keep track of which quantities are “small”, “very small”, “very very small”, and so forth – but when these arguments get particularly lengthy, then epsilon management can get rather tedious, and also has the effect of making these arguments unpleasant to read. In particular, any given assertion in hard analysis usually comes with a number of unsightly quantifiers (For every $\varepsilon$ there exists an N…) which can require some thought for a reader to parse. This is in contrast with soft analysis, in which most of the quantifiers (and the epsilons) can be cleanly concealed via the deployment of some very useful terminology; consider for instance how many quantifiers and epsilons are hidden within, say, the Heine-Borel theorem: “a subset of a Euclidean space is compact if and only if it is closed and bounded”.

For those who practice hard analysis for a living (such as myself), it is natural to wonder if one can somehow “clean up” or “automate” all the epsilon management which one is required to do, and attain levels of elegance and conceptual clarity comparable to those in soft analysis, hopefully without sacrificing too much of the “elementary” or “finitary” nature of hard analysis in the process.

One important step in this direction has been the development of various types of *asymptotic notation*, such as the Hardy notation of using unspecified constants C, the Landau notation of using O() and o(), or the Vinogradov notation of using symbols such as $\ll$ or $\gg$; each of these symbols, when properly used, absorbs one or more of the ambient quantifiers in a hard analysis statement, thus making these statements easier to read. But, as useful as these notations are, they still fall a little short of fully capturing one’s intuition regarding orders of magnitude. For instance, we tend to think of any quantity of the form O(1) as being “bounded”, and we know that bounded objects can be combined to form more bounded objects; for instance, if x=O(1) and y=O(1), then x+y=O(1) and xy=O(1). But if we attempt to formalise this by trying to create the set $A := \{x \in \mathbb{R} : x = O(1)\}$ of all bounded numbers, and asserting that this set is then closed under addition and multiplication, we are speaking nonsense; the O() notation cannot be used within the axiom schema of specification, and so the above definition of A is meaningless.

There is, however, a way to make concepts such as “the set of all bounded numbers” (and similarly with “bounded” replaced by “small”, “polynomial size”, etc.) precise and meaningful, by using non-standard analysis, which is the most well-known of the “pseudo-finitary” approaches to analysis, in which one adjoins additional numbers to the standard number system. Now, in order to set up non-standard analysis one needs a (non-principal) ultrafilter (or an equivalent gadget), which tends to deter people from wanting to hear more about the subject. Because of this, most treatments of non-standard analysis tend to gloss over the actual construction of non-standard number systems, and instead emphasise the various benefits that these systems offer, such as a rigorous supply of infinitesimals, and a general transfer principle that allows one to convert statements in standard analysis into equivalent ones in non-standard analysis. This transfer principle (which requires the ultrafilter to prove) is usually recommended to be applied only at the very beginning and at the very end of an argument, so that the bulk of the argument is carried out purely in the non-standard universe.

I feel that one of the reasons that non-standard analysis is not embraced more widely is because the transfer principle, and the ultrafilter that powers it, is often regarded as some sort of “black box” which mysteriously bestows some certificate of rigour on non-standard arguments used to prove standard theorems, while conveying no information whatsoever on what the quantitative bounds for such theorems should be. Without a proper understanding of this black box, a mathematician may then feel uncomfortable with any non-standard argument, no matter how impressive and powerful the result.

The purpose of this post is to try to explain this black box from a “hard analysis” perspective, so that one can comfortably and productively transfer into the non-standard universe whenever it becomes convenient to do so (in particular, it can become cost-effective to do this whenever the burden of epsilon management becomes excessive, and one is willing to not make certain implied constants explicit).

— What is an ultrafilter? —

In order to do all this, we have to tackle head-on the notorious concept of a non-principal ultrafilter. Actually, these ultrafilters are not as impossible to understand as their reputation suggests; they are basically a consistent set of rules which allow one to always take limits (or make similar decisions) whenever necessary.

To motivate them, let us recall some of the properties of convergent sequences from undergraduate real analysis. If $x_n$ is a convergent sequence of real numbers (where n ranges in the natural numbers), then we have a limit $\lim_{n \to \infty} x_n$, which is also a real number. In addition to the usual analytical interpretations, we can also interpret the concept of a limit as a voting system, in which the natural numbers n are the voters, which are each voting for a real number $x_n$, and the limit $\lim_{n \to \infty} x_n$ is the elected “winner” emerging from all of these votes. One can also view the limit (somewhat non-rigorously) as the expected value of $x_n$ when n is a “randomly chosen” natural number. Ignoring for now the issue that the natural numbers do not admit a uniform probability measure, it is intuitively clear that such a “randomly chosen” number is almost surely going to be larger than any fixed finite number, and so $x_n$ almost surely will be arbitrarily close to the limit $\lim_{n \to \infty} x_n$ (thus we have a sort of “concentration of measure”).

These limits obey a number of laws, including

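- (Algebra homomorphism) If $x_n, y_n$ are convergent sequences, and c is a real number, then $\lim_{n \to \infty} 1 = 1$, $\lim_{n \to \infty} c x_n = c \lim_{n \to \infty} x_n$, $\lim_{n \to \infty} (x_n + y_n) = \lim_{n \to \infty} x_n + \lim_{n \to \infty} y_n$, and $\lim_{n \to \infty} (x_n y_n) = (\lim_{n \to \infty} x_n)(\lim_{n \to \infty} y_n)$. (In particular, all sequences on the left-hand side are convergent.)
- (Boundedness) If $x_n$ is a convergent sequence, then $\inf_n x_n \leq \lim_{n \to \infty} x_n \leq \sup_n x_n$. (In particular, if $x_n$ is non-negative, then so is $\lim_{n \to \infty} x_n$.)
- (Non-principality) If $x_n$ and $y_n$ are convergent sequences which differ at only finitely many values of n, then $\lim_{n \to \infty} x_n = \lim_{n \to \infty} y_n$. [Thus, no individual voter has any influence on the outcome of the election!]
- (Shift invariance) If $x_n$ is a convergent sequence, then for any natural number h we have $\lim_{n \to \infty} x_{n+h} = \lim_{n \to \infty} x_n$.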

These properties are of course very useful in computing the limits of various convergent sequences. It is natural to wonder if it is possible to generalise the notion of a limit to cover various non-convergent sequences, such as the class of *bounded* sequences. There are of course many ways to do this in the literature (e.g. if one considers series instead of sequences, one has Cesàro summation, zeta function regularisation, etc.), but (as observed by Euler) one has to give up at least one of the above four limit laws if one wants to evaluate the limit of sequences such as $0, 1, 0, 1, 0, 1, \ldots$. Indeed, if this sequence had a limit x, then the algebra homomorphism laws force $x^2 = x$ and thus x is either 0 or 1; on the other hand, the algebra homomorphism laws also show us that $1, 0, 1, 0, 1, 0, \ldots$ has a limit 1-x, and hence by shift invariance we have x = 1-x, which is inconsistent with the previous discussion. In the voting theory interpretation, the problem here is one of lack of consensus: half of the voters want 0 and the other half want 1, and how can one consistently and fairly elect a choice from this? Similarly, in the probabilistic interpretation, there is no concentration of measure; a randomly chosen $x_n$ is not close to its expected value of 1/2, but instead fluctuates randomly between 0 and 1.
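
To see concretely which law a familiar summation method sacrifices, here is a small sketch using Cesàro averaging on the sequence 0, 1, 0, 1, … The function names are illustrative choices, and a finite average over N terms only approximates the Cesàro limit (for even N it happens to be exact here). The averages respect the shift but fail the multiplicative part of the algebra homomorphism law, since the sequence equals its own square while 1/2 does not.

```python
from fractions import Fraction

def cesaro_mean(seq, N):
    """Average of the first N terms seq(0), ..., seq(N-1), as an exact fraction."""
    return sum(Fraction(seq(n)) for n in range(N)) / N

def alternating(n):
    """The sequence 0, 1, 0, 1, ..."""
    return n % 2

def shifted(n):
    """The shifted sequence 1, 0, 1, 0, ..."""
    return alternating(n + 1)

def squared(n):
    """Pointwise square; identical to `alternating` because 0*0 = 0 and 1*1 = 1."""
    return alternating(n) ** 2

N = 1000
mean = cesaro_mean(alternating, N)
mean_shifted = cesaro_mean(shifted, N)
mean_squared = cesaro_mean(squared, N)

print(mean, mean_shifted, mean_squared)   # 1/2 1/2 1/2
print(mean_squared == mean * mean)        # False: multiplicativity is lost
```

The p-limits discussed next make the opposite trade: they keep the algebra homomorphism laws and give up shift invariance.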

So, to define more general limits, we have to give up something. We shall give up shift-invariance (property 4). In the voting theory interpretation given earlier, this means that we abandon the pretense that the election is going to be “fair”; some voters (or groups of voters) are going to be treated differently than others, due to some arbitrary choices made in designing the voting system. (This is the first hint that the axiom of choice will be involved.) Similarly, in the probabilistic interpretation, we will give up the notion that the “random number” n we will choose has a shift-invariant distribution, thus for instance n could have a different distribution than n+1.

Suppose for the moment that we managed to have an improved concept of a limit which assigned a number, let’s call it $p\text{-}\lim_{n \to \infty} x_n$, to any bounded sequence $x_n$, which obeyed the properties 1-3. It is then easy to see that this p-limit extends the ordinary notion of a limit, because if a sequence $x_n$ is convergent, then after modifying the sequence on finitely many elements we can keep the sequence within $\varepsilon$ of $\lim_{n \to \infty} x_n$ for any specified $\varepsilon > 0$, which implies (by properties 2, 3) that $p\text{-}\lim_{n \to \infty} x_n$ stays within $\varepsilon$ of $\lim_{n \to \infty} x_n$, and the claim follows.

Now suppose we consider a *Boolean sequence* $x_n$ – one which takes only the values 0 and 1. Since $x_n^2 = x_n$ for all n, we see from property 1 that $(p\text{-}\lim_{n \to \infty} x_n)^2 = p\text{-}\lim_{n \to \infty} x_n$, thus $p\text{-}\lim_{n \to \infty} x_n$ must also be either 0 or 1. From a voting perspective, the p-limit is a *voting system*: a mechanism for extracting a yes-no answer out of the yes-no preferences of an infinite number of voters.

Let p denote the collection of all subsets A of the natural numbers such that the indicator sequence of A (i.e. the boolean sequence which equals 1 when n lies in A and equals 0 otherwise) has a p-limit of 1; in the voting theory language, p is the collection of all voting blocs who can decide the outcome of an election by voting in unison, while in the probability theory language, p is the collection of all sets of natural numbers of measure 1. It is easy to verify that p has four properties:

- (Monotonicity) If A lies in p, and B contains A, then B lies in p.
- (Closed under intersection) If A and B lie in p, then $A \cap B$ also lies in p.
- (Dichotomy) If A is any set of natural numbers, either A or its complement lies in p, but not both.
- (Non-principality) If one adds (or deletes) a finite number of elements to (or from) a set A, this does not affect whether the set A lies in p.

A collection p obeying properties 1, 2 is called a filter; a collection obeying 1,2,3 is called an ultrafilter, and a collection obeying 1,2,3,4 is a *non-principal ultrafilter*. [In contrast, a principal ultrafilter is one which is controlled by a single index $n_0$ in the sense that $p = \{A : n_0 \in A\}$. In voting theory language, this is a scenario in which $n_0$ is a dictator; in probability language, the random variable n is now a deterministic variable taking the value $n_0$.]

A property A(n) pertaining to a natural number n can be said to be *p-true* if the set $\{n : A(n) \text{ is true}\}$ lies in p, and *p-false* otherwise; for instance any tautologically true statement is also p-true. Using the probabilistic interpretation, these notions are analogous to those of “almost surely true” and “almost surely false” in probability theory. (Indeed, one can view p as being a probability measure on the natural numbers which always obeys a zero-one law, though one should caution that this measure is only finitely additive rather than countably additive, and so one should take some care in applying measure-theoretic technology directly to an ultrafilter.)

Properties 1-3 assert that this notion of “p-truth” obeys the usual laws of propositional logic, e.g. property 2 asserts that if A is p-true and B is p-true, then so is “A and B”, while property 3 is the familiar law of the excluded middle and property 1 is modus ponens. This is actually rather remarkable: it asserts that ultrafilter voting systems cannot create voting paradoxes, such as those guaranteed by Arrow’s theorem. There is no contradiction here, because Arrow’s theorem only applies to *finite* (hence *compact*) electorates of voters, which do not support any non-principal ultrafilters. At any rate, we now get a hint of why ultrafilters are such a useful concept in logic and model theory.
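
The remark that finite electorates do not support non-principal ultrafilters can be checked by brute force in a tiny case. The following sketch (all function names are illustrative) enumerates every collection of subsets of a three-voter electorate, keeps those obeying monotonicity, closure under intersection and dichotomy, and confirms that each survivor is a dictatorship.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_ultrafilter(p, subsets, universe):
    """Check monotonicity, closure under intersection, and dichotomy."""
    for a in p:
        for b in subsets:
            if a <= b and b not in p:          # monotonicity fails
                return False
        for b in p:
            if (a & b) not in p:               # closure under intersection fails
                return False
    for a in subsets:
        if (a in p) == ((universe - a) in p):  # dichotomy fails
            return False
    return True

def ultrafilters(universe):
    """Brute-force search over all collections of subsets of a small finite set."""
    universe = frozenset(universe)
    subsets = powerset(universe)
    found = []
    # Each collection of subsets is encoded by a bitmask over `subsets`.
    for mask in range(1 << len(subsets)):
        p = frozenset(s for i, s in enumerate(subsets) if (mask >> i) & 1)
        if is_ultrafilter(p, subsets, universe):
            found.append(p)
    return found

def dictator(p, universe):
    """Return n0 with p == {A : n0 in A}, or None if p is not principal."""
    subsets = powerset(universe)
    for n0 in universe:
        if p == frozenset(s for s in subsets if n0 in s):
            return n0
    return None

voters = {0, 1, 2}
result = ultrafilters(voters)
dictators = sorted(dictator(p, voters) for p in result)
print(len(result), dictators)   # 3 [0, 1, 2]
```

The search is exponential in the number of subsets, so it is only meant for electorates of three or four voters; it illustrates the finite case and says nothing about how a non-principal ultrafilter on the natural numbers might be constructed.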

We have seen how the notion of a p-limit creates a non-principal ultrafilter p. Conversely, once one has a non-principal ultrafilter p, one can uniquely recover the p-limit operation. This is easiest to explain using the voting theory perspective. With the ultrafilter p, one can ask yes-no questions of an electorate, by getting each voter to answer yes or no and then seeing whether the resulting set of “yes” voters lies in p. To take a p-limit of a bounded sequence $x_n$, say in [0,1], what is going on is that each voter n has his or her own favourite candidate number $x_n$ between 0 and 1, and one has to elect a real number x from all these preferences. One can do this by an infinite electoral version of “Twenty Questions”: one asks all the voters whether x should be greater than 1/2 or not, and uses p to determine what the answer should be; then, if x is to be greater than 1/2, one asks whether x should be greater than 3/4, and so on and so forth. This eventually determines x uniquely; the properties 1-4 of the ultrafilter can be used to derive properties 1-3 of the p-limit.
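
The “Twenty Questions” procedure is just bisection driven by a yes-no oracle, and can be sketched as follows. The oracle `in_p` is a hypothetical interface introduced here for illustration: a non-principal ultrafilter cannot actually be implemented, so the demonstration plugs in a principal ultrafilter (a dictator `n0`), for which the procedure should simply recover the dictator's own preference.

```python
def p_limit(x, in_p, steps=40):
    """Bisection ('Twenty Questions') for the p-limit of a sequence x with values in [0, 1].

    `in_p(question)` is a hypothetical oracle: it receives a predicate on voters n
    and reports whether the set of voters answering 'yes' lies in the ultrafilter p.
    """
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        # Ask every voter: "should the elected number be greater than mid?"
        if in_p(lambda n, mid=mid: x(n) > mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def principal(n0):
    """Oracle for the principal ultrafilter at n0: only voter n0's answer counts."""
    return lambda question: question(n0)

def x(n):
    """Voter n's favourite number: x_0 = 1/2, x_1 = 1/3, x_2 = 1/4, ..."""
    return 1 / (n + 2)

value = p_limit(x, principal(2))
print(abs(value - x(2)) < 1e-9)   # True: the dictator's choice is elected
```

With a genuinely non-principal oracle the same loop would return 0 for this sequence, since for every threshold above 0 only finitely many voters answer “yes”.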

A modification of the above argument also lets us take p-limits of any sequence in a compact metric space (or slightly more generally, in any compact Hausdorff first-countable topological space). These p-limits then behave in the expected manner with respect to operations in those categories, such as composition with continuous functions or with direct sum. As for unbounded real-valued sequences, one can still extract a p-limit as long as one works in a suitable compactification of the reals, such as the extended real line.

The reconstruction of p-limits from the ultrafilter p is also analogous to how, in probability theory, the concept of expected value of a (say) non-negative random variable X can be reconstructed from the concept of probability via the integration formula $\mathbb{E}(X) = \int_0^\infty \mathbb{P}(X \geq \lambda)\, d\lambda$. Indeed, one can define $p\text{-}\lim_{n \to \infty} x_n$ to be the supremum of all numbers x such that the assertion $x_n > x$ is p-true, or the infimum of all numbers y such that the assertion $y > x_n$ is p-true.

We have said all these wonderful things about non-principal ultrafilters, but we haven’t actually shown that these amazing objects actually exist. There is a good reason for this – the existence of non-principal ultrafilters requires the axiom of choice (or some slightly weaker versions of this axiom, such as the boolean prime ideal theorem). Let’s give two quick proofs of the existence of a non-principal ultrafilter:

**Proof 1**. Let q be the set of all cofinite subsets of the natural numbers (i.e. sets whose complement is finite). This is clearly a filter which is *proper* (i.e. it does not contain the empty set $\emptyset$). Since the union of any chain of proper filters is again a proper filter, we see from Zorn’s lemma that q is contained in a maximal proper filter p. It is not hard to see that p must then be a non-principal ultrafilter.
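
For readers who want the last step of Proof 1 spelled out, here is one way to fill it in. It is a sketch of the standard argument; the auxiliary collection $p'$ is notation introduced here for the purpose and does not appear elsewhere in this post.

```latex
\textbf{Dichotomy.} Suppose $A \notin p$ and $\mathbb{N} \setminus A \notin p$. Put
\[
  p' := \{ B \subseteq \mathbb{N} : B \supseteq A \cap C \text{ for some } C \in p \}.
\]
Then $p'$ is a filter containing $p$ and $A$. It is proper: if $A \cap C = \emptyset$
for some $C \in p$, then $C \subseteq \mathbb{N} \setminus A$, so
$\mathbb{N} \setminus A \in p$ by monotonicity, a contradiction.
Thus $p'$ is a proper filter strictly larger than $p$, contradicting maximality.
Finally, $A$ and $\mathbb{N} \setminus A$ cannot both lie in $p$, since
$A \cap (\mathbb{N} \setminus A) = \emptyset \notin p$.

\textbf{Non-principality.} If $A$ and $B$ differ only on a finite set $F$, then
$\mathbb{N} \setminus F \in q \subseteq p$ and
\[
  A \cap (\mathbb{N} \setminus F) = B \cap (\mathbb{N} \setminus F),
\]
so $A \in p$ implies $B \supseteq A \cap (\mathbb{N} \setminus F) \in p$,
hence $B \in p$; the converse is symmetric.
```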

**Proof 2.** Consider the Stone–Čech compactification $\beta\mathbb{N}$ of the natural numbers $\mathbb{N}$. Since $\mathbb{N}$ is not already compact, there exists an element p of this compactification which does not lie in $\mathbb{N}$. Now note that any bounded sequence $x_n$ on the natural numbers is a bounded continuous function on $\mathbb{N}$ (since $\mathbb{N}$ is discrete) and thus, by definition of $\beta\mathbb{N}$, extends uniquely to a bounded continuous function on $\beta\mathbb{N}$; in particular one can evaluate this function at p to obtain a real number $x_p$. If one then defines $p\text{-}\lim_{n \to \infty} x_n := x_p$ one easily verifies the properties 1-3 of a p-limit, which by the above discussion creates a non-principal ultrafilter (which by abuse of notation is also referred to as p; indeed, $\beta\mathbb{N}$ is canonically identifiable with the space of all ultrafilters).

These proofs are short, but not particularly illuminating. A more informal, but perhaps more instructive, explanation of why non-principal ultrafilters exist can be given as follows. In the voting theory language, our task is to design a complete and consistent voting system for an infinite number of voters. In the cases where there is near-consensus, in the sense that all but finitely many of the voters vote one way or another, the decision is clear – go with the option which is preferred by the infinite voting bloc. But what if an issue splits the electorate with an infinite number of voters on each side? Then what one has to do is make an arbitrary choice – pick one side to go with and completely disenfranchise all the voters on the other side, so that they will have no further say in any subsequent votes. By performing this disenfranchisement, we increase the total number of issues for which our electoral system can reach a consistent decision; basically, any issue which has the consensus of all but finitely many of those voters not yet disenfranchised can now be decided upon in a consistent (though highly unfair) manner. We now continue voting until we reach another issue which splits the remaining pool of voters into two infinite groups, at which point we have to make another arbitrary choice, and disenfranchise another infinite set of voters. Very roughly speaking, if one continues this process of making arbitrary choices “ad infinitum”, then at the end of this transfinite process we eventually exhaust the (uncountable) number of issues one has to decide, and one ends up with the non-principal ultrafilter. (If at any stage of the process one decided to disenfranchise all but finitely many of the voters, then one would quickly end up with a principal ultrafilter, i.e. a dictatorship.) 
One should take this informal argument with a grain of salt; it turns out that after one has made an infinite number of choices, the infinite number of disenfranchised groups, while individually having no further power to influence elections, can begin having some *collective* power, basically because property 2 of a filter only guarantees closure under finite intersections and not infinite intersections, and things begin to get rather complicated. At this point, I recommend abandoning the informal picture and returning to Zorn’s lemma :-) .

With this informal discussion, it is now rather clear why the axiom of choice (or something very much like that axiom) needs to play a role in constructing non-principal ultrafilters. However, one may wonder whether one really needs the full strength of an ultrafilter in applications; to return once again to the voting analogy, one usually does not need to vote on every single conceivable issue (of which there are uncountably many) in order to settle some problem; in practice, there are often only a countable or even finite number of tricky issues which one needs to put to the ultrafilter to decide upon. Because of this, many of the results in soft analysis which are proven using ultrafilters can instead be established using a “poor man’s non-standard analysis” (or “pre-infinitary analysis”) in which one simply does the “voter disenfranchisement” step mentioned above by hand. This step is more commonly referred to as the trick of “passing to a subsequence whenever necessary”, and is particularly popular in the soft analysis approach to PDE and calculus of variations. For instance, to minimise some functional, one might begin with a minimising sequence. This sequence might not converge in any reasonable topology, but it often lies in a sequentially compact set in some weak topology (e.g. by using the sequential version of the Banach-Alaoglu theorem), and so by passing to a subsequence one can force the sequence to converge in this topology. One can continue passing to a subsequence whenever necessary to force more and more types of convergence, and can even diagonalise using the Arzela-Ascoli argument to achieve a countable number of convergences at once (this is of course the sequential Banach-Alaoglu theorem in disguise); in many cases, one gets such a strong convergence that one can then pass to the limit. 
Most of these types of arguments could also be equivalently performed by selecting an ultrafilter p at the very beginning, and replacing the notions of limit by p-limit throughout; roughly speaking, the ultrafilter has performed all the subsequence-selection for you in advance, and all your sequences in compact spaces will automatically converge without the need to pass to any further subsequences. (For much the same reason, ultrafilters can be used to simplify a lot of infinitary Ramsey theory, as all the pigeonholing has been done for you in advance.) On the other hand, the “by hand” approach of selecting subsequences explicitly tends to be much more constructive (for instance, it can often be performed without any appeal to the axiom of choice), and can also be more easily converted to a quantitative “hard analysis” argument (for instance, by using the finite convergence principle discussed in earlier posts).

As a concrete example from my own experience, in one of my PDE papers with the “I-team” (Colliander, Keel, Staffilani, Takaoka, and myself), we had a rather severe epsilon management problem in our “hard analysis” arguments, requiring in fact seven very different small quantities $1 \gg \eta_0 \gg \ldots \gg \eta_6 > 0$, with each $\eta_i$ extremely small compared with the previous one. (As a consequence of this and of our inductive argument, our eventual bounds, while quantitative, were extremely large, requiring a *nine*-fold iterated Knuth arrow!) This epsilon management also led to the paper being unusually lengthy (85 pages). Subsequently, (inspired by a later paper of Kenig and Merle), I learnt how the use of the above “poor man’s non-standard analysis” could conceal almost all of these epsilons (indeed, due to concentration-compactness one can soon pass to a limiting object in which most of the epsilons get sent to zero). Partly because of this, a later paper of myself, Visan, and Zhang on a very similar topic, which adopted this softer approach, was significantly shorter (28 pages, although to be fair this paper also relies on an auxiliary 30-page paper), though to compensate for this it becomes much more difficult to extract any sort of quantitative bound from the argument.

For the purposes of non-standard analysis, one non-principal ultrafilter is much the same as any other. But it turns out that if one wants to perform additive operations on the index set n, then there is a special (and very useful) class of non-principal ultrafilters that one can use, namely the *idempotent ultrafilters*. These ultrafilters p almost recover the shift-invariance property (which, as remarked earlier, cannot be perfectly attained for ultrafilters) in the following sense: for p-almost all h, the ultrafilter p is equal to its translate p+h, or equivalently that $p\text{-}\lim_{h} \, p\text{-}\lim_{n} x_{n+h} = p\text{-}\lim_{n} x_n$ for all bounded sequences $x_n$. (In the probability theory interpretation, in which p-limits are viewed as an expectation, this is analogous to saying that the probability measure associated to p is idempotent under convolution, hence the name). Such ultrafilters can, for instance, be used to give a short proof of Hindman’s theorem, which is otherwise rather unpleasant to prove. There are even more special ultrafilters known as *minimal idempotent ultrafilters*, which are quite useful in infinitary Ramsey theory, but these are now rather technical and I will refer the reader to a survey paper of Bergelson for details. I will note however one amusing feature of these objects; whereas “ordinary” non-principal ultrafilters require an application of Zorn’s lemma (or something similar) to construct them, these more special ultrafilters require *multiple* applications of Zorn’s lemma – i.e. a nested transfinite induction! Thus these objects are truly deep in the “infinitary” end of the finitary-infinitary spectrum of mathematics.

— Non-standard models —

We have now thoroughly discussed non-principal ultrafilters, interpreting them as voting systems which can extract a consistent series of decisions out of a countable number of independent voters. With this we can now discuss non-standard models of a mathematical system. There are a number of ways to build these models, but we shall stick to the most classical (and popular) construction.

Throughout this discussion we fix a single non-principal ultrafilter p. Now we make the following general definition.

Definition. Let X be any set. The ultrapower *X of X is defined to be the collection of all sequences $(x_n)_{n \in \mathbb{N}}$ with entries in X, modulo the equivalence that two sequences $(x_n)$, $(y_n)$ are considered equal if they agree p-almost surely (i.e. the statement $x_n = y_n$ is p-true).

If X is a class of “standard” objects, we shall view *X as the corresponding class of “nonstandard” objects. Thus, for instance, $\mathbb{R}$ is the class of standard real numbers, and ${}^*\mathbb{R}$ is the class of non-standard real (or hyperreal) numbers, with each non-standard real number being uniquely representable (up to p-almost sure equivalence) as an arbitrary sequence of standard real numbers (not necessarily convergent or even bounded). What one has done here is “democratised” the class X; instead of declaring a single object x in X that everyone has to work with, one allows each voter n in a countable electorate to pick his or her own object $x_n$ arbitrarily, and the voting system p will then be used later to fashion a consensus as to the properties of these objects; this is why we can identify any two sets of voter choices which are p-almost surely identical. We shall abuse notation a little bit and use sequence notation $(x_n)$ to denote a non-standard element, even though strictly speaking one should deal with equivalence classes of sequences (just like how an element of an $L^p$ space is not, strictly speaking, a single function, but rather an equivalence class of functions that agree almost everywhere).

One can embed any class X of standard objects in its nonstandard counterpart *X, by identifying an element x with the constant sequence $x_n := x$; thus standard objects correspond to unanimous choices of the electorate. This identification is obviously injective. On the other hand, it is rather clear that *X is likely to be significantly larger than X itself.

Any operation or relation on a class (or several classes) of standard objects can be extended to the corresponding class(es) of nonstandard objects, simply by working pointwise on each n separately. For instance, the sum of two non-standard real numbers $(x_n)$ and $(y_n)$ is simply $(x_n + y_n)$; each voter in the electorate performs the relevant operation (in this case, addition) separately. Note that the fact that these sequences are only defined p-almost surely does not create any ambiguity. Similarly, we say that one non-standard number $(x_n)$ is less than another $(y_n)$, if the statement $x_n < y_n$ is p-true. And so forth. There is no direct interaction between different voters (which, in view of the lack of shift invariance, is a good thing); it is only through the voting system p that there is any connection at all between all of the individual voters.

For similar reasons, any property that one can define on a standard object can also be defined on a non-standard object. For instance, a non-standard integer $m = (m_n)$ is prime iff the statement “$m_n$ is prime” is p-true; a non-standard function $f = (f_n)$ is continuous iff the statement “$f_n$ is continuous” is p-true; and so forth. Basically, if you want to know anything about a non-standard object, go put your question to all the voters, and then feed the answers into the ultrafilter p to get the answer to your question. The properties 1-4 (actually, just 1-3) of the ultrafilter ensure that you will always get a consistent answer out of this.

It is then intuitively obvious that any “simple” property that a class of standard objects has, will be automatically inherited by its nonstandard counterpart. For instance, since addition is associative in the standard real numbers, it will be associative in the non-standard real numbers. Since every non-zero standard real number is invertible in the standard real numbers, so is every non-zero non-standard real number (why?). Because Fermat’s last theorem is true for standard natural numbers, it is true for non-standard natural numbers (why?). And so forth. Now, what exactly does “simple” mean? Roughly speaking, any statement in first-order logic will transfer over from a standard class to a non-standard class, as long as the statement does not itself use p-dependent terms such as “standard” or “non-standard” anywhere. One could state a formal version of this principle here, but I find it easier just to work through examples such as the ones given above to get a sense of why this should be the case.
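
As one possible answer to the first “why?” above, here is a sketch of how invertibility transfers, written out at the level of representative sequences. The set $S$ and the sequence $y_n$ are notation introduced for this sketch only.

```latex
Let $x = (x_n) \in {}^*\mathbb{R}$ with $x \neq 0$, i.e. $\{ n : x_n = 0 \} \notin p$.
By dichotomy, $S := \{ n : x_n \neq 0 \} \in p$. Define
\[
  y_n := \begin{cases} 1/x_n, & n \in S, \\ 0, & n \notin S. \end{cases}
\]
Then $\{ n : x_n y_n = 1 \} \supseteq S \in p$, so by monotonicity the statement
$x_n y_n = 1$ is $p$-true; that is, $xy = 1$ in ${}^*\mathbb{R}$ with $y = (y_n)$.
```

The values of $y_n$ off $S$ are irrelevant, since changing a sequence on a set outside p does not change the non-standard number it represents.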

Now the opposite is also true; any statement in first-order logic, avoiding p-dependent terms such as standard and non-standard, which is true for non-standard classes of objects, is automatically true for standard classes also. This follows just from applying the above principle to the *negation* of the statement one is interested in. Suppose for instance that one has somehow managed to prove the twin prime conjecture for non-standard natural numbers. To see why this then implies the twin prime conjecture for standard natural numbers, we argue by contradiction. If the statement “the twin prime conjecture failed” was true for standard natural numbers, then it would also be true for non-standard natural numbers (it is instructive to work this out explicitly), a contradiction.

That’s the *transfer principle* in a nutshell; informally, everything which avoids p-dependent terminology and which is true in standard mathematics, is also true in non-standard mathematics, and vice versa. Thus the two models are *syntactically* equivalent, even if they are *semantically* rather different. So, if the two models of mathematics are equivalent, why bother working in the latter, which looks much more complicated? It is because in the non-standard model one acquires some additional useful adjectives, such as “standard”. Some of the objects in one’s classes are standard, and others are not. One can use this new adjective (and some others which we will define shortly) to perform manipulations in the non-standard universe which have no obvious counterpart in the standard universe. One can then hope to use those manipulations to eventually end up at a non-trivial new theorem in the standard world, either by arriving at a statement in the non-standard world which no longer uses adjectives such as “standard” and can thus be fed into the transfer principle, or else by using some other principles (such as the overspill principle) to convert a non-standard statement involving p-dependent adjectives into a standard statement. It’s similar to how, say, one can find a real root of a real polynomial by embedding the real numbers in the complex numbers, performing some mathematical manipulations in the complex domain, and then verifying that the complex-valued answer one gets is in fact real-valued.

Let’s give an example of a non-standard number. Let $\omega$ be the non-standard natural number (n), i.e. the sequence $0, 1, 2, 3, \ldots$ (up to p-almost sure equivalence, of course). This number is larger than any standard number; for instance, the standard number 5 corresponds to the sequence $5, 5, 5, \ldots$; since n exceeds 5 for all but finitely many values of n, we see that $n > 5$ is p-true and hence $\omega > 5$. More generally, let us say that a non-standard number is *limited* if its magnitude is bounded by a standard number, and *unlimited* otherwise; thus $\omega$ is unlimited. The notion of “limited” is analogous to the notion of being O(1) discussed earlier, but unlike the O() notation, there are no implicit quantifiers that require care to manipulate (though as we shall see shortly, the difficulty has not gone away completely).

One also sees, for instance, that $2\omega$ is larger than the sum of $\omega$ and any limited number, that $\omega^2$ is larger than the product of $\omega$ with any limited number, and so forth. It is also clear that the sum or product of any two limited numbers is limited. The number $1/\omega$ has magnitude smaller than any positive standard real number and is thus considered to be an *infinitesimal*. Using p-limits, we quickly verify that every limited number x can be uniquely expressed as the sum of a standard number st(x) and an infinitesimal number x-st(x). (The map $x \mapsto \mathrm{st}(\log_\omega x)$, by the way, is a homomorphism from the semiring of non-standard positive reals to the tropical semiring $(\mathbb{R}, \max, +)$, and thus encodes the correspondence principle between ordinary rings and tropical rings.) The set of standard numbers, the set of limited numbers, and the set of infinitesimal numbers are all subrings of the set of all non-standard numbers. A non-zero number is infinitesimal if and only if its reciprocal is unlimited.
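To make the decomposition into standard part and infinitesimal concrete, here is a small worked example (a sketch only; the particular number $x$ is chosen here for illustration, with $\omega$ denoting the unlimited number given by the sequence $(n)$):

```latex
\begin{align*}
x &:= \frac{3\omega + 1}{\omega + 2} = 3 - \frac{5}{\omega + 2}, \\
\mathrm{st}(x) &= 3, \qquad x - \mathrm{st}(x) = -\frac{5}{\omega + 2}.
\end{align*}
```

Here $x$ is limited (it lies between 0 and 3), and the error term is infinitesimal because $\omega + 2$ is unlimited, so that its reciprocal is infinitesimal, and a limited multiple of an infinitesimal is again infinitesimal.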

Now at this point one might be suspicious that one is beginning to violate some of the axioms of the natural numbers or real numbers, in contradiction to the transfer principle alluded to earlier. For instance, the existence of unlimited non-standard natural numbers seems to contradict the well-ordering property: if one defines $S \subset {}^*\mathbb{N}$ to be the set of all unlimited non-standard natural numbers, then this set is non-empty, and so the well-ordering property should then provide a minimal unlimited non-standard number $\min(S) \in {}^*\mathbb{N}$. But then $\min(S) - 1$ must be unlimited also, a contradiction. What’s the problem here?

The problem here is rather subtle: a set of non-standard natural numbers is not quite the same thing as a non-standard set of natural numbers. In symbols: if $2^X := \{A : A \subset X\}$ denotes the power set of X, then $2^{{}^*\mathbb{N}} \neq {}^*(2^{\mathbb{N}})$. Let’s look more carefully. What is a non-standard set $A \in {}^*(2^{\mathbb{N}})$ of natural numbers? This is basically a sequence $(A_n)$ of sets of natural numbers, one for each voter. Any given non-standard natural number $m = (m_n)$ may belong to A or not, depending on whether the statement $m_n \in A_n$ is p-true or not. We can collect all the non-standard numbers m which do belong in A, and call this set $\tilde{A}$; this is thus an element of $2^{{}^*\mathbb{N}}$. The map $A \mapsto \tilde{A}$ from ${}^*(2^{\mathbb{N}})$ to $2^{{}^*\mathbb{N}}$ turns out to be injective (why? this is the transferred axiom of extensionality), but it is not surjective; there are some sets of non-standard natural numbers which are not non-standard sets of natural numbers, and as such the well-ordering principle, when transferred over from standard mathematics, does not apply to them. This subtlety is all rather confusing at first, but a good rule of thumb is that as long as your set (or function, or whatever) is not defined using p-dependent terminology such as “standard” or “limited”, it will be a non-standard set (or a non-standard function, etc.); otherwise it will merely be a set of non-standard objects (or a function from one non-standard set to another, etc.). [The situation here is similar to that with the adjective “constructive”; not every function from the constructive numbers to the constructive numbers is itself a constructive function, and so forth.]
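One way to see concretely why the set of limited numbers cannot be a non-standard set is to transfer the statement that every non-empty bounded set of natural numbers has a largest element. The following is only a sketch; the names ${}^*\mathbb{N}$ (the non-standard natural numbers), $\omega$ (any unlimited element of it) and $S_0$ are notational assumptions made here.

```latex
\begin{align*}
&\text{Standard: } \forall A \in 2^{\mathbb{N}}:\
  \bigl(A \neq \emptyset \ \wedge\ \exists M\ \forall a \in A:\ a \leq M\bigr)
  \implies \exists m \in A\ \forall a \in A:\ a \leq m. \\
&\text{Transfer: the same statement holds for every non-standard set }
  A \in {}^*(2^{\mathbb{N}}). \\
&\text{But } S_0 := \{ n \in {}^*\mathbb{N} : n \text{ is limited} \}
  \text{ is non-empty, bounded above by } \omega,
  \text{ and closed under } n \mapsto n+1, \\
&\text{so } S_0 \text{ has no largest element and hence cannot arise from any non-standard set.}
\end{align*}
```

The definition of $S_0$ uses the p-dependent adjective “limited”, which is exactly what the rule of thumb above warns about.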

It is worth comparing the situation here with that with the O() notation. With O(), the axiom schema of specification is simply inapplicable; one cannot form a set using O() notation inside the definition (though I must admit that I have occasionally been guilty of abusing notation and violating the above rule in my own papers). In non-standard analysis, in contrast, one *can* use terminology such as “limited” to create sets of non-standard objects, which then enjoy some useful structure (e.g. the set of limited numbers is a ring). It’s just that these sets are not themselves non-standard, and thus not subject to the transfer principle.

— Example: calculus via infinitesimals —

Historically, one of the original motivations of non-standard analysis was to make rigorous the manipulations of infinitesimals in calculus. While this is not the main focus of my post here, I will give just one small example of how non-standard analysis is applied in differential calculus. If x and y are two non-standard real numbers, with y positive, we write x=o(y) if x/y is infinitesimal. The key lemma is

Lemma 1. Let $f: \mathbb{R} \to \mathbb{R}$ be a standard function, and let x, L be standard real numbers. We identify f with a non-standard function in the usual manner. Then the following are equivalent:

1. f is differentiable at x with derivative f'(x) = L.
2. For any infinitesimal h, we have $f(x+h) = f(x) + hL + o(|h|)$.

Lemma 1 looks very similar to linear Taylor expansion, but note that there are no limits involved (despite the suggestive o() notation); instead, we have the concept of an infinitesimal. The implication of (2.) from (1.) follows easily from the definition of derivative, the transfer principle, and the fact that infinitesimals are smaller in magnitude than any positive standard real number. The implication of (1.) from (2.) can be seen by contradiction; if f is *not* differentiable at x with derivative L, then (by the axiom of choice) there exists a sequence $h_n$ of non-zero standard real numbers going to zero, such that the Newton quotient $(f(x+h_n) - f(x))/h_n$ is bounded away from L by a standard positive number. One now forms the non-standard infinitesimal $h := (h_n)$ and obtains a contradiction to (2.).

Using this equivalence, one can now readily deduce the usual laws of differential calculus, e.g. the chain rule, product rule, and mean value theorem; the proofs are algebraically almost identical to the usual proofs (especially if one rewrites those proofs in o() notation), but one does not need to deal explicitly with epsilons, deltas, and limits (the ultrafilter has in some sense already done all that for you). The epsilon management is done invisibly and automatically; one does not need to keep track of whether one has to choose epsilon first before selecting delta, or vice versa. In particular, most of the existential quantifiers (“… there exists $\delta$ such that …”) have been eliminated, leaving only the more pleasant universal quantifiers (“for every infinitesimal h …”).
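For instance, the product rule can be sketched as follows. Here I assume, purely for illustration, that $f$ and $g$ are standard functions differentiable at a standard point $x$, and that $h$ is an arbitrary non-zero infinitesimal (the case $h = 0$ being trivial).

```latex
\begin{align*}
f(x+h)\,g(x+h)
 &= \bigl(f(x) + h f'(x) + o(|h|)\bigr)\bigl(g(x) + h g'(x) + o(|h|)\bigr) \\
 &= f(x) g(x) + h\bigl(f'(x) g(x) + f(x) g'(x)\bigr) + h^2 f'(x) g'(x) + o(|h|) \\
 &= f(x) g(x) + h\bigl(f'(x) g(x) + f(x) g'(x)\bigr) + o(|h|).
\end{align*}
```

The second line uses the fact that a limited number times an $o(|h|)$ quantity is again $o(|h|)$ (the values $f(x)$, $g(x)$, $f'(x)$, $g'(x)$ are standard, hence limited), and the third line uses $h^2 = o(|h|)$, since $h^2/|h| = |h|$ is infinitesimal. By Lemma 1, $fg$ is then differentiable at $x$ with derivative $f'(x) g(x) + f(x) g'(x)$, and at no point did one have to choose a delta.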

There is one caveat though: Lemma 1 only works when x is *standard*. For instance, consider the standard function $f(x) := x^2 \sin(1/x^3)$, with the convention f(0)=0. This function is everywhere differentiable, and thus extending to non-standard numbers we have $f(x+h) = f(x) + h f'(x) + o(|h|)$ for all standard x and infinitesimal h. However, the same claim is not true for arbitrary non-standard x; consider for instance what happens if one sets x=-h.
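To see the failure explicitly, here is a sketch of the computation, taking $f(x) := x^2 \sin(1/x^3)$ with $f(0) = 0$. Both this specific $f$ and the choice of $h$ below are assumptions made for illustration, and $\omega$ denotes an unlimited non-standard natural number.

```latex
\begin{align*}
f'(x) &= 2x \sin(1/x^3) - 3x^{-2} \cos(1/x^3) \qquad (x \neq 0), \\
h &:= (2\pi\omega)^{-1/3}, \qquad x := -h, \qquad 1/x^3 = -2\pi\omega, \\
f(x+h) &= f(0) = 0, \qquad f(x) = h^2 \sin(-2\pi\omega) = 0, \\
h f'(x) &= h\bigl(-2h \sin(-2\pi\omega) - 3h^{-2} \cos(-2\pi\omega)\bigr) = -\frac{3}{h}.
\end{align*}
```

Thus $f(x+h) - f(x) - h f'(x) = 3/h$, which is unlimited rather than $o(|h|)$. The identities $\sin(2\pi\omega) = 0$ and $\cos(2\pi\omega) = 1$ follow by transfer from the corresponding facts for standard natural numbers.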

One can also obtain an analogous characterisation of the Riemann integral: a standard function f is Riemann integrable on an interval [a,b] with integral A if and only if one has

$$\sum_{1 \leq i \leq n} f(x_i^*) (x_i - x_{i-1}) = A + o(1)$$

for any non-standard sequence

$$a = x_0 \leq x_1^* \leq x_1 \leq x_2^* \leq x_2 \leq \ldots \leq x_n = b$$

with $\sup_{1 \leq i \leq n} (x_i - x_{i-1})$ infinitesimal. One can then reprove the usual basic results, such as the fundamental theorem of calculus, in this manner; basically, the proofs are the same, but the limits have disappeared, being replaced by infinitesimals.
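As an illustration, consider a single non-standard Riemann sum. This is only a sketch: the integrand $f(x) = x$ on $[0,1]$, the uniform partition points $x_i = i/\omega$, the sample points $x_i^* = x_i$, and the unlimited non-standard natural number $\omega$ (playing the role of the number of pieces) are all choices made here. The mesh $1/\omega$ is infinitesimal, and

```latex
\begin{align*}
\sum_{1 \leq i \leq \omega} f(x_i^*)\,(x_i - x_{i-1})
 = \sum_{1 \leq i \leq \omega} \frac{i}{\omega} \cdot \frac{1}{\omega}
 = \frac{\omega(\omega+1)}{2\omega^2}
 = \frac{1}{2} + \frac{1}{2\omega}
 = \frac{1}{2} + o(1).
\end{align*}
```

The summation formula $\sum_{1 \leq i \leq n} i = n(n+1)/2$ holds for non-standard $n$ by transfer. This checks only one partition, which is consistent with the integral being 1/2; the characterisation of course requires the same conclusion for every partition with infinitesimal mesh.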

— Big O() notation —

Big O() notation in standard mathematics can be translated easily into the non-standard setting, as follows.

Lemma 2. Let $f: \mathbb{N} \to \mathbb{C}$ and $g: \mathbb{N} \to \mathbb{R}^+$ be standard functions (which can be identified with non-standard functions in the usual manner). Then the following are equivalent.

1. f(m) = O(g(m)) in the standard sense, i.e. there exists a standard positive real constant C such that $|f(m)| \leq C g(m)$ for all standard natural numbers m.
2. |f(m)|/g(m) is limited for every non-standard natural number m.

This lemma is proven similarly to Lemma 1; the implication of (2.) from (1.) is obvious from the transfer principle, while the implication of (1.) from (2.) is again by contradiction, converting a sequence of increasingly bad counterexamples to (1.) to a counterexample to (2.). Lemma 2 is also a special case of the “overspill principle” in non-standard analysis, which asserts that a non-standard set of numbers which contains arbitrarily large standard numbers, must also contain an unlimited non-standard number (thus the large standard numbers “spill over” to contain some non-standard numbers). The proof of the overspill principle is related to the (specious) argument discussed above in which one tried to derive a contradiction from the set of unlimited natural numbers, and is left as an exercise.
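The contradiction argument for Lemma 2 can be sketched as follows. I assume here that $f$ is complex-valued and $g$ takes positive real values, and $m_n$ is a hypothetical name for the sequence of counterexamples.

```latex
\begin{align*}
&\text{If (1.) fails, then for each standard } n \text{ there is a standard } m_n
  \text{ with } |f(m_n)| > n\, g(m_n). \\
&\text{Let } m := (m_n) \in {}^*\mathbb{N}. \text{ For any standard } C > 0,
  \text{ the statement } |f(m_n)|/g(m_n) > C \\
&\text{holds for all } n \geq C, \text{ hence for all but finitely many } n,
  \text{ hence is } p\text{-true, so } |f(m)|/g(m) > C. \\
&\text{Thus } |f(m)|/g(m) \text{ is unlimited, contradicting (2.).}
\end{align*}
```

The only property of the ultrafilter used is that it is non-principal, so that statements true for all but finitely many voters are p-true.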

[In some texts, the notation f = O(g) only requires that $|f(m)| \leq C g(m)$ for all *sufficiently large* m. The nonstandard counterpart to this is the claim that |f(m)|/g(m) is limited for every *unlimited* non-standard m.]

Because of the above lemma, it is now natural to define the non-standard counterpart of the O() notation: if x, y are non-standard numbers with y positive, we say that x=O(y) if |x|/y is limited. Then the above lemma says that the standard and non-standard O() notations agree for standard functions of one variable. Note how the non-standard version of the O() notation does not have the existential quantifier (“… there exists C such that …”) and so the epsilon management is lessened. If we let $\mathcal{L}$ denote the subring of ${}^*\mathbb{R}$ consisting of all limited numbers, then the claim x=y+O(z) can be rewritten as $x = y \mod z\mathcal{L}$, thus we see how the O() notation can be viewed algebraically as the operation of quotienting the (non-standard) real numbers by various dilates of the subring $\mathcal{L}$.
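For instance, the familiar rule that an $O(z)$ error plus another $O(z)$ error is again $O(z)$ becomes a one-line algebraic fact. This is a sketch, writing $\mathcal{L}$ for the ring of limited numbers (a name assumed here) and taking $z$ positive.

```latex
\begin{align*}
x = y + O(z) &\iff \frac{x - y}{z} \in \mathcal{L}
  \iff x \equiv y \pmod{z\mathcal{L}}, \\
x \equiv y,\ y \equiv w \pmod{z\mathcal{L}}
 &\implies \frac{x - w}{z} = \frac{x - y}{z} + \frac{y - w}{z} \in \mathcal{L}
 \implies x \equiv w \pmod{z\mathcal{L}}.
\end{align*}
```

No implied constants need to be tracked; the closure of $\mathcal{L}$ under addition does all the work.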

One can now convert many other order-of-magnitude notions to non-standard notation. For instance, suppose one is performing some standard hard analysis involving some large parameter $N > 1$, e.g. one might be studying a set of N points in some group or Euclidean space. One often wants to distinguish between quantities which are of polynomial size in N and those which are super-polynomial in size; for instance, these N points might lie in a finite group G, where G has size much larger than N, and one’s application is such that any bound which depends on the size of G will be worthless. Intuitively, the set of quantities which are of polynomial size in N should be closed under addition and multiplication and thus form a sort of subring of the real numbers, though in the standard universe this is difficult to formalise rigorously. But in nonstandard analysis, it is not difficult: we make N non-standard (and G too, in the above example), and declare any non-standard quantity x to be of polynomial size if we have $x = O(N^{O(1)})$, or equivalently if $\log(1+|x|)/\log N$ is limited. We can then legitimately form the set P of all non-standard numbers of polynomial size, and this is in fact a subring of the non-standard real numbers; as before, though, we caution that P is not a non-standard set of reals, and in particular is not a non-standard subring of the reals. Since P *is* a ring, one can then legitimately apply whatever results from ring theory one pleases to P, bearing in mind though that any sets of non-standard objects one generates using that theory may not necessarily be non-standard objects themselves. At the end of the day, we then use the transfer principle to go back to the original problem in which N is standard.
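The closure of P under addition and multiplication can be checked directly. The following is a sketch, using the characterisation that $x$ has polynomial size when $\log(1+|x|)/\log N$ is limited, with $N > 1$ so that $\log N$ is positive; this normalisation is an assumption made here.

```latex
\begin{align*}
1 + |x + y| &\leq (1 + |x|)(1 + |y|), \qquad
1 + |xy| \leq (1 + |x|)(1 + |y|), \\
\frac{\log(1 + |x + y|)}{\log N},\ \frac{\log(1 + |xy|)}{\log N}
 &\leq \frac{\log(1 + |x|)}{\log N} + \frac{\log(1 + |y|)}{\log N}.
\end{align*}
```

If $x$ and $y$ are of polynomial size, the right-hand side is a sum of two limited numbers and is therefore limited, so $x+y$ and $xy$ are of polynomial size as well. Closure under negation is immediate, since the definition only involves $|x|$.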

As a specific example of this type of thing from my own experience, in one of my papers with Van Vu, we had a large parameter n, and had at some point to introduce the somewhat fuzzy notion of a “highly rational number”, by which we meant a rational number a/b whose numerator and denominator were both at most $n^{o(n)}$ in magnitude. Such numbers looked like they were forming a field, since the sum, difference, product, or quotient of two highly rational numbers was again highly rational (but with a slightly different rate of decay in the o() notation). Intuitively, one should be able to do any algebraic manipulation on highly rational numbers which is legitimate for true fields (e.g. using Cramer’s rule to invert a non-singular matrix) and obtain an output which is also highly rational, as long as the number of algebraic operations one uses is O(1) rather than, say, O(n). We could not actually formalise this rigorously in our standard notation, and had to resort to informal English sentences to describe this; but one can do everything perfectly rigorously in the non-standard setting by letting n be non-standard, and defining the field F of non-standard rationals a/b where $a, b = O(n^{o(n)})$; F is genuinely a field of non-standard rationals (but not a non-standard field of rationals), and so using Cramer’s rule here (but only for matrices of standard size) would be perfectly legitimate. (We did not actually write our argument in this non-standard manner, keeping everything in the usual standard hard analysis setting, but it would not have been difficult to rewrite the argument non-standardly, and there would be some modest simplifications.)

— A hierarchy of infinitesimals —

We have seen how, by selecting an ultrafilter p, we can extend the standard real numbers $\mathbb{R}$ to a larger system ${}^*\mathbb{R}$, in which the original number system $\mathbb{R}$ becomes a real totally ordered subfield. (Exercise: is ${}^*\mathbb{R}$ complete? The answer depends on how one defines one’s terms.) This gives us some new objects, such as the infinitesimal $\eta_0$ given by the sequence $1, \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \ldots$. This quantity is smaller than any standard positive number, in particular it is infinitesimally smaller than any quantity depending (via standard operations) on standard constants such as 1. One may think of ${}^*\mathbb{R}$ as the non-standard extension of $\mathbb{R}$ generated by adjoining $\eta_0$; this is similar to the field extension $\mathbb{R}(\eta_0)$, but is much larger, because field extensions are only closed under arithmetic operations, whereas non-standard extensions are closed under *all* definable operations. For instance, $\exp(1/\eta_0)$ lies in ${}^*\mathbb{R}$ but not in $\mathbb{R}(\eta_0)$.

Now it is possible to iterate this process, by introducing a non-standard nonprincipal ultrafilter *p on the non-standard natural numbers ${}^*\mathbb{N}$ (or more precisely, an ultrafilter on ${}^*\mathbb{N}$ that extends an ultraproduct of nonprincipal ultrafilters on $\mathbb{N}$), and then embedding the field ${}^*\mathbb{R}$ inside an even larger system ${}^{**}\mathbb{R}$, whose elements can be identified (modulo *p-almost sure equivalence) with non-standard sequences $(x_n)_{n \in {}^*\mathbb{N}}$ of non-standard numbers in ${}^*\mathbb{R}$ (where n now ranges over the non-standard natural numbers ${}^*\mathbb{N}$); one could view these as “doubly non-standard numbers”. This gives us some “even smaller” infinitesimals, such as the “doubly infinitesimal” number $\eta_1$ given by the non-standard sequence $(2^{-n})_{n \in {}^*\mathbb{N}}$. This quantity is smaller than any positive standard or (singly) non-standard number, in particular infinitesimally smaller than any positive quantity depending (via standard or singly non-standard operations) on standard or singly non-standard constants such as 1 or $\eta_0$. For instance, it is smaller than $1/A(\lfloor 1/\eta_0 \rfloor)$, where A is the Ackermann function, since the sequence that defines $\eta_1$ is indexed over the non-standard natural numbers and $2^{-n}$ will drop below $1/A(\lfloor 1/\eta_0 \rfloor)$ for sufficiently large non-standard n.

One can continue in this manner, creating a triply infinitesimal quantity $\eta_2$ which is infinitesimally smaller than anything depending on 1, $\eta_0$, or $\eta_1$, and so forth. Indeed one can iterate this construction an absurdly large number of times, though in most applications one only needs an explicitly finite number of elements from this hierarchy. Having this hierarchy of infinitesimals, each one of which is guaranteed to be infinitesimally small compared to *any* quantity formed from the preceding ones, is quite useful: it lets one avoid having to explicitly write a lot of epsilon-management phrases such as “Let $\eta_2$ be a small number (depending on $\eta_0$ and $\eta_1$) to be chosen later” and “… assuming $\eta_2$ was chosen sufficiently small depending on $\eta_0$ and $\eta_1$”, which are very frequent in hard analysis literature, particularly for complex arguments which involve more than one very small or very large quantity. (My I-team paper referred to earlier is of this type.)

— Conclusion —

I hope I have shown that non-standard analysis is not a totally “alien” piece of mathematics, and that it is basically only “one ultrafilter away” from standard analysis. Once one selects an ultrafilter, it is actually relatively easy to swap back and forth from the standard universe and the non-standard one (or to doubly non-standard universes, etc.). This allows one to rigorously manipulate things such as “the set of all small numbers”, or to rigorously say things like “$\eta_1$ is smaller than anything that involves $\eta_0$”, while greatly reducing epsilon management issues by automatically concealing many of the quantifiers in one’s argument. One has to take care as to which objects are standard, non-standard, sets of non-standard objects, etc., especially when transferring results between the standard and non-standard worlds, but as long as one is clearly aware of the underlying mechanism used to construct the non-standard universe and transfer back and forth (i.e. as long as one understands what an ultrafilter is), one can avoid difficulty. The main drawbacks to use of non-standard notation (apart from the fact that it tends to scare away some of your audience) are that a certain amount of notational setup is required at the beginning, and that the bounds one obtains at the end are rather ineffective (though, of course, one can always, after painful effort, translate a non-standard argument back into a messy but quantitative standard argument if one desires).

I plan to discuss some specific translations of some recent hard analysis results (e.g. the sum-product theory of Bourgain) to the non-standard analysis setting some time in the future, in the spirit of other translations such as van den Dries and Wilkie’s non-standard proof of Gromov’s theorem on groups of polynomial growth.

## 85 comments

25 June, 2007 at 2:11 pm

Theo

Thanks for the very nice discussion. I’ve been a fan of ultrafilters since learning about them in high school, although, not being in analysis, I haven’t particularly ever needed to use them.

You suggest that

> Because of this, many of the results in soft analysis which are proven using ultrafilters can instead be established using a “poor man’s non-standard analysis” (or “pre-infinitary analysis”) in which one simply does the “voter disenfranchisement” step mentioned above by hand. This step is more commonly referred to as the trick of “passing to a subsequence whenever necessary”, and is particularly popular in the soft analysis approach to PDE and calculus of variations.

This observation should not be taken too strongly: in particular, there are sequences and ultrafilters so that the sequence has no large convergent subsequence. To wit:

Let’s work just in the closed interval [0,1], and I will use Q to mean the set of rational numbers in [0,1]. Now take a countable ordering p_j of Q (i.e. a bijection between Q and \N = {1,2,3,…}). For each j, construct a small closed interval A_j = [ p_j - 1/3^(j+1) , p_j + 1/3^(j+1) ]. The A_j don’t quite cover the unit interval, but they do contain an open set that contains all the rational numbers, so they miss at most countably many real numbers; if r_j is the sequence of the missed numbers, then we can put another collection of intervals B_j = [ r_j - 1/3^(j+1) , r_j + 1/3^(j+1) ] around them. There, now together the A_j and the B_j cover the entire unit interval. But notice that the length of each A_j or B_j is only 2/3^(j+1), so any finite union of A_j’s and B_j’s has total length strictly less than 1. In particular, any finite union of these closed intervals necessarily misses an infinite number of rational numbers.

So let’s pick a (possibly different) sequence q_n that, like p_j, exhausts Q, and for each A_j or B_j define a set of indices X_j = {n : q_n\not\in A_j} and Y_j = {n : q_n\not\in B_j}. The X_j’s and Y_j’s are such that any finite intersection is infinite, by the previous paragraph, and so generate a proper nonprincipal filter. Extending this filter, via Zorn, to an ultrafilter gives us the desired ultrafilter: q_n can have no large convergent subsequence, because any convergent subsequence must co-finitely land within some A_j or B_j.

I would also like to ask a question that, not knowing enough analysis or set theory, I’ve been curious about but not spent much time with. You observe that “For the purposes of non-standard analysis, one non-principal ultrafilter is much the same as any other.” A few years ago, after he gave a talk on surreal numbers, I had a brief discussion with John Conway about nonstandard analysis. He observed, if my memory is correct, that the following statements are equivalent:

(1) The continuum hypothesis.

(2) Given any two nonprincipal ultrafilters on \N, there’s a bijection of the natural numbers that takes one of these ultrafilters to the other. (Actually, I think his claim was that the continuum hypothesis is equivalent to the statement that any two ultraproduct constructions of the hyperreals are (noncanonically) isomorphic, but I believe this is equivalent?)

So, my question is two-fold. First, is this statement actually correct? Second, is there an easy proof? Neither direction seems particularly obvious to me.

25 June, 2007 at 2:53 pm

Alejandro Rivero

How do you compare the representation of infinitesimals via ultraproducts with more applied (more constructible?) representations such as Alain Connes’s?

25 June, 2007 at 3:22 pm

andrescaicedo

Hi,

This is about Theo’s comment.

“the following statements are equivalent:

(1) The continuum hypothesis.

(2) Given any two nonprincipal ultrafilters on \N, there’s a bijection of the natural numbers that takes one of these ultrafilters to the other.”

Statement (2) is false, independently of the status of CH. Perhaps Conway was quoting Keisler’s isomorphism theorem? Check Chang-Keisler’s model theory book for the statement of this result (about isomorphism of *ultrapowers* by different ultrafilters). Shelah later eliminated Keisler’s (GCH) assumption, at the cost of requiring ultrafilters on larger objects.

25 June, 2007 at 3:50 pm

Terence Tao

Dear Theo,

I’m not sure I understand your example. It appears to me that the union of all the A_j is Lebesgue measurable with measure at most 1/3 (if I have summed the geometric series correctly), and so the complement of that union in [0,1] is going to have positive measure and thus be uncountable. Also, if it were true that “all convergent sequences cofinitely land in one of the A_j or B_j”, this would imply that every point in [0,1] lies in the interior of one of the A_j or B_j, which would mean that these interiors are an open cover of [0,1] with no finite subcover, contradicting compactness of [0,1].

It is true though that ultrafilters and the trick of repeated passage to subsequences begin to diverge a bit from each other once one tries to pass to a subsequence an infinite number of times, due to the lack of closure of ultrafilters under countable intersections. It is somewhat of a moot point, though, since once one possesses an ultrafilter, there is no longer any need to consider subsequences at all.

I’m afraid I don’t know enough about the continuum hypothesis to say anything intelligent about Conway’s comment. (Similarly, I know very little about Connes’ approach to non-standard analysis.)

25 June, 2007 at 6:15 pm

Theo

Mrph. I dashed down my so-called example in a bit of a hurry, and didn’t carefully check myself. Of course, as pointed out, it fails miserably; this is what I get for being weak on analysis (of any kind). It is true (I’ve seen a proof, four or five years ago, which looked very little like the solution I almost came up with) that there are ultrafilters and bounded sequences so that the sequence has no large convergent subsequence. For now, I guess, I’ll leave that as an exercise to the reader? I’ll certainly continue to try to remember the solution I had been shown.

andrescaicedo, thanks for the references. It’s entirely likely that that’s what Conway actually said; as with the other part of my comment, my memory of this material is poorer than I would like.

Can you provide an example (necessarily, of course, using Choice) of two nonprincipal ultrafilters that are not isomorphic (in the sense of a bijection on \N)? Is there a counting argument?

26 June, 2007 at 4:29 am

Eric

“Can you provide an example (necessarily, of course, using Choice) of two nonprincipal ultrafilters that are not isomorphic (in the sense of a bijection on \N)? Is there a counting argument?”

The simplest proof is indeed a counting argument. There are $2^{2^{\aleph_0}}$ ultrafilters on N and only $2^{\aleph_0}$ bijections (or even functions of any kind) from N to N. More generally, any infinite set X has $2^{2^{|X|}}$ ultrafilters. The idea of the proof is to construct a set S of $2^{\aleph_0}$ subsets of N such that any nontrivial finite Boolean combination of them is nonempty (by “nontrivial” I mean “not forced to be empty by the axioms of Boolean algebra”, e.g. not something like $A \cap (N \setminus A)$). This means that for any subset T of S, $T \cup \{N \setminus A : A \in S \setminus T\}$ is a filter base, and extending each such set to an ultrafilter yields distinct ultrafilters for each T. Since there are $2^{2^{\aleph_0}}$ subsets of S, there must be $2^{2^{\aleph_0}}$ distinct ultrafilters (and it’s easy to see that there can’t be any more than that).

Re the original post: “One should take this informal argument with a grain of salt; it turns out that after one has made an infinite number of choices, the infinite number of disenfranchised groups, while individually having no further power to influence elections, can begin having some collective power, basically because property 2 of a filter only guarantees closure under finite intersections and not infinite intersections, and things begin to get rather complicated. At this point, I recommend abandoning the informal picture and returning to Zorn’s lemma :-) .”

If you just use transfinite induction instead of Zorn’s lemma, you can make the informal argument work with just a small modification: instead of only splitting up the voters that won the previous split, you one-by-one consider every possible split of the entire set of voters, and choose a winning side that is compatible with the choices you’ve made so far. This, of course, amounts to the exact same thing as the Zorn’s lemma argument, but is a good deal more illuminating.

26 June, 2007 at 11:03 am

Theo

In any case, my original point, that an ultrafilter limit is not necessarily the limit of a large subsequence, is easy enough, when I remember how to do it.

Let me consider the following filter on $\bigsqcup_j \N_j$, where each $\N_j$ is a copy of the natural numbers. I’ll say that a set $X$ is large if, for cofinitely many $j$, $X \cap \N_j$ is cofinite. This generates an ultrafilter $p$ via Zorn, and the following two kinds of sets are definitely small in $p$: sets that are cofinitely contained in only one of the $\N_j$; and sets that have finite intersection with every $\N_j$.

Now partition the unit interval $(0,1]$ into pieces $I_1 = (1/2, 1]$, $I_2 = (1/4, 1/2]$, etc. Each of these pieces contains countably many rational numbers, so use whatever ordering of the rationals that you like to identify $\mathbb{Q} \cap I_j$ with $\N_j$. Putting these all together and, via whatever zig-zagging you prefer, identifying $\bigsqcup_j \N_j \cong \N$, we get an ultrafilter $p$ on $\N$ and a sequence $q_n$ (dense) in $[0,1]$.

Then $p$ definitely picks a unique limit $x$, and if $q_n$ has any large convergent subsequences, then they definitely converge to $x$. There are two (and a half) possibilities:

(1) If $x$ lies in the interior of $I_j$ for some $j$, then any large convergent subsequence must be cofinitely in $I_j$, but all such subsequences are $p$-small.

(1.5) If $x = 2^{-j}$ for some $j$, then any large convergent subsequence must be cofinitely within $I_j \cup I_{j+1}$, but all such subsequences are small.

(2) If $x = 0$, then any large convergent subsequence has only finitely many points in any interval $[\epsilon, 1]$ for $\epsilon > 0$, and so certainly has only finitely many points in each and every $I_j$, and hence is also $p$-small.

27 June, 2007 at 7:56 am

Terence Tao

Dear Alejandro,

By coincidence, Connes posted on his blog about infinitesimal calculus just last Friday:

http://noncommutativegeometry.blogspot.com/2007/06/infinitesimal-variables.html

From that description, Connes’ “non-standard numbers” are bounded operators on a separable Hilbert space (possibly modulo finite rank operators), and the infinitesimals are the compact operators. If one fixes the Hilbert space as $\ell^2(\mathbb{N})$ and then restricts to the maximal commutative subalgebra of bounded multiplier operators on that space (i.e. $\ell^\infty(\mathbb{N})$), then the non-standard numbers are bounded sequences (modulo changes in finitely many entries) and the infinitesimals are sequences converging to zero. This is basically the ultrapower construction above, but with the non-principal ultrafilter replaced by the smaller filter of cofinite sets of natural numbers. So one does not need the axiom of choice for Connes’ construction. On the other hand, since one doesn’t have the full properties of an ultrafilter, not everything in standard mathematics transfers over to the non-standard setting. For instance, one cannot take the reciprocal of an infinitesimal in Connes’ framework. On the other hand, Connes’ system is naturally non-commutative. One cute thing: the Fredholm alternative then becomes the non-commutative counterpart of the assertion that the sum of a non-zero standard number and an infinitesimal is invertible.

27 June, 2007 at 8:11 am

Terence Tao

Dear Theo,

Thanks for the corrected example. It’s true that if the ultrafilter p (and thus the non-standard model) is fixed in advance, then there will be bounded sequences of real numbers with no p-large convergent subsequences. Of course, the entire sequence will still converge to some limit x in the p-sense, thus for each epsilon there is a p-large subsequence which stays within epsilon of x. This substitute notion of convergence works just as well as the usual notion of convergence in applications.

On the other hand, if in the course of a standard argument one needs to refine a sequence into a subsequence a couple times, then one can choose a non-principal ultrafilter p such that all the subsequences one selected in that argument were large (i.e. in p). So any standard argument involving repeated passage to subsequences can be viewed as “secretly” arising from some non-standard model.

17 July, 2007 at 9:17 am

Michael Greineckerer

I think it is worth making explicit that the ultrafilters on a set of voters are actually equivalent to voting rules satisfying the conditions of Arrow’s theorem (not assuming a finite set of voters), and that any set of voters in such an ultrafilter can enforce everything they unanimously want. This was independently proven in:

1972 A.P. Kirman and D. Sondermann, “Arrow’s theorem, many agents, and invisible dictators”, Journal of Economic Theory 5 (1972), pp. 267–277

1976 B. Hansson, “The existence of group preference functions”, Public Choice 28 (1976), pp. 89–98

15 August, 2007 at 1:30 pm

Death to triangles! (and life to ultrafilters…) « Mathemusicality[…] major interests is nonstandard analysis, the very concept of which I used to despise until I read this brilliant post by Terence Tao. If a Fields Medalist says it’s okay, then it must be okay! Actually, what […]

23 August, 2007 at 1:04 am

Topology and ultrafilters, I « Mort aux Triangles![…] topology and background nbornak 9:04 am As much as I’d like to continue the current program of rediscovering model theory in a more categorical setting, I’d like to start a second thread about topology. The final goal of this thread will be to set forth the existence (given the axiom of choice) of non-principal ultrafilters. These are both the condition for the possibility of non-standard analysis and at the same time a reason for marginalizing it, as has already been chronicled by Terence Tao in his superb blog. […]

27 August, 2007 at 7:53 am

Printer-friendly CSS, and nonfirstorderisability « What’s new[…] quite show that (*) is not expressible in first-order logic, but I can come very close, using non-standard analysis. The […]

28 August, 2007 at 7:10 pm

Non-nonstandard Calculus, I « The Everything Seminar[…] came from taking several analysis classes at Smith College from logician Jim Henle. I learned nonstandard analysis from him one summer, but also learned what he called “non-nonstandard analysis”. The […]

14 November, 2007 at 4:00 am

NSA Bibliography « Mort aux Triangles! […] Terence Tao’s epic blog post on NSA and epsilon-management. […]

13 January, 2008 at 6:58 pm

254A, Lecture 3: Minimal dynamical systems, recurrence, and the Stone-Cech compactification « What’s new[…] the underlying group that acts on topological dynamical systems. By doing so, all the epsilon management issues go away, and the subject becomes very algebraic in nature. On the other hand, some […]

2 February, 2008 at 8:32 am

wb

“Can you provide an example (necessarily, of course, using Choice) of two nonprincipal ultrafilters that are not isomorphic (in the sense of a bijection on \N)?”

For one sense of “provide an example” the answer is certainly “no”. It is known that there is no formula phi in the language of set theory such that ZFC proves that the extension of phi is a non-principal ultrafilter on the set of natural numbers.

8 March, 2008 at 10:25 pm

liuxiaochuan

Dear Professor:

It’s just a great classroom on Internet! Thank you so much.

A tiny correction:

The third paragraph after the four laws of limit, the second sentence: …the ordinary notion of a limit, because “if” a sequence x_n is convergent…

yours

Liu

9 March, 2008 at 9:08 am

Terence Tao

Dear Liu: thanks for the correction!

27 April, 2016 at 5:56 am

sagar

Dear Sir, I want to do a Ph.D. in nonstandard analysis. Can you tell me some standard books on nonstandard analysis?

27 April, 2016 at 8:44 am

Terence Tao

I found Goldblatt’s “Lectures on the Hyperreals” to be a good introduction.

29 March, 2008 at 12:05 am

misha

A while ago I had written a short section for Wikipedia, in which I tried to explain as simply as possible the ultrafilter construction of hyperreals and the non-standard approach to differentiation. I had put an extra effort into making the notion of an ultrafilter as motivated and palatable as possible. Maybe some people will find it helpful.

27 May, 2008 at 7:09 pm

285G, Lecture 15: Geometric limits of Ricci flows, and asymptotic gradient shrinking solitons « What’s new[…] 1. One could use ultrafilters here in place of subsequences, but this does not significantly affect any of the […]

2 June, 2008 at 10:10 am

Global regularity of wave maps III. Large energy from R^{1+2} to hyperbolic spaces « What’s new[…] the 2006 paper of Kenig and Merle mentioned earlier managed to eliminate a lot of tedious “epsilon management” from the arguments of Colliander et al. (though for a slightly different problem), by using […]

25 July, 2008 at 7:49 am

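美丽的数 « Liuxiaochuan’s Weblog […] The appearance of the number: as one might expect, define , and correspondingly, . Similarly, . This number is very interesting; in fact, it is smaller than any positive real number. When Newton introduced infinitesimals into calculus, he solved a great many problems in physics, but the mathematical justification of the method was unacceptable. The language introduced later satisfied mathematicians well enough, and the matter came to rest for a while. Later still, someone took a fresh look at Newton’s infinitesimals and used the method just described to define them rigorously, bringing “nonstandard analysis” back onto the historical stage. The advantage of this method is that it avoids the cumbersome language, and it can help to simplify mathematical papers (without sacrificing rigour, of course). The story is a remarkable one, and Terence Tao has even written an article about it. […]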

30 August, 2008 at 8:43 am

The correspondence principle and finitary ergodic theory « What’s new[…] weak sense by an infinite system. (One can make this informal statement more rigorous using nonstandard analysis and/or ultrafilters, but we will not take such an approach here.) Because of this, we obtain a correspondence […]

29 September, 2008 at 8:19 am

What is a gauge? « What’s new[…] infinitesimals such as dx. (There are ways to make the use of infinitesimals rigorous, such as non-standard analysis, but this is not the focus of my post […]

14 October, 2008 at 4:23 pm

Non-measurable sets via non-standard analysis « What’s new[…] real number, as the real numbers are not discrete. If however one employs the language of non-standard analysis, then it is possible to make the above argument rigorous, and this is the purpose of my post today. […]

27 November, 2008 at 4:28 am

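什么是ultrafilter « Liu Xiaochuan’s Weblog […] A post on Terence Tao’s blog gives a more interesting explanation of the existence of ultrafilters, and also compares two different situations. He calls this an “infinite election”. […]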

8 January, 2009 at 9:50 am

245B, notes 2: Amenability, the ping-pong lemma, and the Banach-Tarski paradox (optional) « What’s new[…] can even create a finitely additive probability measure, for instance by selecting a non-principal ultrafilter and a Følner sequence and defining for all […]

9 January, 2009 at 4:20 pm

Nate Chandler

One small correction: Under “algebra homomorphism” within the section describing the laws that limits obey, you wrote ; you meant .

9 January, 2009 at 4:47 pm

Terence Tao

Thanks for the correction!

26 January, 2009 at 9:35 am

245B, Notes 6: Duality and the Hahn-Banach theorem « What’s new[…] correspondence between the space of generalised limit functionals, and the space of non-principal ultrafilters on , as defined for instance in this blog post. (This exercise may become easier once one is […]

9 February, 2009 at 11:04 am

245B, Notes 10: Compactness in topological spaces « What’s new[…] Ultrafilters, non-standard analysis, and epsilon management […]

18 March, 2009 at 10:32 pm

245B, Notes 13: Compactification and metrisation (optional) « What’s new[…] with the space of ultrafilters on . (See this post for further discussion of ultrafilters, and this post for further discussion of the relationship of […]

23 March, 2009 at 10:20 am

Victor Porton

I have discovered another approach to analysis (other than non-standard analysis) that avoids the epsilon-delta notion, in my “Algebraic General Topology”:

http://www.mathematics21.org/algebraic-general-topology.html

Particularly I define continuity by an algebraic formula hiding epsilon-delta behind the algebra.

Recently I also invented limit of arbitrary (not necessarily continuous) function. I put a preliminary idea of this on my site along with other “Algebraic General Topology” articles.

30 March, 2009 at 6:14 am

Halmos, Non-Standard Analysis and Names « Gödel’s Lost Letter and P=NP […] One story that you may like concerns the notion of non-standard analysis. This is an area created by Abraham Robinson to make precise the notion of infinitesimals and infinity that Newton and others used in developing calculus. We have long changed over from this to the $\epsilon$ and $\delta$ precision of Karl Weierstrass. Sometimes the burden of keeping track of many $\epsilon$ and $\delta$ can become overwhelming. See Terry Tao’s post for a modern view of this. […]

15 October, 2009 at 7:50 pm

Reading seminar: “Stable group theory and approximate subgroups”‘, by Ehud Hrushovski « What’s new[…] instance, in the language of rings, the non-standard reals are an elementary extension of the standard reals. On the other hand, the reals are not an […]

14 December, 2009 at 3:37 pm

Approximate bases, sunflowers, and nonstandard analysis « What’s new[…] as . In this note I will show how this equivalence can be made formal using the language of non-standard analysis. This is not a particularly deep (or new) observation, but it is perhaps the simplest example I […]

18 January, 2010 at 3:00 am

Banach Limites « UGroh's Weblog […] this section we are guided by the blog of Terence Tao and Chapitre II, §3 in the book by Stefan Banach [2], making use of [2] §36, Exercise 4 […]

18 January, 2010 at 11:49 am

Jonathan Vos Post

There was a recent arXiv paper whose title and abstract looked exciting, in applying nonstandard analysis and ultrafilters to Physics. But, alas, I don’t think the author knows enough Physics. Still, it is a good reminder not to make unstated assumptions in Physics about the Real Line being all that matters.

http://arxiv.org/abs/0911.4824

Surprising Properties of Non-Archimedean Field Extensions of the Real Numbers

Authors: Elemer E. Rosinger

Comments: This, under the present form, is a replacement that is a two part paper in which the new second part was brought together with my recently posted arxiv paper, upon the suggestion of the arxiv moderators

Subjects: General Mathematics (math.GM)

30 January, 2010 at 7:07 pm

Ultralimit analysis, and quantitative algebraic geometry « What’s new[…] nonstandard analysis, ultrafilters, ultralimit analysis | by Terence Tao I have blogged a number of times in the past about the relationship between finitary (or “hard”, or […]

13 March, 2010 at 5:06 pm

Terry Tao on Ultrafilters, and nonstandard analysis | Models Of Reality[…] Back in 2007 Terry Tao (Fields medalist Mathematician at UCLA) wrote a blog article on Ultrafilters, nonstandard analysis, and epsilon management. […]

19 March, 2010 at 9:53 pm

A computational perspective on set theory « What’s new[…] (involving only finitely many points or numbers, etc.); I have explored this theme in a number of previous blog posts. So, one may ask: what is the finitary analogue of statements such as Cantor’s […]

29 March, 2010 at 3:51 pm

aram harrow

I have a dumb question. Non-principal ultrafilters require the axiom of choice (almost). But arguments like “choose delta_3>0 to be a small constant depending on delta_1 and delta_2” obviously do not.

So is there some strange statement involving deltas and epsilons that you can do with ultrafilters, but not easily simply using strings of \exists and \forall signs?

29 March, 2010 at 6:50 pm

Terence Tao

It basically depends on how often one appeals to the properties of an ultrafilter. An ultrafilter has various closure and maximality properties which involve arbitrary subsets of the natural numbers – and by Cantor’s theorem, there are uncountably many such subsets. However, when one inspects any given argument using an ultrafilter, one finds that one does not need the ultrafilter properties for *all* subsets of the natural numbers, but typically just finitely many or countably many of these sets would suffice. As such, many arguments involving ultrafilters could also be run using an “incomplete ultrafilter”, which obeys the ultrafilter axioms for some of the subsets of the natural numbers but not for others. (This is similar in spirit to the disenfranchised voter model in the blog post – one does not decide whether a given set ought to belong in the ultrafilter or not until one’s hand is forced, at which point one makes an arbitrary choice.) This already allows one to reduce the amount of choice needed, to something like countable choice or dependent choice. In many cases, one can reduce the amount of choice further, because for the specific sets which one has to choose for the ultrafilter, there is some way to make a canonical choice. Once one has expunged all but finitely many choices, then one should be able to convert what remains of the ultrafilter into a finite chain of quantifiers.

But there are probably arguments that use the ultrafilter so heavily that one cannot easily disentangle the proof into a finitary, first order proof that completely avoids the axiom of choice. Arguments that use idempotent ultrafilters are probably good candidates for this, as one needs a couple further applications of Zorn’s lemma to get the idempotency. Nevertheless even these arguments can be partially finitised to some extent; there are some recent papers of Henry Towsner achieving this for Hindman’s theorem (which is most easily proven using idempotent ultrafilters).
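The “incomplete ultrafilter” idea, in which membership is decided only when one’s hand is forced, can be illustrated with a toy oracle. The sketch below is an assumption-laden illustration rather than anything from the post: it only handles periodic subsets of the natural numbers (given as a set of residues and a period), the class name `LazyUltrafilter` and its methods are hypothetical, and the “arbitrary choice” is simply to answer yes whenever that is consistent with earlier answers. It uses `math.lcm`, which needs Python 3.9 or later.

```python
from math import lcm


class LazyUltrafilter:
    """Answers "is A large?" for periodic sets A, deciding only when asked.

    A periodic set is passed as (residues, period) and stands for
    {n in N : n mod period in residues}.
    """

    def __init__(self):
        # Intersection of all sets declared large so far, stored as
        # residues modulo self.period. Initially this is all of N.
        self.period = 1
        self.core = frozenset({0})

    @staticmethod
    def _lift(residues, period, new_period):
        """Rewrite residues mod `period` as residues mod a multiple of it."""
        return frozenset(r for r in range(new_period) if r % period in residues)

    def is_large(self, residues, period):
        m = lcm(self.period, period)
        core = self._lift(self.core, self.period, m)
        candidate = self._lift(frozenset(residues), period, m)
        meet = core & candidate
        if not meet:
            # The sets already declared large lie inside the complement,
            # so the answer "no" is forced and nothing needs updating.
            return False
        # Either forced (core inside candidate) or a free choice: say "yes"
        # and remember the smaller intersection for later queries.
        self.period, self.core = m, meet
        return True
```

Because a nonempty periodic set is infinite, every finite intersection of the sets declared large stays infinite, so the answers given remain compatible with some non-principal ultrafilter; only the finitely many queried sets are ever decided. For instance, after `is_large({0}, 2)` (the even numbers) is answered yes, a later query for the odd numbers is forced to be no.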

30 January, 2018 at 2:59 pm

Rex

This is true in a lot of problems. For instance, Blackwell and Diaconis (1996) [the argument in the paper is older than this version], on the existence of a non-measurable set in coin-toss space, assume a free ultrafilter but use only some of its implications.

In general, for a class of subsets of the natural numbers, if all properties of a free ultrafilter are met (inclusion of supersets, maximal, empty intersection of all sets, inclusion of cofinite sets) except that “closed under intersection” is replaced by “closed under intersection of A in the class and any cofinite set”, must this class necessarily be a free ultrafilter? If not, are there examples of such incomplete ultrafilters that are not free ultrafilters in the standard sense?

30 March, 2010 at 11:56 pm

Bo Jacoby

I enjoy your writing. The sentence “the implication of (2.) from (1.) is obvious from the transfer principle, while to the implication of (2.) from (1.) is again by contradiction,” must be an error though.

31 March, 2010 at 5:12 am

Simon Hickey

Dear Prof. Tao,

I am doing an essay on you for the History of Mathematics at the University of Limerick, Ireland. I was wondering would you be able to inform me of the reason behind these blogs and how successful they have been to date.

Thanking you,

Simon

31 March, 2010 at 3:51 pm

alex

I love the really vivid way you described all this stuff, I’ve been struggling to understand ultrafilters for a long time – but the ideas you’ve written about choice (as oracles that update as you query them) and voting make it seem so simple and clear! Thank you!

29 April, 2010 at 11:39 am

jonas

To wb: I suspect it might be possible to prove the existence of two non-isomorphic non-principal ultrafilters sort of “explicitly” by giving two filters explicitly and proving that they can’t be subsets of isomorphic ultrafilters.

17 November, 2010 at 7:17 am

Arbeitsblatt 5: Konvergenz in topologischen Räumen « UGroh's Weblog […] In this section we have been guided by the blog of Terence Tao and Chapitre II, §3 in the book by Stefan Banach, Théorie des Operations Linéaires (1932), using […]

19 November, 2010 at 1:13 am

Theo Buehler

I like the following proof of the existence of a non-principal ultrafilter (or rather a Banach-limit) via Hahn-Banach:

Look at the subspace $U$ of $\ell^\infty$ generated by the sequences of the form $x - Sx$, where $x \in \ell^\infty$ and $S$ is the shift. The constant sequence $1$ has distance 1 from $U$ so that by Hahn-Banach there is a linear functional $\phi$ of norm one such that $\phi(1) = 1$ and $\phi$ vanishes on $U$.

By construction, $\phi$ is monotone, shift-invariant and restricts to the usual limit on convergent sequences (this is because the closure of $U$ contains the space $c_0$ of sequences that converge to zero). Of course, $\phi$ is not multiplicative.
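The distance claim in the comment above can be checked directly. The following is a sketch under stated assumptions, since the notation in the comment is not fully legible: it takes real-valued bounded sequences, writes $S$ for the left shift $(Sx)_n = x_{n+1}$, and takes $U = \{x - Sx : x \in \ell^\infty\}$.

```latex
Suppose $\|\mathbf{1} - (x - Sx)\|_\infty \le 1 - \delta$ for some
$x \in \ell^\infty$ and some $\delta > 0$. Then for every $n$,
\[
  x_n - x_{n+1} \;\ge\; \delta ,
  \qquad\text{so}\qquad
  x_N \;\le\; x_0 - N\delta \;\to\; -\infty \quad (N \to \infty),
\]
contradicting the boundedness of $x$. Hence
$\operatorname{dist}(\mathbf{1}, U) \ge 1$. On the other hand $0 \in U$ gives
$\operatorname{dist}(\mathbf{1}, U) \le \|\mathbf{1}\|_\infty = 1$, so the distance
is exactly $1$. Shift-invariance of a functional $\phi$ vanishing on $U$ is then
immediate:
\[
  \phi(x) - \phi(Sx) \;=\; \phi(x - Sx) \;=\; 0 .
\]
```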

22 November, 2010 at 5:57 pm

Boolean rings, ultrafilters, and Stone’s representation theorem « Annoying Precision[…] Tao has already written a great introduction to ultrafilters with an eye towards nonstandard analysis. I’d like to introduce them from a different […]

27 November, 2010 at 12:07 am

Nonstandard analysis as a completion of standard analysis « What’s new[…] this post, we fix a single non-principal ultrafilter on the (standard) natural numbers . (See this previous blog post for some basic discussion of what non-principal ultrafilters are, and how they are used in […]

29 November, 2010 at 12:00 pm

Concentration compactness via nonstandard analysis « What’s new[…] basics of nonstandard analysis are reviewed in this previous blog post (and see also this later post on ultralimit analysis, as well as the most recent post on this […]

9 December, 2010 at 7:57 pm

Ultrafilters in topology « Annoying Precision[…] his post on ultrafilters, Terence Tao also uses the closely related “voting” intuition. Here we only consider […]

9 April, 2011 at 11:19 am

Non-standard analysis link round-up « Matthew’s Math Blog[…] 3. Two posts on Terry Tao’s blog: [1] [2]. […]

13 August, 2011 at 11:18 am

Anonymous

CAN THE USE OF NON STANDARD ANALYSIS PROVIDE A MATHEMATICAL TOOL WHICH CAN DEVELOP A “GENERAL MODEL OF EVERYTHING” WHICH DOES NOT REQUIRE A MULTIPLICITY OF DIMENSIONS?

21 August, 2011 at 12:45 pm

Hilbert’s seventh problem, and powers of 2 and 3 « What’s new[…] to make the close to ); however, in order to simplify the exposition (and in particular, to use some nonstandard analysis terminology to reduce the epsilon management), I will establish Proposition 6 with ineffective constants […]

26 August, 2011 at 11:44 pm

Anonymous

Hi Terry,

In Lemma 1 condition 2, should it be f(x+h)=f(x)+hL+o(|h|) instead?

best,

Yuncheng

[Corrected, thanks – T.]

27 August, 2011 at 7:26 am

Anonymous

Dear Professor Terry: This may be a question due to my misunderstanding of something. In the second paragraph of ‘A hierarchy of infinitesimals’, in the last sentence it is claimed that since with drop below for sufficiently large non-standard , therefore is smaller than any standard or singly standard numbers. Does it follow from the fact that any subset of of the form for some is -almost sure? If so, how to prove it is true?

27 August, 2011 at 8:16 am

Terence Tao

Oops, I was a bit sloppy when I defined *p simply as a “nonstandard ultrafilter” on . More precisely, *p needs to be an ultrafilter on that extends an ultraproduct of a sequence of standard nonprincipal ultrafilters on . That way, since each standard nonprincipal ultrafilter contains every standard cofinite subset of , *p will contain any nonstandardly cofinite subset of , and in particular will contain for any nonstandard natural m.

24 September, 2011 at 11:46 pm

Anonymous

I’m confused as to why is meaningless. I’ve never formally studied logic, but as far as I can tell the condition for a real valued function to be O(1) can be expressed as a predicate in first order logic. What is the issue with the axiom schema of specification when using O() notation to define a set?

25 September, 2011 at 7:14 pm

Terence Tao

x is a single real number, not a function.

15 October, 2011 at 10:58 am

254A, Notes 6: Ultraproducts as a bridge between hard analysis and soft analysis « What’s new[…] Logical limits are closely tied with non-standard analysis. Indeed, by applying an ultraproduct construction to standard number systems such as the natural numbers or the reals , one can obtain nonstandard number systems such as the nonstandard natural numbers or the nonstandard real numbers (or hyperreals) . These nonstandard number systems behave very similarly to their standard counterparts, but also enjoy the advantage of containing the standard number systems as proper subsystems (e.g. is a subring of ), which allows for some convenient algebraic manipulations (such as the quotient space construction to create spaces such as ) which are not easily accessible in the purely standard universe. Nonstandard spaces also enjoy a useful completeness property, known as countable saturation, which is analogous to metric completeness (as discussed in this previous blog post) and which will be particularly useful for us in tying together the theory of approximate groups with the theory of Hilbert’s fifth problem. See this previous post for more discussion on ultrafilters and nonstandard analysis. […]

19 February, 2012 at 1:50 pm

BI: nonstandard analysis, a small investment | Alex Sisto[…] nonstandard analysis in a slightly more famous blog than my own, for example you can check out this post. I guess I should explain the very effective quote “a small investment” by Isaac […]

23 March, 2012 at 8:58 am

Some ingredients in Szemerédi’s proof of Szemerédi’s theorem « What’s new[…] are far larger than any quantity that appears in the preceding universe, as discussed at the end of this previous blog post. This sequence of universes does end up concealing all the epsilons, but it is not so clear that […]

25 October, 2012 at 10:11 am

Walsh’s ergodic theorem, metastability, and external Cauchy convergence « What’s new[…] will assume some familiarity with nonstandard analysis, as covered for instance in these previous blog […]

22 November, 2012 at 7:11 am

Mikhail Katz

Earlier on this page there was a discussion of Connes’ critique of NSA. The latter was recently analyzed by Kanovei, Katz, and Mormann in this article in Foundations of Science (see also [arXiv 1211.0244](http://arxiv.org/abs/1211.0244)). Here is the abstract:

> We examine some of Connes’ criticisms of Robinson’s infinitesimals starting in 1995. Connes sought to exploit the Solovay model S as ammunition against non-standard analysis, but the model tends to boomerang, undercutting Connes’ own earlier work in functional analysis. Connes described the hyperreals as both a ‘virtual theory’ and a ‘chimera’, yet acknowledged that his argument relies on the transfer principle. We analyze Connes ‘dart-throwing’ thought experiment, but reach an opposite conclusion. In S, all definable sets of reals are Lebesgue measurable, suggesting that Connes views a theory as being ‘virtual’ if it is not definable in a suitable model of ZFC. If so, Connes’ claim that a theory of the hyperreals is ‘virtual’ is refuted by the existence of a definable model of the hyperreal field due to Kanovei and Shelah. Free ultrafilters aren’t definable, yet Connes exploited such ultrafilters both in his own earlier work on the classification of factors in the 1970s and 80s, and in Noncommutative Geometry, raising the question whether the latter may not be vulnerable to Connes’ criticism of virtuality.

> We analyze the philosophical underpinnings of Connes’ argument based on Goedel’s incompleteness theorem, and detect an apparent circularity in Connes’ logic. We document the reliance on non-constructive foundational material, and specifically on the Dixmier trace (featured on the front cover of Connes’ magnum opus) and the Hahn-Banach theorem, in Connes’ own framework. We also note an inaccuracy in Machover’s critique of infinitesimal-based pedagogy.

A brief review of Kanovei-Shelah is here. This analysis of Connes’ critique is being discussed here.

10 May, 2013 at 10:05 pm

Bird’s-eye views of Structure and Randomness (Series) | Abstract Art[…] Infinities as numbers: purging the epsilons and deltas from proofs (adapted from “Ultrafilters, nonstandard analysis, and epsilon management“) […]

19 June, 2013 at 7:16 am

Infinities as numbers: purging the epsilons and deltas from proofs | Abstract Art[…] [1.5] Ultrafilters, non-standard analysis, and epsilon management […]

4 July, 2013 at 10:05 am

What does Kadison-Singer have to do with Quantum Mechanics? | tcs math - some mathematics of theoretical computer science […] identified with , the Stone-Cech compactification of the naturals. See Terry Tao’s notes on ultrafilters and Stone-Cech […]

7 December, 2013 at 4:06 pm

Ultraproducts as a Bridge Between Discrete and Continuous Analysis | What's new[…] has as a preferred candidate, and the ultrafilter is used as the election protocol. (See this previous post for further discussion of this […]

5 May, 2014 at 3:36 pm

Hindman’s Theorem | Finite Playground[…] reading the posts on ultrafilters by Terry Tao [1,2], I find the following result […]

5 March, 2015 at 6:08 pm

Ultrafilters – I. the Stone-Čech compactification | the capacity to be alone[…] Tao’s fantastic voting analogy for ultrafilters is one of my favorite intuitions I’ve ever read. After sitting through a lecture of said […]

20 July, 2015 at 8:23 pm

A nonstandard analysis proof of Szemeredi’s theorem | What's new[…] extension” component of the argument). But the proof was still quite messy. However, as discussed in this previous blog post, messy finitary proofs can often be cleaned up using nonstandard analysis. Thus, there should be a […]

15 November, 2015 at 3:04 pm

What is Gauge? | The Conscience of A Libertarian[…] infinitesimals such as dx. (There are ways to make the use of infinitesimals rigorous, such as non-standard analysis, but this is not the focus of my post […]

11 March, 2016 at 8:55 am

Sumfree sets in groups | What's new[…] of the problem in order to keep the complexity of the argument at a reasonable level (cf. my previous blog post on this topic). One drawback of doing so is that we have no effective bounds for the implied […]

17 April, 2016 at 3:18 pm

saghe

At this link we can see a new approach to nonstandard analysis without using model theory.

https://hal.inria.fr/hal-01248379/