You are currently browsing the category archive for the ‘math.NT’ category.
One of the basic problems in analytic number theory is to estimate sums of the form
as , where
ranges over primes and
is some explicit function of interest (e.g. a linear phase function
for some real number
). This is essentially the same task as obtaining estimates on the sum
where is the von Mangoldt function. If
is bounded,
, then from the prime number theorem one has the trivial bound
but often (when is somehow “oscillatory” in nature) one is seeking the refinement
Thanks to identities such as
is the Möbius function, refinements such as (1) are similar in spirit to estimates of the form
before one can use identities such as (3) to recover (1). Still, one generally thinks of (1) and (4) as being “morally” equivalent, even if they are not formally equivalent.
When is oscillating in a sufficiently “irrational” way, then one standard way to proceed is the method of Type I and Type II sums, which uses truncated versions of divisor identities such as (3) to expand out either (1) or (4) into linear (Type I) or bilinear sums (Type II) with which one can exploit the oscillation of
. For instance, Vaughan’s identity lets one rewrite the sum in (1) as the sum of the Type I sum
the Type I sum
the Type II sum
and the error term , whenever
are parameters, and
are the sequences
and
Similarly one can express (4) as the Type I sum
the Type II sum
and the error term , whenever
with
, and
is the sequence
After eliminating troublesome sequences such as via Cauchy-Schwarz or the triangle inequality, one is then faced with the task of estimating Type I sums such as
or Type II sums such as
for various . Here, the trivial bound is
, but due to a number of logarithmic inefficiencies in the above method, one has to obtain bounds that are more like
for some constant
(e.g.
) in order to end up with an asymptotic such as (1) or (4).
However, in a recent paper of Bourgain, Sarnak, and Ziegler, it was observed that as long as one is only seeking the Mobius orthogonality (4) rather than the von Mangoldt orthogonality (1), one can avoid losing any logarithmic factors, and rely purely on qualitative equidistribution properties of . A special case of their orthogonality criterion (which actually dates back to an earlier paper of Katai, as was pointed out to me by Nikos Frantzikinakis) is as follows:
Proposition 1 (Orthogonality criterion) Let
be a bounded function such that
for any distinct primes
(where the decay rate of the error term
may depend on
and
). Then
Actually, the Bourgain-Sarnak-Ziegler paper establishes a more quantitative version of this proposition, in which can be replaced by an arbitrary bounded multiplicative function, but we will content ourselves with the above weaker special case. (See also these notes of Harper, which uses the Katai argument to give a slightly weaker quantitative bound in the same spirit.) This criterion can be viewed as a multiplicative variant of the classical van der Corput lemma, which in our notation asserts that
if one has
for each fixed non-zero
.
As a sample application, Proposition 1 easily gives a proof of the asymptotic
for any irrational . (For rational
, this is a little trickier, as it is basically equivalent to the prime number theorem in arithmetic progressions.) The paper of Bourgain, Sarnak, and Ziegler also apply this criterion to nilsequences (obtaining a quick proof of a qualitative version of a result of Ben Green and myself, see these notes of Ziegler for details) and to horocycle flows (for which no Möbius orthogonality result was previously known).
Informally, the connection between (5) and (6) comes from the multiplicative nature of the Möbius function. If (6) failed, then exhibits strong correlation with
; by change of variables, we then expect
to correlate with
and
to correlate with
, for “typical”
at least. On the other hand, since
is multiplicative,
exhibits strong correlation with
. Putting all this together (and pretending correlation is transitive), this would give the claim (in the contrapositive). Of course, correlation is not quite transitive, but it turns out that one can use the Cauchy-Schwarz inequality as a substitute for transitivity of correlation in this case.
I will give a proof of Proposition 1 below the fold (which is not quite based on the argument in the above mentioned paper, but on a variant of that argument communicated to me by Tamar Ziegler, and also independently discovered by Adam Harper). The main idea is to exploit the following observation: if is a “large” but finite set of primes (in the sense that the sum
is large), then for a typical large number
(much larger than the elements of
), the number of primes in
that divide
is pretty close to
:
In particular, one can sum (7) against and obtain an approximation
that approximates a sum of by a bunch of sparser sums of
. Since
we see (heuristically, at least) that in order to establish (4), it would suffice to establish the sparser estimates
for all (or at least for “most”
).
Now we make the change of variables . As the Möbius function is multiplicative, we usually have
. (There is an exception when
is divisible by
, but this will be a rare event and we will be able to ignore it.) So it should suffice to show that
for most . However, by the hypothesis (5), the sequences
are asymptotically orthogonal as
varies, and this claim will then follow from a Cauchy-Schwarz argument.
One of the most notorious problems in elementary mathematics that remains unsolved is the Collatz conjecture, concerning the function defined by setting
when
is odd, and
when
is even. (Here,
is understood to be the positive natural numbers
.)
Conjecture 1 (Collatz conjecture) For any given natural number
, the orbit
passes through
(i.e.
for some
).
Open questions with this level of notoriety can lead to what Richard Lipton calls “mathematical diseases” (and what I termed an unhealthy amount of obsession on a single famous problem). (See also this xkcd comic regarding the Collatz conjecture.) As such, most practicing mathematicians tend to spend the majority of their time on more productive research areas that are only just beyond the range of current techniques. Nevertheless, it can still be diverting to spend a day or two each year on these sorts of questions, before returning to other matters; so I recently had a go at the problem. Needless to say, I didn’t solve the problem, but I have a better appreciation of why the conjecture is (a) plausible, and (b) unlikely be proven by current technology, and I thought I would share what I had found out here on this blog.
Let me begin with some very well known facts. If is odd, then
is even, and so
. Because of this, one could replace
by the function
, defined by
when
is odd, and
when
is even, and obtain an equivalent conjecture. Now we see that if one chooses
“at random”, in the sense that it is odd with probability
and even with probability
, then
increases
by a factor of roughly
half the time, and decreases it by a factor of
half the time. Furthermore, if
is uniformly distributed modulo
, one easily verifies that
is uniformly distributed modulo
, and so
should be roughly
times as large as
half the time, and roughly
times as large as
the other half of the time. Continuing this at a heuristic level, we expect generically that
half the time, and
the other half of the time. The logarithm
of this orbit can then be modeled heuristically by a random walk with steps
and
occuring with equal probability. The expectation
is negative, and so (by the classic gambler’s ruin) we expect the orbit to decrease over the long term. This can be viewed as heuristic justification of the Collatz conjecture, at least in the “average case” scenario in which is chosen uniform at random (e.g. in some large interval
). (It also suggests that if one modifies the problem, e.g. by replacing
to
, then one can obtain orbits that tend to increase over time, and indeed numerically for this variant one sees orbits that appear to escape to infinity.) Unfortunately, one can only rigorously keep the orbit uniformly distributed modulo
for time about
or so; after that, the system is too complicated for naive methods to control at anything other than a heuristic level.
Remark 1 One can obtain a rigorous analogue of the above arguments by extending
from the integers
to the
-adics
. This compact abelian group comes with a Haar probability measure, and one can verify that this measure is invariant with respect to
; with a bit more effort one can verify that it is ergodic. This suggests the introduction of ergodic theory methods. For instance, using the pointwise ergodic theorem, we see that if
is a random
-adic integer, then almost surely the orbit
will be even half the time and odd half the time asymptotically, thus supporting the above heuristics. Unfortunately, this does not directly tell us much about the dynamics on
, as this is a measure zero subset of
. More generally, unless a dynamical system is somehow “polynomial”, “nilpotent”, or “unipotent” in nature, the current state of ergodic theory is usually only able to say something meaningful about generic orbits, but not about all orbits. For instance, the very simple system
on the unit circle
is well understood from ergodic theory (in particular, almost all orbits will be uniformly distributed), but the orbit of a specific point, e.g.
, is still nearly impossible to understand (this particular problem being equivalent to the notorious unsolved question of whether the digits of
are uniformly distributed).
The above heuristic argument only suggests decreasing orbits for almost all (though even this remains unproven, the state of the art is that the number of
in
that eventually go to
is
, a result of Krasikov and Lagarias). It leaves open the possibility of some very rare exceptional
for which the orbit goes to infinity, or gets trapped in a periodic loop. Since the only loop that
lies in is
(for
) or
(for
), we thus may isolate a weaker consequence of the Collatz conjecture:
Conjecture 2 (Weak Collatz conjecture) Suppose that
is a natural number such that
for some
. Then
is equal to
,
, or
.
Of course, we may replace with
(and delete “
“) and obtain an equivalent conjecture.
This weaker version of the Collatz conjecture is also unproven. However, it was observed by Bohm and Sontacchi that this weak conjecture is equivalent to a divisibility problem involving powers of and
:
Conjecture 3 (Reformulated weak Collatz conjecture) There does not exist
and integers
such that
is a positive integer that is a proper divisor of
Proof: To see this, it is convenient to reformulate Conjecture 2 slightly. Define an equivalence relation on
by declaring
if
for some integer
, thus giving rise to the quotient space
of equivalence classes
(which can be placed, if one wishes, in one-to-one correspondence with the odd natural numbers). We can then define a function
by declaring
, where
is the largest power of
that divides
. It is easy to see that
is well-defined (it is essentially the Syracuse function, after identifying
with the odd natural numbers), and that periodic orbits of
correspond to periodic orbits of
or
. Thus, Conjecture 2 is equivalent to the conjecture that
is the only periodic orbit of
.
Now suppose that Conjecture 2 failed, thus there exists such that
for some
. Without loss of generality we may take
to be odd, then
. It is easy to see that
is the only fixed point of
, and so
. An easy induction using (2) shows that
where, for each ,
is the largest power of
that divides
is odd,
. Using the recursion
divides
, and thus
:
Since , we have
for some integer . Since
is divisible by
, and
is odd, we conclude
; if we rearrange the above equation as (1), then we obtain a counterexample to Conjecture 3.
Conversely, suppose that Conjecture 3 failed. Then we have , integers
and a natural number such that (1) holds. As
, we see that the right-hand side of (1) is odd, so
is odd also. If we then introduce the natural numbers
by the formula (3), then an easy induction using (4) shows that
for
. As the
are increasing in
(even for
), we see that
is the largest power of
that divides the right-hand side of (5); as
is odd, we conclude that
is also the largest power of
that divides
. We conclude that
and thus is a periodic orbit of
. Since
is an odd number larger than
, this contradicts Conjecture 3.
Call a counterexample a tuple that contradicts Conjecture 3, i.e. an integer
and an increasing set of integers
such that (1) holds for some . We record a simple bound on such counterexamples, due to Terras and to Garner :
Lemma 5 (Exponent bounds) Let
, and suppose that the Collatz conjecture is true for all
. Let
be a counterexample. Then
Proof: The first bound is immediate from the positivity of . To prove the second bound, observe from the proof of Proposition 4 that the counterexample
will generate a counterexample to Conjecture 2, i.e. a non-trivial periodic orbit
. As the conjecture is true for all
, all terms in this orbit must be at least
. An inspection of the proof of Proposition 4 reveals that this orbit consists of
steps of the form
, and
steps of the form
. As all terms are at least
, the former steps can increase magnitude by a multiplicative factor of at most
. As the orbit returns to where it started, we conclude that
whence the claim.
The Collatz conjecture has already been verified for many values of (up to at least
, according to this web site). Inserting this into the above lemma, one can get lower bounds on
. For instance, by methods such as this, it is known that any non-trivial periodic orbit has length at least
, as shown in Garner’s paper (and this bound, which uses the much smaller value
that was available in 1981, can surely be improved using the most recent computational bounds).
Now we can perform a heuristic count on the number of counterexamples. If we fix and
, then
, and from basic combinatorics we see that there are
different ways to choose the remaining integers
to form a potential counterexample . As a crude heuristic, one expects that for a “random” such choice of integers, the expression (1) has a probability
of holding for some integer
. (Note that
is not divisible by
or
, and so one does not expect the special structure of the right-hand side of (1) with respect to those moduli to be relevant. There will be some choices of
where the right-hand side in (1) is too small to be divisible by
, but using the estimates in Lemma 5, one expects this to occur very infrequently.) Thus, the total expected number of solutions for this choice of
is
The heuristic number of solutions overall is then expected to be
, with the approximation here accurate to many decimal places.
We need a lower bound on . Here, we will use Baker’s theorem (as discussed in this previous post), which among other things gives the lower bound
. Meanwhile, Stirling’s formula (as discussed in this previous post) combined with the approximation
gives
where is the entropy function
A brief computation shows that
and so (ignoring all subexponential terms)
which makes the series (6) convergent. (Actually, one does not need the full strength of Lemma 5 here; anything that kept well away from
would suffice. In particular, one does not need an enormous value of
; even
(say) would be more than sufficient to obtain the heuristic that there are finitely many counterexamples.) Heuristically applying the Borel-Cantelli lemma, we thus expect that there are only a finite number of counterexamples to the weak Collatz conjecture (and inserting a bound such as
, one in fact expects it to be extremely likely that there are no counterexamples at all).
This, of course, is far short of any rigorous proof of Conjecture 2. In order to make rigorous progress on this conjecture, it seems that one would need to somehow exploit the structural properties of numbers of the form
with at most one exception (this is essentially what is called a
-cycle by Steiner). Then (8) simplifies via the geometric series formula to a combination of just a bounded number of powers of
and
, rather than an unbounded number. In that case, one can start using tools from transcendence theory such as Baker’s theorem to obtain good results; for instance, in the above-referenced paper of Steiner, it was shown that
-cycles cannot actually occur, and similar methods have been used to show that
-cycles (in which there are at most
exceptions to
) do not occur for any
, as was shown by Simons and de Weger. However, for general increasing tuples of integers
, there is no such representation by bounded numbers of powers, and it does not seem that methods from transcendence theory will be sufficient to control the expressions (8) to the extent that one can understand their divisibility properties by quantities such as
.
Amusingly, there is a slight connection to Littlewood-Offord theory in additive combinatorics – the study of the random sums
generated by some elements of an additive group
, or equivalently, the vertices of an
-dimensional parallelepiped inside
. Here, the relevant group is
. The point is that if one fixes
and
(and hence
), and lets
vary inside the simplex
then the set of all sums of the form (8) (viewed as an element of
) contains many large parallelepipeds. (Note, incidentally, that once one fixes
, all the sums of the form (8) are distinct; because given (8) and
, one can read off
as the largest power of
that divides (8), and then subtracting off
one can then read off
, and so forth.) This is because the simplex
contains many large cubes. Indeed, if one picks a typical element
of
, then one expects (thanks to Lemma 5) that there there will be
indices
such that
for
, which allows one to adjust each of the
independently by
if desired and still remain inside
. This gives a cube in
of dimension
, which then induces a parallelepiped of the same dimension in
. A short computation shows that the generators of this parallelepiped consist of products of a power of
and a power of
, and in particular will be coprime to
.
If the weak Collatz conjecture is true, then the set must avoid the residue class
in
. Let us suppose temporarily that we did not know about Baker’s theorem (and the associated bound (7)), so that
could potentially be quite small. Then we would have a large parallelepiped inside a small cyclic group
that did not cover all of
, which would not be possible for
small enough. Indeed, an easy induction shows that a
-dimensional parallelepiped in
, with all generators coprime to
, has cardinality at least
. This argument already shows the lower bound
. In other words, we have
Proposition 6 Suppose the weak Collatz conjecture is true. Then for any natural numbers
with
, one has
.
This bound is very weak when compared against the unconditional bound (7). However, I know of no way to get a nontrivial separation property between powers of and powers of
other than via transcendence theory methods. Thus, this result strongly suggests that any proof of the Collatz conjecture must either use existing results in transcendence theory, or else must contribute a new method to give non-trivial results in transcendence theory. (This already rules out a lot of possible approaches to solve the Collatz conjecture.)
By using more sophisticated tools in additive combinatorics, one can improve the above proposition (though it is still well short of the transcendence theory bound (7)):
Proposition 7 Suppose the weak Collatz conjecture is true. Then for any natural numbers
with
, one has
for some absolute constant
.
Proof: (Informal sketch only) Suppose not, then we can find with
of size
. We form the set
as before, which contains parallelepipeds in
of large dimension
that avoid
. We can count the number of times
occurs in one of these parallelepipeds by a standard Fourier-analytic computation involving Riesz products (see Chapter 7 of my book with Van Vu, or this recent preprint of Maples). Using this Fourier representation, the fact that this parallelepiped avoids
(and the fact that
) forces the generators
to be concentrated in a Bohr set, in that one can find a non-zero frequency
such that
of the
generators lie in the set
. However, one can choose the generators to essentially have the structure of a (generalised) geometric progression (up to scaling, it resembles something like
for
ranging over a generalised arithmetic progression, and
a fixed irrational), and one can show that such progressions cannot be concentrated in Bohr sets (this is similar in spirit to the exponential sum estimates of Bourgain on approximate multiplicative subgroups of
, though one can use more elementary methods here due to the very strong nature of the Bohr set concentration (being of the “
concentration” variety rather than the “
concentration”).). This furnishes the required contradiction.
Thus we see that any proposed proof of the Collatz conjecture must either use transcendence theory, or introduce new techniques that are powerful enough to create exponential separation between powers of and powers of
.
Unfortunately, once one uses the transcendence theory bound (7), the size of the cyclic group
becomes larger than the volume of any cube in
, and Littlewood-Offord techniques are no longer of much use (they can be used to show that
is highly equidistributed in
, but this does not directly give any way to prevent
from containing
).
One possible toy model problem for the (weak) Collatz conjecture is a conjecture of Erdos asserting that for , the base
representation of
contains at least one
. (See this paper of Lagarias for some work on this conjecture and on related problems.) To put it another way, the conjecture asserts that there are no integer solutions to
with and
. (When
, of course, one has
.) In this form we see a resemblance to Conjecture 3, but it looks like a simpler problem to attack (though one which is still a fair distance beyond what one can do with current technology). Note that one has a similar heuristic support for this conjecture as one does for Proposition 3; a number of magnitude
has about
base
digits, so the heuristic probability that none of these digits are equal to
is
, which is absolutely summable.
I’ve been focusing my blog posts recently on the mathematics around Hilbert’s fifth problem (is every locally Euclidean group a Lie group?), but today, I will be discussing another of Hilbert’s problems, namely Hilbert’s seventh problem, on the transcendence of powers of two algebraic numbers
. (I am not randomly going through Hilbert’s list, by the way; I hope to explain my interest in the seventh problem in a later post.) This problem was famously solved by Gelfond and Schneider in the 1930s:
Theorem 1 (Gelfond-Schneider theorem) Let
be algebraic numbers, with
and
irrational. Then (any of the values of the possibly multi-valued expression)
is transcendental.
For sake of simplifying the discussion, let us focus on just one specific consequence of this theorem:
Corollary 2
is transcendental.
Proof: If not, one could obtain a contradiction to the Gelfond-Schneider theorem by setting and
. (Note that
is clearly irrational, since
for any integers
with
positive.)
In the 1960s, Alan Baker established a major generalisation of the Gelfond-Schneider theorem known as Baker’s theorem, as part of his work in transcendence theory that later earned him a Fields Medal. Among other things, this theorem provided explicit quantitative bounds on exactly how transcendental quantities such as were. In particular, it gave a strong bound on how irrational such quantities were (i.e. how easily they were approximable by rationals). Here, in particular, is one special case of Baker’s theorem:
Proposition 3 (Special case of Baker’s theorem) For any integers
with
positive, one has
for some absolute (and effectively computable) constants
.
This theorem may be compared with (the easily proved) Liouville’s theorem on diophantine approximation, which asserts that if is an irrational algebraic number of degree
, then
for all with
positive, and some effectively computable
, and (the more significantly difficult) Thue-Siegel-Roth theorem, which under the same hypotheses gives the bound
for all , all
with
positive and an ineffective constant
. Finally, one should compare these results against Dirichlet’s theorem on Diophantine approximation, which asserts that for any real number
one has
for infinitely many with
positive. (The reason the Thue-Siegel-Roth theorem is ineffective is because it relies heavily on the dueling conspiracies argument, i.e. playing off multiple “conspiracies”
against each other; the other results however only focus on one approximation at a time and thus avoid ineffectivity.)
Proposition 3 easily implies the following separation property between powers of and powers of
:
Corollary 4 (Separation between powers of
and powers of
) For any positive integers
one has
for some effectively computable constants
(which may be slightly different from those in Proposition 3).
Indeed, this follows quickly from Proposition 3, the identity
In particular, the gap between powers of three and powers of two
grows exponentially in the exponents
. I do not know of any other way to establish this fact other than essentially going through some version of Baker’s argument (which will be given below).
For comparison, by exploiting the trivial (yet fundamental) integrality gap – the obvious fact that if an integer is non-zero, then its magnitude is at least
– we have the trivial bound
for all positive integers (since, from the fundamental theorem of arithmetic,
cannot vanish). Putting this into (1) we obtain a very weak version of Proposition 3, that only gives exponential bounds instead of polynomial ones:
Proposition 5 (Trivial bound) For any integers
with
positive, one has
for some absolute (and effectively computable) constant
.
The proof of Baker’s theorem (or even of the simpler special case in Proposition 3) is largely elementary (except for some very basic complex analysis), but is quite intricate and lengthy, as a lot of careful book-keeping is necessary in order to get a bound as strong as that in Proposition 3. To illustrate the main ideas, I will prove a bound that is weaker than Proposition 3, but still significantly stronger than Proposition 5, and whose proof already captures many of the key ideas of Baker:
Proposition 6 (Weak special case of Baker’s theorem) For any integers
with
, one has
for some absolute constants
.
Note that Proposition 3 is equivalent to the assertion that one can take (and
effective) in the above proposition.
The proof of Proposition 6 can be made effective (for instance, it is not too difficult to make the close to
); however, in order to simplify the exposition (and in particular, to use some nonstandard analysis terminology to reduce the epsilon management), I will establish Proposition 6 with ineffective constants
.
Like many other results in transcendence theory, the proof of Baker’s theorem (and Proposition 6) rely on what we would nowadays call the polynomial method – to play off upper and lower bounds on the complexity of polynomials that vanish (or nearly vanish) to high order on a specified set of points. (I have discussed the polynomial method in relation to problems in incidence geometry in several previous blog posts.) In the specific case of Proposition 6, the points in question are of the form
for some large integer . On the one hand, the irrationality of
ensures that the curve
is not algebraic, and so it is difficult for a polynomial of controlled complexity to vanish (or nearly vanish) to high order at all the points of
; the trivial bound in Proposition 5 allows one to make this statement more precise. (Here, “complexity” of a polynomial is an informal term referring both to the degree of the polynomial, and the height of the coefficients, which in our application will essentially be integers up to some normalisation factors.) On the other hand, if Proposition 6 failed, then
is close to a rational, which by Taylor expansion makes
close to an algebraic curve over the rationals (up to some rescaling by factors such as
and
) at each point of
. This, together with a pigeonholing argument, allows one to find a polynomial
of reasonably controlled complexity to (nearly) vanish to high order at every point of
.
These observations, by themselves, are not sufficient to get beyond the trivial bound in Proposition 5. However, Baker’s key insight was to exploit the integrality gap to bootstrap the (near) vanishing of on a set
to imply near-vanishing of
on a larger set
with
. The point is that if a polynomial
of controlled degree and size (nearly) vanishes to higher order on a lot of points on an analytic curve such as
, then it will also be fairly small on many other points in
as well. (To quantify this statement efficiently, it is convenient to use the tools of complex analysis, which are particularly well suited to understand zeroes (or small values) of polynomials.) But then, thanks to the integrality gap (and the controlled complexity of
), we can amplify “fairly small” to “very small”.
Using this observation and an iteration argument, Baker was able to take a polynomial of controlled complexity that nearly vanished to high order on a relatively small set
, and bootstrap that to show near-vanishing on a much larger set
. This bootstrap allows one to dramatically bridge the gap between the upper and lower bounds on the complexity of polynomials that nearly vanish to a specified order on a given
, and eventually leads to Proposition 6 (and, with much more care and effort, to Proposition 3).
Below the fold, I give the details of this argument. My treatment here is inspired by this expose of Serre, and these lecture notes of Soundararajan (as transcribed by Ian Petrow).
One of the most well known problems from ancient Greek mathematics was that of trisecting an angle by straightedge and compass, which was eventually proven impossible in 1837 by Pierre Wantzel, using methods from Galois theory.
Formally, one can set up the problem as follows. Define a configuration to be a finite collection of points, lines, and circles in the Euclidean plane. Define a construction step to be one of the following operations to enlarge the collection
:
- (Straightedge) Given two distinct points
in
, form the line
that connects
and
, and add it to
.
- (Compass) Given two distinct points
in
, and given a third point
in
(which may or may not equal
or
), form the circle with centre
and radius equal to the length
of the line segment joining
and
, and add it to
.
- (Intersection) Given two distinct curves
in
(thus
is either a line or a circle in
, and similarly for
), select a point
that is common to both
and
(there are at most two such points), and add it to
.
We say that a point, line, or circle is constructible by straightedge and compass from a configuration if it can be obtained from
after applying a finite number of construction steps.
Problem 1 (Angle trisection) Let
be distinct points in the plane. Is it always possible to construct by straightedge and compass from
a line
through
that trisects the angle
, in the sense that the angle between
and
is one third of the angle of
?
Thanks to Wantzel’s result, the answer to this problem is known to be “no” in general; a generic angle cannot be trisected by straightedge and compass. (On the other hand, some special angles can certainly be trisected by straightedge and compass, such as a right angle. Also, one can certainly trisect generic angles using other methods than straightedge and compass; see the Wikipedia page on angle trisection for some examples of this.)
The impossibility of angle trisection stands in sharp contrast to the easy construction of angle bisection via straightedge and compass, which we briefly review as follows:
- Start with three points
.
- Form the circle
with centre
and radius
, and intersect it with the line
. Let
be the point in this intersection that lies on the same side of
as
. (
may well be equal to
).
- Form the circle
with centre
and radius
, and the circle
with centre
and radius
. Let
be the point of intersection of
and
that is not
.
- The line
will then bisect the angle
.
The key difference between angle trisection and angle bisection ultimately boils down to the following trivial number-theoretic fact:
Proof: Obvious by modular arithmetic, by induction, or by the fundamental theorem of arithmetic.
In contrast, there are of course plenty of powers of that are evenly divisible by
, and this is ultimately why angle bisection is easy while angle trisection is hard.
The standard way in which Lemma 2 is used to demonstrate the impossibility of angle trisection is via Galois theory. The implication is quite short if one knows this theory, but quite opaque otherwise. We briefly sketch the proof of this implication here, though we will not need it in the rest of the discussion. Firstly, Lemma 2 implies the following fact about field extensions.
Corollary 3 Let
be a field, and let
be an extension of
that can be constructed out of
by a finite sequence of quadratic extensions. Then
does not contain any cubic extensions
of
.
Proof: If contained a cubic extension
of
, then the dimension of
over
would be a multiple of three. On the other hand, if
is obtained from
by a tower of quadratic extensions, then the dimension of
over
is a power of two. The claim then follows from Lemma 2.
To conclude the proof, one then notes that any point, line, or circle that can be constructed from a configuration is definable in a field obtained from the coefficients of all the objects in
after taking a finite number of quadratic extensions, whereas a trisection of an angle
will generically only be definable in a cubic extension of the field generated by the coordinates of
.
The Galois theory method also allows one to obtain many other impossibility results of this type, most famously the Abel-Ruffini theorem on the insolvability of the quintic equation by radicals. For this reason (and also because of the many applications of Galois theory to number theory and other branches of mathematics), the Galois theory argument is the “right” way to prove the impossibility of angle trisection within the broader framework of modern mathematics. However, this argument has the drawback that it requires one to first understand Galois theory (or at least field theory), which is usually not presented until an advanced undergraduate algebra or number theory course, whilst the angle trisection problem requires only high-school level mathematics to formulate. Even if one is allowed to “cheat” and sweep several technicalities under the rug, one still needs to possess a fair amount of solid intuition about advanced algebra in order to appreciate the proof. (This was undoubtedly be one reason why, even after Wantzel’s impossibility result was published, a large amount of effort was still expended by amateur mathematicians to try to trisect a general angle.)
In this post I would therefore like to present a different proof (or perhaps more accurately, a disguised version of the standard proof) of the impossibility of angle trisection by straightedge and compass, that avoids explicit mention of Galois theory (though it is never far beneath the surface). With “cheats”, the proof is actually quite simple and geometric (except for Lemma 2, which is still used at a crucial juncture), based on the basic geometric concept of monodromy; unfortunately, some technical work is needed however to remove these cheats.
To describe the intuitive idea of the proof, let us return to the angle bisection construction, that takes a triple of points as input and returns a bisecting line
as output. We iterate the construction to create a quadrisecting line
, via the following sequence of steps that extend the original bisection construction:
- Start with three points
.
- Form the circle
with centre
and radius
, and intersect it with the line
. Let
be the point in this intersection that lies on the same side of
as
. (
may well be equal to
).
- Form the circle
with centre
and radius
, and the circle
with centre
and radius
. Let
be the point of intersection of
and
that is not
.
- Let
be the point on the line
which lies on
, and is on the same side of
as
.
- Form the circle
with centre
and radius
. Let
be the point of intersection of
and
that is not
.
- The line
will then quadrisect the angle
.
Let us fix the points and
, but not
, and view
(as well as intermediate objects such as
,
,
,
,
,
,
) as a function of
.
Let us now do the following: we begin rotating counterclockwise around
, which drags around the other objects
,
,
,
,
,
,
that were constructed by
accordingly. For instance, here is an early stage of this rotation process, when the angle
has become obtuse:
Now for the slightly tricky bit. We are going to keep rotating beyond a half-rotation of
, so that
now becomes a reflex angle. At this point, a singularity occurs; the point
collides into
, and so there is an instant in which the line
is not well-defined. However, this turns out to be a removable singularity (and the easiest way to demonstrate this will be to tap the power of complex analysis, as complex numbers can easily route around such a singularity), and we can blast through it to the other side, giving a picture like this:
Note that we have now deviated from the original construction in that and
are no longer on the same side of
; we are thus now working in a continuation of that construction rather than with the construction itself. Nevertheless, we can still work with this continuation (much as, say, one works with analytic continuations of infinite series such as
beyond their original domain of definition).
We now keep rotating around
. Here,
is approaching a full rotation of
:
When reaches a full rotation, a different singularity occurs:
and
coincide. Nevertheless, this is also a removable singularity, and we blast through to beyond a full rotation:
And now is back where it started, as are
,
,
, and
… but the point
has moved, from one intersection point of
to the other. As a consequence,
,
, and
have also changed, with
being at right angles to where it was before. (In the jargon of modern mathematics, the quadrisection construction has a non-trivial monodromy.)
But nothing stops us from rotating some more. If we continue this procedure, we see that after two full rotations of
around
, all points, lines, and circles constructed from
have returned to their original positions. Because of this, we shall say that the quadrisection construction described above is periodic with period
.
Similarly, if one performs an octisection of the angle by bisecting the quadrisection, one can verify that this octisection is periodic with period
; it takes four full rotations of
around
before the configuration returns to where it started. More generally, one can show
Proposition 4 Any construction of straightedge and compass from the points
is periodic with period equal to a power of
.
The reason for this, ultimately, is because any two circles or lines will intersect each other in at most two points, and so at each step of a straightedge-and-compass construction there is an ambiguity of at most . Each rotation of
around
can potentially flip one of these points to the other, but then if one rotates again, the point returns to its original position, and then one can analyse the next point in the construction in the same fashion until one obtains the proposition.
But now consider a putative trisection operation, that starts with an arbitrary angle and somehow uses some sequence of straightedge and compass constructions to end up with a trisecting line
:
What is the period of this construction? If we continuously rotate around
, we observe that a full rotations of
only causes the trisecting line
to rotate by a third of a full rotation (i.e. by
):
Because of this, we see that the period of any construction that contains must be a multiple of
. But this contradicts Proposition 4 and Lemma 2.
Below the fold, I will make the above proof rigorous. Unfortunately, in doing so, I had to again leave the world of high-school mathematics, as one needs a little bit of algebraic geometry and complex analysis to resolve the issues with singularities that we saw in the above sketch. Still, I feel that at an intuitive level at least, this argument is more geometric and accessible than the Galois-theoretic argument (though anyone familiar with Galois theory will note that there is really not that much difference between the proofs, ultimately, as one has simply replaced the Galois group with a closely related monodromy group instead).
Christian Elsholtz and I have recently finished our joint paper “Counting the number of solutions to the Erdös-Straus equation on unit fractions“, submitted to the Journal of the Australian Mathematical Society. This supercedes my previous paper on the subject, by obtaining stronger and more general results. (The paper is currently in the process of being resubmitted to the arXiv, and should appear at this link within a few days.)
As with the previous paper, the main object of study is the number of solutions to the Diophantine equation
positive integers. The Erdös-Straus conjecture asserts that
for all
. Since
for all positive integers
, it suffices to show that
for all primes
.
We single out two special types of solutions: Type I solutions, in which is divisible by
and
are coprime to
, and Type II solutions, in which
is coprime to
and
are divisible by
. Let
denote the number of Type I and Type II solutions respectively. For any
, one has
with equality when is an odd primes
. Thus, to prove the Erdös-Strauss conjecture, it suffices to show that at least one of
,
is positive whenever
is an odd prime.
Our first main results are the asymptotics
This improves upon the results in the previous paper, which only established
and
The double logarithmic factor in the upper bound for is artificial (arising from the inefficiency in the Brun-Titchmarsh inequality on very short progressions) but we do not know how to remove it.
The methods are similar to those in the previous paper (which were also independently discovered in unpublished work of Elsholtz and Heath-Brown), but with the additional input of the Erdös divisor bound on expressions of the form for polynomials
, discussed in this recent blog post. (Actually, we need to tighten Erdös’ bound somewhat, to obtain some uniformity in the bounds even as the coefficients of
become large, but this turns out to be achievable by going through the original arguments of Erdös more carefully.)
We also note an observation of Heath-Brown, that in our notation gives the lower bound
thus, we see that for typical , that most solutions to the Erdös-Straus equation are not of Type I or Type II, in contrast to the case when
is prime.
We also have a number other new results. We find a way to systematically unify all the previously known parameterisations of solutions to the Erdös-Straus equation, by lifting the Cayley-type surface to a certain three-dimensional variety in six-dimensional affine space, in such a way that integer points in the former arise from integer points in the latter. Each of the previously known characterisations of solutions then corresponds to a different choice of coordinates on this variety. (This point of view was also adopted in a paper of Heath-Brown, who interprets this lifted variety as the universal torsor of the Cayley surface.) By optimising between these parameterisations and exploiting the divisor bound, we obtain some bounds on the worst-case behaviour of
and
, namely
and
which should be compared to a recent previous bound of Browning and Elsholtz. In the other direction, we show that
for infinitely many
, and
for almost all primes
. Here, the main tools are some bounds for the representation of a rational as a sum of two unit fractions in the above-mentioned work of Browning and Elsholtz, and also the Turán-Kubilius inequality.
We also completely classify all the congruence classes that can be solved by polynomials, completing the partial list discussed in the previous post. Specifically, the Erdös-Straus conjecture is true for whenever one of the following congruence-type conditions is satisfied:
-
, where
are such that
.
-
and
, where
are such that
.
-
and
, where
are such that
.
-
or
, where
are such that
and
.
-
, where
are such that
.
-
, where
are such that
.
In principle, this suggests a way to extend the existing verification of the Erdös-Straus conjecture beyond the current range of by collecting all congruences to small moduli (e.g. up to
), and then using this to sieve out the primes up to a given size.
Finally, we begin a study of the more general equation
are fixed. We can obtain a partial analogue of our main bounds for the
case, namely that
and
were denotes the number of solutions to (2) which are of “Type II” in the sense that
are all divisible by
. However, we do not believe our bounds to be sharp in the large
regime, though it does show that the expected number of solutions to (2) should grow rapidly in
.
One of the basic problems in analytic number theory is to obtain bounds and asymptotics for sums of the form
in the limit , where
ranges over natural numbers less than
, and
is some arithmetic function of number-theoretic interest. (It is also often convenient to replace this sharply truncated sum with a smoother sum such as
, but we will not discuss this technicality here.) For instance, the prime number theorem is equivalent to the assertion
where is the von Mangoldt function, while the Riemann hypothesis is equivalent to the stronger assertion
It is thus of interest to develop techniques to estimate such sums . Of course, the difficulty of this task depends on how “nice” the function
is. The functions
that come up in number theory lie on a broad spectrum of “niceness”, with some particularly nice functions being quite easy to sum, and some being insanely difficult.
At the easiest end of the spectrum are those functions that exhibit some sort of regularity or “smoothness”. Examples of smoothness include “Archimedean” smoothness, in which
is the restriction of some smooth function
from the reals to the natural numbers, and the derivatives of
are well controlled. A typical example is
One can already get quite good bounds on this quantity by comparison with the integral , namely
with sharper bounds available by using tools such as the Euler-Maclaurin formula (see this blog post). Exponentiating such asymptotics, incidentally, leads to one of the standard proofs of Stirling’s formula (as discussed in this blog post).
One can also consider “non-Archimedean” notions of smoothness, such as periodicity relative to a small period . Indeed, if
is periodic with period
(and is thus essentially a function on the cyclic group
), then one has the easy bound
In particular, we have the fundamental estimate
is much smaller than
, but as
approaches
in magnitude, the error term
begins to overwhelm the main term
, and one needs much more delicate information on the fractional part of
in order to obtain good estimates at this point.
One can also consider functions which combine “Archimedean” and “non-Archimedean” smoothness into an “adelic” smoothness. We will not define this term precisely here (though the concept of a Schwartz-Bruhat function is one way to capture this sort of concept), but a typical example might be
where is periodic with some small period
. By using techniques such as summation by parts, one can estimate such sums using the techniques used to estimate sums of periodic functions or functions with (Archimedean) smoothness.
Another class of functions that is reasonably well controlled are the multiplicative functions, in which whenever
are coprime. Here, one can use the powerful techniques of multiplicative number theory, for instance by working with the Dirichlet series
which are clearly related to the partial sums (essentially via the Mellin transform, a cousin of the Fourier and Laplace transforms); for this post we ignore the (important) issue of how to make sense of this series when it is not absolutely convergent (but see this previous blog post for more discussion). A primary reason that this technique is effective is that the Dirichlet series of a multiplicative function factorises as an Euler product
One also obtains similar types of representations for functions that are not quite multiplicative, but are closely related to multiplicative functions, such as the von Mangoldt function (whose Dirichlet series
is not given by an Euler product, but instead by the logarithmic derivative of an Euler product).
Moving another notch along the spectrum between well-controlled and ill-controlled functions, one can consider functions that are divisor sums such as
for some other arithmetic function , and some level
. This is a linear combination of periodic functions
and is thus technically periodic in
(with period equal to the least common multiple of all the numbers from
to
), but in practice this periodic is far too large to be useful (except for extremely small levels
, e.g.
). Nevertheless, we can still control the sum
simply by rearranging the summation:
and thus by (1) one can bound this by the sum of a main term and an error term
. As long as the level
is significantly less than
, one may expect the main term to dominate, and one can often estimate this term by a variety of techniques (for instance, if
is multiplicative, then multiplicative number theory techniques are quite effective, as mentioned previously). Similarly for other slight variants of divisor sums, such as expressions of the form
or expressions of the form
where each is periodic with period
.
One of the simplest examples of this comes when estimating the divisor function
which counts the number of divisors up to . This is a multiplicative function, and is therefore most efficiently estimated using the techniques of multiplicative number theory; but for reasons that will become clearer later, let us “forget” the multiplicative structure and estimate the above sum by more elementary methods. By applying the preceding method, we see that
Here, we are (barely) able to keep the error term smaller than the main term; this is right at the edge of the divisor sum method, because the level in this case is equal to
. Unfortunately, at this high choice of level, it is not always possible to always keep the error term under control like this. For instance, if one wishes to use the standard divisor sum representation
where is the Möbius function, to compute
, then one ends up looking at
From Dirichlet series methods, it is not difficult to establish the identities
and
This suggests (but does not quite prove) that one has
, as discussed in this previous post), one is eventually left with the estimate
, which is useless as a lower bound (and recovers only the classical Chebyshev estimate
as the upper bound). The inefficiency here when compared to the situation with the divisor function
can be attributed to the signed nature of the Möbius function
, which causes some cancellation in the divisor sum expansion that needs to be compensated for with improved estimates.
However, there are a number of tricks available to reduce the level of divisor sums. The simplest comes from exploiting the change of variables , which can in principle reduce the level by a square root. For instance, when computing the divisor function
, one can observe using this change of variables that every divisor of
above
is paired with one below
, and so we have
is a perfect square, in which case one must subtract one from the right-hand side. Using this reduced-level divisor sum representation, one can obtain an improvement to (2), namely
This type of argument is also known as the Dirichlet hyperbola method. A variant of this argument can also deduce the prime number theorem from (3), (4) (and with some additional effort, one can even drop the use of (4)); this is discussed at this previous blog post.
Using this square root trick, one can now also control divisor sums such as
(Note that has no multiplicativity properties in
, and so multiplicative number theory techniques cannot be directly applied here.) The level of the divisor sum here is initially of order
, which is too large to be useful; but using the square root trick, we can expand this expression as
which one can rewrite as
The constraint is periodic in
with period
, so we can write this as
where is the number of solutions in
to the equation
, and so
The function is multiplicative, and can be easily computed at primes
and prime powers
using tools such as quadratic reciprocity and Hensel’s lemma. For instance, by Fermat’s two-square theorem,
is equal to
for
and
for
. From this and standard multiplicative number theory methods (e.g. by obtaining asymptotics on the Dirichlet series
), one eventually obtains the asymptotic
and also
and thus
Similar arguments give asymptotics for on other quadratic polynomials; see for instance this paper of Hooley and these papers by McKee. Note that the irreducibility of the polynomial will be important. If one considers instead a sum involving a reducible polynomial, such as
, then the analogous quantity
becomes significantly larger, leading to a larger growth rate (of order
rather than
) for the sum.
However, the square root trick is insufficient by itself to deal with higher order sums involving the divisor function, such as
the level here is initially of order , and the square root trick only lowers this to about
, creating an error term that overwhelms the main term. And indeed, the asymptotic for such this sum has not yet been rigorously established (although if one heuristically drops error terms, one can arrive at a reasonable conjecture for this asymptotic), although some results are known if one averages over additional parameters (see e.g. this paper of Greaves, or this paper of Matthiesen).
Nevertheless, there is an ingenious argument of Erdös that allows one to obtain good upper and lower bounds for these sorts of sums, in particular establishing the asymptotic
that maps
to
(with the implied constants depending of course on the choice of
). There is also the related moment bound
(not necessarily irreducible) and any fixed
, due to van der Corput; this bound is in fact used to dispose of some error terms in the proof of (6). These should be compared with what one can obtain from the divisor bound
and the trivial bound
, giving the bounds
for any fixed .
The lower bound in (6) is easy, since one can simply lower the level in (5) to obtain the lower bound
for any , and the preceding methods then easily allow one to obtain the lower bound by taking
small enough (more precisely, if
has degree
, one should take
equal to
or less). The upper bounds in (6) and (7) are more difficult. Ideally, if we could obtain upper bounds of the form
, then the preceding methods would easily establish both results. Unfortunately, this bound can fail, as illustrated by the following example. Suppose that
is the product of
distinct primes
, each of which is close to
. Then
has
divisors, with
of them close to
for each
. One can think of (the logarithms of) these divisors as being distributed according to what is essentially a Bernoulli distribution, thus a randomly selected divisor of
has magnitude about
, where
is a random variable which has the same distribution as the number of heads in
independently tossed fair coins. By the law of large numbers,
should concentrate near
when
is large, which implies that the majority of the divisors of
will be close to
. Sending
, one can show that the bound (8) fails whenever
.
This however can be fixed in a number of ways. First of all, even when , one can show weaker substitutes for (8). For instance, for any fixed
and
one can show a bound of the form
depending only on
. This nice elementary inequality (first observed by Landreau) already gives a quite short proof of van der Corput’s bound (7).
For Erdös’s upper bound (6), though, one cannot afford to lose these additional factors of , and one must argue more carefully. Here, the key observation is that the counterexample discussed earlier – when the natural number
is the product of a large number of fairly small primes – is quite atypical; most numbers have at least one large prime factor. For instance, the number of natural numbers less than
that contain a prime factor between
and
is equal to
which, thanks to Mertens’ theorem
for some absolute constant , is comparable to
. In a similar spirit, one can show by similarly elementary means that the number of natural numbers
less than
that are
-smooth, in the sense that all prime factors are at most
, is only about
or so. Because of this, one can hope that the bound (8), while not true in full generality, will still be true for most natural numbers
, with some slightly weaker substitute available (such as (7)) for the exceptional numbers
. This turns out to be the case by an elementary but careful argument.
The Erdös argument is quite robust; for instance, the more general inequality
for fixed irreducible and
, which improves van der Corput’s inequality (8) was shown by Delmer using the same methods. (A slight error in the original paper of Erdös was also corrected in this latter paper.) In a forthcoming revision to my paper on the Erdös-Straus conjecture, Christian Elsholtz and I have also applied this method to obtain bounds such as
which turn out to be enough to obtain the right asymptotics for the number of solutions to the equation .
Below the fold I will provide some more details of the arguments of Landreau and of Erdös.
I have just uploaded to the arXiv my paper “On the number of solutions to “, submitted to the Journal of the Australian Mathematical Society.
For any positive integer , let
denote the number of solutions to the Diophantine equation
where are positive integers (we allow repetitions, and do not require the
to be increasing). The Erdös-Straus conjecture asserts that
for all
. By dividing through by any positive integer
we see that
, so it suffices to verify this conjecture for primes
, i.e. to solve the Diophantine equation
. As the case
is easily solved, we may of course restrict attention to odd primes.
This conjecture remains open, although there is a reasonable amount of evidence towards its truth. For instance, it was shown by Vaughan that for any large , the number of exceptions
to the Erdös-Straus conjecture with
is at most
for some absolute constant
. The Erdös-Straus conjecture is also verified in several congruence classes of primes; for instance, from the identity
we see that the conjecture holds when . Further identities of this type can be used to resolve the conjecture unless
is a quadratic residue mod
, which leaves only six residue classes in that modulus to check (namely,
, and
). However, there is a significant obstruction to eliminating the quadratic residue classes, as we will discuss later.
By combining these reductions with extensive numerical calculations, the Erdös-Straus conjecture was verified for all by Swett.
One approach to solving Diophantine equations such as (1) is to use methods of analytic number theory, such as the circle method, to obtain asymptotics (or at least lower bounds) for the number of solutions (or some proxy for this number); if one obtains a lower bound which is nontrivial for every
, one has solved the problem. (One can alternatively view such methods as a variant of the probabilistic method; in this interpretation, one chooses the unknowns
according to some suitable probability distribution and then tries to show that the probability of solving the equation is positive.) Such techniques can be effective for instance for certain instances of the Waring problem. However, as a general rule, these methods only work when there are a lot of solutions, and specifically when the number
of solutions grows at a polynomial rate with the parameter
.
The first main result of my paper is that the number of solutions is essentially logarithmic in nature, thus providing a serious obstruction to solution by analytic methods, as there is almost no margin of error available. More precisely, I show that for almost all primes
(i.e. in a subset of primes of relative density
), one has
, or more precisely that
Readers familiar with analytic number theory will recognise the right-hand side from the divisor bound, which indeed plays a role in the proof.
Actually, there are more precise estimates which suggest (though do not quite prove) that the mean value of is comparable to
. To state these results properly, I need some more notation. It is not difficult to show that if
is an odd prime and
obey (1), then at least one, but not all, of the
must be a multiple of
. Thus we can split
, where
is the number of solutions to (1) where exactly
of the
are divisible by
. The sharpest estimates I can prove are then
Theorem 1 For all sufficiently large
, one has
and
Since the number of primes less than is comparable to
by the prime number theorem, this shows that the mean value of
in this range is between
and
, and the mean value of
is between
and
, which gives the previous result as a corollary (thanks to a Markov inequality argument). A naive Poisson process heuristic then suggests that each prime
has a “probability”
of having a solution to (1), which by the Borel-Cantelli lemma heuristically suggests that there are only finitely many
for which (1) fails. Of course, this is far from a rigorous proof (though the result of Vaughan mentioned earlier, which is based on the large sieve, can be viewed as a partial formalisation of the argument).
The first step in obtaining these results is an elementary description of in terms of some divisibility constraints. More precisely, we have
Lemma 2 Let
be an odd prime. Then
is equal to three times the number of triples
of positive integers, with
coprime,
dividing
, and
dividing
. Similarly,
is equal to three times the number of triples
of positive integers, with
coprime,
dividing
, and
dividing
.
One direction of this lemma is very easy: if divides
and
divides
then the identity
and its cyclic permutations give three contributions to , and similarly if
divides
instead then the identity
and its cyclic permutations give three contributions to . Conversely, some elementary number theory can be used to show that all solutions contributing to either
or
are of these forms. (Similar criteria have been used in prior literature, for instance in the previously mentioned paper of Vaughan.)
This lemma, incidentally, provides a way to quickly generate a large number of congruences of primes for which the Erdös-Straus conjecture can be verified. Indeed, from the above lemma we see that if
are coprime and
is any odd factor of
(and hence coprime to
), then the Erdös-Straus conjecture is true whenever
or
. For instance,
- Taking
, we see that the conjecture holds whenever
(leaving only those primes
);
- Taking
we see that the conjecture holds whenever
(leaving only those primes
);
- Taking
, we see that the conjecture holds whenever
((leaving only those primes
);
- Taking
, we see that the conjecture holds whenever
(leaving only those primes
);
- Taking
, we see that the conjecture holds whenever
; taking instead
, we see that the conjecture holds whenever
. (This leaves only the primes
that are equal to one of the six quadratic residues
, an observation first made by Mordell.)
- etc.
If we let (resp.
) be the number of triples
with
coprime,
dividing
, and
congruent to
(resp.
) mod
, the above lemma tells us that
for all odd primes
, so it suffices to show that
are non-zero for all
. One might hope that there are enough congruence relations provided by the previous observation to obtain a covering system of congruences which would resolve the conjecture. Unfortunately, as was also observed by Mordell, an application of the quadratic reciprocity law shows that these congruence relations cannot eliminate quadratic residues, only quadratic non-residues. More precisely, one can show that if
are as above (restricting to the case
, which contains all the unsolved cases of the conjecture) and
is the largest odd factor of
, then the Jacobi symbols
and
are always equal to
. One consequence of this is that
whenever
is an odd perfect square. This makes it quite hard to resolve the conjecture for odd primes
which resemble a perfect square in that they lie in quadratic residue classes to small moduli (and, from Dirichlet’s theorem on primes in arithmetic progressions, one can find many primes of this type). The same argument also shows that for an odd prime
, there are no solutions to
in which one or two of the are divisible by
, with the other denominators being coprime to
. (Of course, one can still get representations of
by starting with a representation of
and dividing by
, but such representations are is not of the above form.) This shows that any attempt to establish the Erdös-Straus conjecture by manually constructing
as a function of
must involve a technique which breaks down if
is replaced by
(for instance, this rules out any approach based on using polynomial combinations of
and dividing into cases based on residue classes of
in small moduli). Part of the problem here is that we do not have good bounds that prevent a prime
from “spoofing” a perfect square to all small moduli (say, to all moduli less than a small power of
); this problem is closely related (via quadratic reciprocity) to Vinogradov’s conjecture on the least quadratic nonresidue, discussed in this previous blog post.
It remains to control the sums for
. The lower bounds
are relatively routine to establish, arising from counting the contribution of those
that are somewhat small (e.g. less than
), and use only standard asymptotics of arithmetic functions and the Bombieri-Vinogradov inequality (to handle the restriction of the summation to primes). To obtain (nearly) matching upper bounds, one needs to prevent
from getting too large. In the case of
, the fact that
divides
and
divides
soon leads one to the bounds
, at which point one can count primes in residue classes modulo
with reasonable efficiency via the Brun-Titchmarsh inequality. This inequality unfortunately introduces a factor of
, which we were only able to handle by bounding it crudely by
, leading to the double logarithmic loss in the
sum.
For , these arguments do not give good bounds, because it is possible for
to be much larger than
, and one can no longer easily count solutions to
. However, an elementary change of variables resolves this problem. It turns out that if one lets
be the positive integers
then one can convert the triple to a triple
obeying the constraints
Conversely, every such triple determines
by the formulae
in a residue class modulo
rather than
, and with
, this allows one to count the number of solutions reasonably efficiently. The main loss now comes from counting the number of divisors
of
; in particular, one becomes interested in estimating sums such as
where are less than
and
is the number of divisors of
. In principle, because
behaves like
on the average (as can be seen from the Dirichlet hyperbola method), we expect this sum to be of order
, but I was unable to show this, instead using the far cruder divisor bound
to obtain an upper bound of . Any improvement upon this bound would lead to a corresponding improvement in the upper bound for
. While improvement is possible in some ranges (particularly when
is large) using various bounds on Kloosterman-type exponential sums, I was not able to adequately control these sums when
was fairly small (e.g. polylogarithmic size in
), as one could no longer extract much of an averaging effect from the
summation in that case. Part of the difficulty is that in that case one must somehow exploit the fact that
is irreducible as a polynomial in
for any fixed
, otherwise there will be too many divisors.
I’ve uploaded to the arXiv the polymath research paper “Deterministic methods to find primes“, which is the outcome of the Polymath4 collaborative mathematics project, and has been submitted to Mathematics of Computation.
The objective of this paper was to find fast deterministic algorithms to solve the following problem:
Given a (large) integer x, find a prime p larger than x.
Thanks to the AKS algorithm, a number of size O(x) can be deterministically tested for primality in time . By Bertrand’s postulate, there is always at least one prime between x and 2x; by testing each one of these integers in turn for primality, one can thus obtain a deterministic algorithm to find primes in time
.
But one should be able to do much better. For comparison, if probabilistic algorithms are allowed, then by randomly selecting integers between x and 2x to test for primality, it is easy to see from the prime number theorem that one will succeed in obtaining a prime with high probability in time . However, after some effort we were not able to “derandomise” this algorithm to create any reasonable deterministic counterpart. Nevertheless, we conjecture that a deterministic algorithm with run time
exists. Such algorithms can be easily obtained if one assumes some standard conjectures regarding the primes (e.g. Cramer’s conjecture on prime gaps), but we do not know of any deterministic algorithms which can be unconditionally proved to run in time
.
Currently, the best known deterministic algorithm is due to Lagarias and Odlyzko, and has a run time of . Roughly speaking, it is based on the ability to compute the prime counting function
in time
; once one has this function, one can detect which intervals contain primes or not, and then starting from Bertrand’s postulate and performing a binary search one can then locate a prime. The Lagarias-Odlyzko argument is based on approximating
by a certain integral of the Riemann zeta function, which one then approximates in turn by quadrature.
We conjecture that one should be able to compute in faster time, and in particular in time
for some
. Unfortunately, we were not able to achieve this; however, we do have a non-trivial method to compute the parity
of
in such a time; a bit more generally (and oversimplifying a little bit), we can compute various projections
of the prime polynomial
modulo some small polynomials g. This seems close to being able to achieve the goal of detecting whether primes exist in a given interval, but unfortunately some additional work seems to be needed in that area. Still, the methods used to get the parity seem to be interesting (involving the Dirichlet hyperbola method, a piecewise approximation of the hyperbola, and the Strassen fast multiplication algorithm), and these techniques might be useful for some other problems.
Roughly speaking, the idea to compute the parity of is as follows. The first observation is that, for square-free n, the number
of divisors of n is equal to 2 when n is a prime, and a multiple of 4 otherwise. So to compute the parity of
, it suffices to compute
modulo 4 (or more precisely, the restriction of this sum to squarefree n).
The restriction to squarefree numbers turns out to be relatively easy to handle, so we ignore it. Since , we can rewrite
.
So it suffices to find an efficient way to count the number of lattice points below the hyperbola .
The classic way to do this is via the Dirichlet hyperbola method. This lets one rewrite the above expression as
(assuming is not a perfect square, where
is the greatest integer function). One can then compute this quantity in time
without difficulty.
To improve this to , we used some ideas of Vinogradov, based on using Diophantine approximation to find moderately long arithmetic progressions on which the map
is linear, and thus easily summable.
The same method extends to the polynomial setting, though now, instead of summing an arithmetic progression, one has to find an efficient way to sum quadratic sums such as . This turns out to be possible by expressing this sum as a matrix product and then using fast matrix multiplication algorithms.
There is still some scope to improve the results and methods further; Ernie Croot is pursuing this with two undergraduate students, David Hollis and David Lowry,
As is now widely reported, the Fields medals for 2010 have been awarded to Elon Lindenstrauss, Ngo Bao Chau, Stas Smirnov, and Cedric Villani. Concurrently, the Nevanlinna prize (for outstanding contributions to mathematical aspects of information science) was awarded to Dan Spielman, the Gauss prize (for outstanding mathematical contributions that have found significant applications outside of mathematics) to Yves Meyer, and the Chern medal (for lifelong achievement in mathematics) to Louis Nirenberg. All of the recipients are of course exceptionally qualified and deserving for these awards; congratulations to all of them. (I should mention that I myself was only very tangentially involved in the awards selection process, and like everyone else, had to wait until the ceremony to find out the winners. I imagine that the work of the prize committees must have been extremely difficult.)
Today, I thought I would mention one result of each of the Fields medalists; by chance, three of the four medalists work in areas reasonably close to my own. (Ngo is rather more distant from my areas of expertise, but I will give it a shot anyway.) This will of course only be a tiny sample of each of their work, and I do not claim to be necessarily describing their “best” achievement, as I only know a portion of the research of each of them, and my selection choice may be somewhat idiosyncratic. (I may discuss the work of Spielman, Meyer, and Nirenberg in a later post.)
In this, the final lecture notes of this course, we discuss one of the motivating applications of the theory developed thus far, namely to count solutions to linear equations in primes (or in dense subsets
of primes
). Unfortunately, the most famous linear equations in primes: the twin prime equation
and the even Goldbach equation
– remain out of reach of this technology (because the relevant affine linear forms involved are commensurate, and thus have infinite complexity with respect to the Gowers norms), but most other systems of equations, in particular that of arithmetic progressions
for
(or equivalently,
for
) , as well as the odd Goldbach equation
, are tractable.
To illustrate the main ideas, we will focus on the following result of Green:
Theorem 1 (Roth’s theorem in the primes) Let
be a subset of primes whose upper density
is positive. Then
contains infinitely many arithmetic progressions of length three.
This should be compared with Roth’s theorem in the integers (Notes 2), which is the same statement but with the primes replaced by the integers
(or natural numbers
). Indeed, Roth’s theorem for the primes is proven by transferring Roth’s theorem for the integers to the prime setting; the latter theorem is used as a “black box”. The key difficulty here in performing this transference is that the primes have zero density inside the integers; indeed, from the prime number theorem we have
.
There are a number of generalisations of this transference technique. In a paper of Green and myself, we extended the above theorem to progressions of longer length (thus transferring Szemerédi’s theorem to the primes). In a series of papers (culminating in a paper to appear shortly) of Green, myself, and also Ziegler, related methods are also used to obtain an asymptotic for the number of solutions in the primes to any system of linear equations of bounded complexity. This latter result uses the full power of higher order Fourier analysis, in particular relying heavily on the inverse conjecture for the Gowers norms; in contrast, Roth’s theorem and Szemerédi’s theorem in the primes are “softer” results that do not need this conjecture.
To transfer results from the integers to the primes, there are three basic steps:
- A general transference principle, that transfers certain types of additive combinatorial results from dense subsets of the integers to dense subsets of a suitably “pseudorandom set” of integers (or more precisely, to the integers weighted by a suitably “pseudorandom measure”);
- An application of sieve theory to show that the primes (or more precisely, an affine modification of the primes) lie inside a suitably pseudorandom set of integers (or more precisely, have significant mass with respect to a suitably pseudorandom measure).
- If one is seeking asymptotics for patterns in the primes, and not simply lower bounds, one also needs to control correlations between the primes (or proxies for the primes, such as the Möbius function) with various objects that arise from higher order Fourier analysis, such as nilsequences.
The former step can be accomplished in a number of ways. For progressions of length three (and more generally, for controlling linear patterns of complexity at most one), transference can be accomplished by Fourier-analytic methods. For more complicated patterns, one can use techniques inspired by ergodic theory; more recently, simplified and more efficient methods based on duality (the Hahn-Banach theorem) have also been used. No number theory is used in this step. (In the case of transference to genuinely random sets, rather than pseudorandom sets, similar ideas appeared earlier in the graph theory setting, see this paper of Kohayakawa, Luczak, and Rodl.
The second step is accomplished by fairly standard sieve theory methods (e.g. the Selberg sieve, or the slight variants of this sieve used by Goldston and Yildirim). Remarkably, very little of the formidable apparatus of modern analytic number theory is needed for this step; for instance, the only fact about the Riemann zeta function that is truly needed is that it has a simple pole at , and no knowledge of L-functions is needed.
The third step does draw more significantly on analytic number theory techniques and results (most notably, the method of Vinogradov to compute oscillatory sums over the primes, and also the Siegel-Walfisz theorem that gives a good error term on the prime number theorem in arithemtic progressions). As these techniques are somewhat orthogonal to the main topic of this course, we shall only touch briefly on this aspect of the transference strategy.











Recent Comments