You are currently browsing the category archive for the ‘expository’ category.

This post is a continuation of the previous post on sieve theory, which is an ongoing part of the Polymath8 project to improve the various parameters in Zhang’s proof that bounded gaps between primes occur infinitely often. Given that the comments on that page are getting quite lengthy, this is also a good opportunity to “roll over” that thread.

We will continue the notation from the previous post, including the concept of an admissible tuple, the use of an asymptotic parameter {x} going to infinity, and a quantity {w} depending on {x} that goes to infinity sufficiently slowly with {x}, and {W := \prod_{p<w} p} (the {W}-trick).

The objective of this portion of the Polymath8 project is to make as efficient as possible the connection between two types of results, which we call {DHL[k_0,2]} and {MPZ[\varpi,\delta]}. Let us first state {DHL[k_0,2]}, which has an integer parameter {k_0 \geq 2}:

Conjecture 1 ({DHL[k_0,2]}) Let {{\mathcal H}} be a fixed admissible {k_0}-tuple. Then there are infinitely many translates {n+{\mathcal H}} of {{\mathcal H}} which contain at least two primes.

Zhang was the first to prove a result of this type with {k_0 = 3,500,000}. Since then the value of {k_0} has been lowered substantially; at this time of writing, the current record is {k_0 = 26,024}.

There are two basic ways known currently to attain this conjecture. The first is to use the Elliott-Halberstam conjecture {EH[\theta]} for some {\theta>1/2}:

Conjecture 2 ({EH[\theta]}) One has

\displaystyle  \sum_{1 \leq q \leq x^\theta} \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\sum_{n < x: n = a\ (q)} \Lambda(n) - \frac{1}{\phi(q)} \sum_{n < x} \Lambda(n)|

\displaystyle = O( \frac{x}{\log^A x} )

for all fixed {A>0}. Here we use the abbreviation {n=a\ (q)} for {n=a \hbox{ mod } q}.

Here of course {\Lambda} is the von Mangoldt function and {\phi} the Euler totient function. It is conjectured that {EH[\theta]} holds for all {0 < \theta < 1}, but this is currently only known for {0 < \theta < 1/2}, an important result known as the Bombieri-Vinogradov theorem.

In a breakthrough paper, Goldston, Yildirim, and Pintz established an implication of the form

\displaystyle  EH[\theta] \implies DHL[k_0,2] \ \ \ \ \ (1)

for any {1/2 < \theta < 1}, where {k_0 = k_0(\theta)} depends on {\theta}. This deduction was very recently optimised by Farkas, Pintz, and Revesz and also independently in the comments to the previous blog post, leading to the following implication:

Theorem 3 (EH implies DHL) Let {1/2 < \theta < 1} be a real number, and let {k_0 \geq 2} be an integer obeying the inequality

\displaystyle  2\theta > \frac{j_{k_0-2}^2}{k_0(k_0-1)}, \ \ \ \ \ (2)

where {j_n} is the first positive zero of the Bessel function {J_n(x)}. Then {EH[\theta]} implies {DHL[k_0,2]}.

Note that the right-hand side of (2) is larger than {1}, but tends asymptotically to {1} as {k_0 \rightarrow \infty}. We give an alternate proof of Theorem 3 below the fold.

Implications of the form Theorem 3 were modified by Motohashi and Pintz, which in our notation replaces {EH[\theta]} by an easier conjecture {MPZ[\varpi,\delta]} for some {0 < \varpi < 1/4} and {0 < \delta < 1/4+\varpi}, at the cost of degrading the sufficient condition (2) slightly. In our notation, this conjecture takes the following form for each choice of parameters {\varpi,\delta}:

Conjecture 4 ({MPZ[\varpi,\delta]}) Let {{\mathcal H}} be a fixed {k_0}-tuple (not necessarily admissible) for some fixed {k_0 \geq 2}, and let {b\ (W)} be a primitive residue class. Then

\displaystyle  \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} \sum_{a \in C(q)} |\Delta_{b,W}(\Lambda; q,a)| = O( x \log^{-A} x) \ \ \ \ \ (3)

for any fixed {A>0}, where {I = (w,x^{\delta})}, {{\mathcal S}_I} are the square-free integers whose prime factors lie in {I}, and {\Delta_{b,W}(\Lambda;q,a)} is the quantity

\displaystyle  \Delta_{b,W}(\Lambda;q,a) := | \sum_{x \leq n \leq 2x: n=b\ (W); n = a\ (q)} \Lambda(n) \ \ \ \ \ (4)

\displaystyle  - \frac{1}{\phi(q)} \sum_{x \leq n \leq 2x: n = b\ (W)} \Lambda(n)|.

and {C(q)} is the set of congruence classes

\displaystyle  C(q) := \{ a \in ({\bf Z}/q{\bf Z})^\times: P(a) = 0 \}

and {P} is the polynomial

\displaystyle  P(a) := \prod_{h \in {\mathcal H}} (a+h).

This is a weakened version of the Elliott-Halberstam conjecture:

Proposition 5 (EH implies MPZ) Let {0 < \varpi < 1/4} and {0 < \delta < 1/4+\varpi}. Then {EH[1/2+2\varpi+\epsilon]} implies {MPZ[\varpi,\delta]} for any {\epsilon>0}. (In abbreviated form: {EH[1/2+2\varpi+]} implies {MPZ[\varpi,\delta]}.)

In particular, since {EH[\theta]} is conjecturally true for all {0 < \theta < 1/2}, we conjecture {MPZ[\varpi,\delta]} to be true for all {0 < \varpi < 1/4} and {0<\delta<1/4+\varpi}.

Proof: Define

\displaystyle  E(q) := \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\sum_{x \leq n \leq 2x: n = a\ (q)} \Lambda(n) - \frac{1}{\phi(q)} \sum_{x \leq n \leq 2x} \Lambda(n)|

then the hypothesis {EH[1/2+2\varpi+\epsilon]} (applied to {x} and {2x} and then subtracting) tells us that

\displaystyle  \sum_{1 \leq q \leq Wx^{1/2+2\varpi}} E(q) \ll x \log^{-A} x

for any fixed {A>0}. From the Chinese remainder theorem and the Siegel-Walfisz theorem we have

\displaystyle  \sup_{a \in ({\bf Z}/q{\bf Z})^\times} \Delta_{b,W}(\Lambda;q,a) \ll E(qW) + \frac{1}{\phi(q)} x \log^{-A} x

for any {q} coprime to {W} (and in particular for {q \in {\mathcal S}_I}). Since {|C(q)| \leq k_0^{\Omega(q)}}, where {\Omega(q)} is the number of prime divisors of {q}, we can thus bound the left-hand side of (3) by

\displaystyle  \ll \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} k_0^{\Omega(q)} E(qW) + k_0^{\Omega(q)} \frac{1}{\phi(q)} x \log^{-A} x.

The contribution of the second term is {O(x \log^{-A+O(1)} x)} by standard estimates (see Proposition 8 below). Using the very crude bound

\displaystyle  E(q) \ll \frac{1}{\phi(q)} x \log x

and standard estimates we also have

\displaystyle  \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} k_0^{2\Omega(q)} E(qW) \ll x \log^{O(1)} A

and the claim now follows from the Cauchy-Schwarz inequality. \Box

In practice, the conjecture {MPZ[\varpi,\delta]} is easier to prove than {EH[1/2+2\varpi+]} due to the restriction of the residue classes {a} to {C(q)}, and also the restriction of the modulus {q} to {x^\delta}-smooth numbers. Zhang proved {MPZ[\varpi,\varpi]} for any {0 < \varpi < 1/1168}. More recently, our Polymath8 group has analysed Zhang’s argument (using in part a corrected version of the analysis of a recent preprint of Pintz) to obtain {MPZ[\varpi,\delta]} whenever {\delta, \varpi > 0} are such that

\displaystyle  207\varpi + 43\delta < \frac{1}{4}.

The work of Motohashi and Pintz, and later Zhang, implicitly describe arguments that allow one to deduce {DHL[k_0,2]} from {MPZ[\varpi,\delta]} provided that {k_0} is sufficiently large depending on {\varpi,\delta}. The best implication of this sort that we have been able to verify thus far is the following result, established in the previous post:

Theorem 6 (MPZ implies DHL) Let {0 < \varpi < 1/4}, {0 < \delta < 1/4+\varpi}, and let {k_0 \geq 2} be an integer obeying the constraint

\displaystyle  1+4\varpi > \frac{j_{k_0-2}^2}{k_0(k_0-1)} (1+\kappa) \ \ \ \ \ (5)

where {\kappa} is the quantity

\displaystyle \kappa := \sum_{1 \leq n < \frac{1+4\varpi}{2\delta}} (1 - \frac{2n \delta}{1 + 4\varpi})^{k_0/2} \prod_{j=1}^{n} (1 + 3k_0 \log(1+\frac{1}{j})) ).

Then {MPZ[\varpi,\delta]} implies {DHL[k_0,2]}.

This complicated version of {\kappa} is roughly of size {3 \log(2) k_0 \exp( - k_0 \delta)}. It is unlikely to be optimal; the work of Motohashi-Pintz and Pintz suggests that it can essentially be improved to {\frac{1}{\delta} \exp(-k_0 \delta)}, but currently we are unable to verify this claim. One of the aims of this post is to encourage further discussion as to how to improve the {\kappa} term in results such as Theorem 6.

We remark that as (5) is an open condition, it is unaffected by infinitesimal modifications to {\varpi,\delta}, and so we do not ascribe much importance to such modifications (e.g. replacing {\varpi} by {\varpi-\epsilon} for some arbitrarily small {\epsilon>0}).

The known deductions of {DHL[k_0,2]} from claims such as {EH[\theta]} or {MPZ[\varpi,\delta]} rely on the following elementary observation of Goldston, Pintz, and Yildirim (essentially a weighted pigeonhole principle), which we have placed in “{W}-tricked form”:

Lemma 7 (Criterion for DHL) Let {k_0 \geq 2}. Suppose that for each fixed admissible {k_0}-tuple {{\mathcal H}} and each congruence class {b\ (W)} such that {b+h} is coprime to {W} for all {h \in {\mathcal H}}, one can find a non-negative weight function {\nu: {\bf N} \rightarrow {\bf R}^+}, fixed quantities {\alpha,\beta > 0}, a quantity {A>0}, and a fixed positive power {R} of {x} such that one has the upper bound

\displaystyle  \sum_{x \leq n \leq 2x: n = b\ (W)} \nu(n) \leq (\alpha+o(1)) A\frac{x}{W}, \ \ \ \ \ (6)

the lower bound

\displaystyle  \sum_{x \leq n \leq 2x: n = b\ (W)} \nu(n) \theta(n+h_i) \geq (\beta-o(1)) A\frac{x}{W} \log R \ \ \ \ \ (7)

for all {h_i \in {\mathcal H}}, and the key inequality

\displaystyle  \frac{\log R}{\log x} > \frac{1}{k_0} \frac{\alpha}{\beta} \ \ \ \ \ (8)

holds. Then {DHL[k_0,2]} holds. Here {\theta(n)} is defined to equal {\log n} when {n} is prime and {0} otherwise.

Proof: Consider the quantity

\displaystyle  \sum_{x \leq n \leq 2x: n = b\ (W)} \nu(n) (\sum_{h \in {\mathcal H}} \theta(n+h) - \log(3x)). \ \ \ \ \ (9)

By (6), (7), this quantity is at least

\displaystyle  k_0 \beta A\frac{x}{W} \log R - \alpha \log(3x) A\frac{x}{W} - o(A\frac{x}{W} \log x).

By (8), this expression is positive for all sufficiently large {x}. On the other hand, (9) can only be positive if at least one summand is positive, which only can happen when {n+{\mathcal H}} contains at least two primes for some {x \leq n \leq 2x} with {n=b\ (W)}. Letting {x \rightarrow \infty} we obtain {DHL[k_0,2]} as claimed. \Box

In practice, the quantity {R} (referred to as the sieve level) is a power of {x} such as {x^{\theta/2}} or {x^{1/4+\varpi}}, and reflects the strength of the distribution hypothesis {EH[\theta]} or {MPZ[\varpi,\delta]} that is available; the quantity {R} will also be a key parameter in the definition of the sieve weight {\nu}. The factor {A} reflects the order of magnitude of the expected density of {\nu} in the residue class {b\ (W)}; it could be absorbed into the sieve weight {\nu} by dividing that weight by {A}, but it is convenient to not enforce such a normalisation so as not to clutter up the formulae. In practice, {A} will some combination of {\frac{\phi(W)}{W}} and {\log R}.

Once one has decided to rely on Lemma 7, the next main task is to select a good weight {\nu} for which the ratio {\alpha/\beta} is as small as possible (and for which the sieve level {R} is as large as possible. To ensure non-negativity, we use the Selberg sieve

\displaystyle  \nu = \lambda^2, \ \ \ \ \ (10)

where {\lambda(n)} takes the form

\displaystyle  \lambda(n) = \sum_{d \in {\mathcal S}_I: d|P(n)} \mu(d) a_d

for some weights {a_d \in {\bf R}} vanishing for {d>R} that are to be chosen, where {I \subset (w,+\infty)} is an interval and {P} is the polynomial {P(n) := \prod_{h \in {\mathcal H}} (n+h)}. If the distribution hypothesis is {EH[\theta]}, one takes {R := x^{\theta/2}} and {I := (w,+\infty)}; if the distribution hypothesis is instead {MPZ[\varpi,\delta]}, one takes {R := x^{1/4+\varpi}} and {I := (w,x^\delta)}.

One has a useful amount of flexibility in selecting the weights {a_d} for the Selberg sieve. The original work of Goldston, Pintz, and Yildirim, as well as the subsequent paper of Zhang, the choice

\displaystyle  a_d := \log(\frac{R}{d})_+^{k_0+\ell_0}

is used for some additional parameter {\ell_0 > 0} to be optimised over. More generally, one can take

\displaystyle  a_d := g( \frac{\log d}{\log R} )

for some suitable (in particular, sufficiently smooth) cutoff function {g: {\bf R} \rightarrow {\bf R}}. We will refer to this choice of sieve weights as the “analytic Selberg sieve”; this is the choice used in the analysis in the previous post.

However, there is a slight variant choice of sieve weights that one can use, which I will call the “elementary Selberg sieve”, and it takes the form

\displaystyle  a_d := \frac{1}{\Phi(d) \Delta(d)} \sum_{q \in {\mathcal S}_I: (q,d)=1} \frac{1}{\Phi(q)} f'( \frac{\log dq}{\log R}) \ \ \ \ \ (11)

for a sufficiently smooth function {f: {\bf R} \rightarrow {\bf R}}, where

\displaystyle  \Phi(d) := \prod_{p|d} \frac{p-k_0}{k_0}

for {d \in {\mathcal S}_I} is a {k_0}-variant of the Euler totient function, and

\displaystyle  \Delta(d) := \prod_{p|d} \frac{k_0}{p} = \frac{k_0^{\Omega(d)}}{d}

for {d \in {\mathcal S}_I} is a {k_0}-variant of the function {1/d}. (The derivative on the {f} cutoff is convenient for computations, as will be made clearer later in this post.) This choice of weights {a_d} may seem somewhat arbitrary, but it arises naturally when considering how to optimise the quadratic form

\displaystyle  \sum_{d_1,d_2 \in {\mathcal S}_I} a_{d_1} a_{d_2} \Delta([d_1,d_2])

(which arises naturally in the estimation of {\alpha} in (6)) subject to a fixed value of {a_1} (which morally is associated to the estimation of {\beta} in (7)); this is discussed in any sieve theory text as part of the general theory of the Selberg sieve, e.g. Friedlander-Iwaniec.

The use of the elementary Selberg sieve for the bounded prime gaps problem was studied by Motohashi and Pintz. Their arguments give an alternate derivation of {DHL[k_0,2]} from {MPZ[\varpi,\theta]} for {k_0} sufficiently large, although unfortunately we were not able to confirm some of their calculations regarding the precise dependence of {k_0} on {\varpi,\theta}, and in particular we have not yet been able to improve upon the specific criterion in Theorem 6 using the elementary sieve. However it is quite plausible that such improvements could become available with additional arguments.

Below the fold we describe how the elementary Selberg sieve can be used to reprove Theorem 3, and discuss how they could potentially be used to improve upon Theorem 6. (But the elementary Selberg sieve and the analytic Selberg sieve are in any event closely related; see the appendix of this paper of mine with Ben Green for some further discussion.) For the purposes of polymath8, either developing the elementary Selberg sieve or continuing the analysis of the analytic Selberg sieve from the previous post would be a relevant topic of conversation in the comments to this post.

Read the rest of this entry »

Suppose one is given a {k_0}-tuple {{\mathcal H} = (h_1,\ldots,h_{k_0})} of {k_0} distinct integers for some {k_0 \geq 1}, arranged in increasing order. When is it possible to find infinitely many translates {n + {\mathcal H} =(n+h_1,\ldots,n+h_{k_0})} of {{\mathcal H}} which consists entirely of primes? The case {k_0=1} is just Euclid’s theorem on the infinitude of primes, but the case {k_0=2} is already open in general, with the {{\mathcal H} = (0,2)} case being the notorious twin prime conjecture.

On the other hand, there are some tuples {{\mathcal H}} for which one can easily answer the above question in the negative. For instance, the only translate of {(0,1)} that consists entirely of primes is {(2,3)}, basically because each translate of {(0,1)} must contain an even number, and the only even prime is {2}. More generally, if there is a prime {p} such that {{\mathcal H}} meets each of the {p} residue classes {0 \hbox{ mod } p, 1 \hbox{ mod } p, \ldots, p-1 \hbox{ mod } p}, then every translate of {{\mathcal H}} contains at least one multiple of {p}; since {p} is the only multiple of {p} that is prime, this shows that there are only finitely many translates of {{\mathcal H}} that consist entirely of primes.

To avoid this obstruction, let us call a {k_0}-tuple {{\mathcal H}} admissible if it avoids at least one residue class {\hbox{ mod } p} for each prime {p}. It is easy to check for admissibility in practice, since a {k_0}-tuple is automatically admissible in every prime {p} larger than {k_0}, so one only needs to check a finite number of primes in order to decide on the admissibility of a given tuple. For instance, {(0,2)} or {(0,2,6)} are admissible, but {(0,2,4)} is not (because it covers all the residue classes modulo {3}). We then have the famous Hardy-Littlewood prime tuples conjecture:

Conjecture 1 (Prime tuples conjecture, qualitative form) If {{\mathcal H}} is an admissible {k_0}-tuple, then there exists infinitely many translates of {{\mathcal H}} that consist entirely of primes.

This conjecture is extremely difficult (containing the twin prime conjecture, for instance, as a special case), and in fact there is no explicitly known example of an admissible {k_0}-tuple with {k_0 \geq 2} for which we can verify this conjecture (although, thanks to the recent work of Zhang, we know that {(0,d)} satisfies the conclusion of the prime tuples conjecture for some {0 < d < 70,000,000}, even if we can’t yet say what the precise value of {d} is).

Actually, Hardy and Littlewood conjectured a more precise version of Conjecture 1. Given an admissible {k_0}-tuple {{\mathcal H} = (h_1,\ldots,h_{k_0})}, and for each prime {p}, let {\nu_p = \nu_p({\mathcal H}) := |{\mathcal H} \hbox{ mod } p|} denote the number of residue classes modulo {p} that {{\mathcal H}} meets; thus we have {1 \leq \nu_p \leq p-1} for all {p} by admissibility, and also {\nu_p = k_0} for all {p>h_{k_0}-h_1}. We then define the singular series {{\mathfrak G} = {\mathfrak G}({\mathcal H})} associated to {{\mathcal H}} by the formula

\displaystyle  {\mathfrak G} := \prod_{p \in {\mathcal P}} \frac{1-\frac{\nu_p}{p}}{(1-\frac{1}{p})^{k_0}}

where {{\mathcal P} = \{2,3,5,\ldots\}} is the set of primes; by the previous discussion we see that the infinite product in {{\mathfrak G}} converges to a finite non-zero number.

We will also need some asymptotic notation (in the spirit of “cheap nonstandard analysis“). We will need a parameter {x} that one should think of going to infinity. Some mathematical objects (such as {{\mathcal H}} and {k}) will be independent of {x} and referred to as fixed; but unless otherwise specified we allow all mathematical objects under consideration to depend on {x}. If {X} and {Y} are two such quantities, we say that {X = O(Y)} if one has {|X| \leq CY} for some fixed {C}, and {X = o(Y)} if one has {|X| \leq c(x) Y} for some function {c(x)} of {x} (and of any fixed parameters present) that goes to zero as {x \rightarrow \infty} (for each choice of fixed parameters).

Conjecture 2 (Prime tuples conjecture, quantitative form) Let {k_0 \geq 1} be a fixed natural number, and let {{\mathcal H}} be a fixed admissible {k_0}-tuple. Then the number of natural numbers {n < x} such that {n+{\mathcal H}} consists entirely of primes is {({\mathfrak G} + o(1)) \frac{x}{\log^{k_0} x}}.

Thus, for instance, if Conjecture 2 holds, then the number of twin primes less than {x} should equal {(2 \Pi_2 + o(1)) \frac{x}{\log^2 x}}, where {\Pi_2} is the twin prime constant

\displaystyle  \Pi_2 := \prod_{p \in {\mathcal P}: p>2} (1 - \frac{1}{(p-1)^2}) = 0.6601618\ldots.

As this conjecture is stronger than Conjecture 1, it is of course open. However there are a number of partial results on this conjecture. For instance, this conjecture is known to be true if one introduces some additional averaging in {{\mathcal H}}; see for instance this previous post. From the methods of sieve theory, one can obtain an upper bound of {(C_{k_0} {\mathfrak G} + o(1)) \frac{x}{\log^{k_0} x}} for the number of {n < x} with {n + {\mathcal H}} all prime, where {C_{k_0}} depends only on {k_0}. Sieve theory can also give analogues of Conjecture 2 if the primes are replaced by a suitable notion of almost prime (or more precisely, by a weight function concentrated on almost primes).

Another type of partial result towards Conjectures 1, 2 come from the results of Goldston-Pintz-Yildirim, Motohashi-Pintz, and of Zhang. Following the notation of this recent paper of Pintz, for each {k_0>2}, let {DHL[k_0,2]} denote the following assertion (DHL stands for “Dickson-Hardy-Littlewood”):

Conjecture 3 ({DHL[k_0,2]}) Let {{\mathcal H}} be a fixed admissible {k_0}-tuple. Then there are infinitely many translates {n+{\mathcal H}} of {{\mathcal H}} which contain at least two primes.

This conjecture gets harder as {k_0} gets smaller. Note for instance that {DHL[2,2]} would imply all the {k_0=2} cases of Conjecture 1, including the twin prime conjecture. More generally, if one knew {DHL[k_0,2]} for some {k_0}, then one would immediately conclude that there are an infinite number of pairs of consecutive primes of separation at most {H(k_0)}, where {H(k_0)} is the minimal diameter {h_{k_0}-h_1} amongst all admissible {k_0}-tuples {{\mathcal H}}. Values of {H(k_0)} for small {k_0} can be found at this link (with {H(k_0)} denoted {w} in that page). For large {k_0}, the best upper bounds on {H(k_0)} have been found by using admissible {k_0}-tuples {{\mathcal H}} of the form

\displaystyle  {\mathcal H} = ( - p_{m+\lfloor k_0/2\rfloor - 1}, \ldots, - p_{m+1}, -1, +1, p_{m+1}, \ldots, p_{m+\lfloor (k_0+1)/2\rfloor - 1} )

where {p_n} denotes the {n^{th}} prime and {m} is a parameter to be optimised over (in practice it is an order of magnitude or two smaller than {k_0}); see this blog post for details. The upshot is that one can bound {H(k_0)} for large {k_0} by a quantity slightly smaller than {k_0 \log k_0} (and the large sieve inequality shows that this is sharp up to a factor of two, see e.g. this previous post for more discussion).

In a key breakthrough, Goldston, Pintz, and Yildirim were able to establish the following conditional result a few years ago:

Theorem 4 (Goldston-Pintz-Yildirim) Suppose that the Elliott-Halberstam conjecture {EH[\theta]} is true for some {1/2 < \theta < 1}. Then {DHL[k_0,2]} is true for some finite {k_0}. In particular, this establishes an infinite number of pairs of consecutive primes of separation {O(1)}.

The dependence of constants between {k_0} and {\theta} given by the Goldston-Pintz-Yildirim argument is basically of the form {k_0 \sim (\theta-1/2)^{-2}}. (UPDATE: as recently observed by Farkas, Pintz, and Revesz, this relationship can be improved to {k_0 \sim (\theta-1/2)^{-3/2}}.)

Unfortunately, the Elliott-Halberstam conjecture (which we will state properly below) is only known for {\theta<1/2}, an important result known as the Bombieri-Vinogradov theorem. If one uses the Bombieri-Vinogradov theorem instead of the Elliott-Halberstam conjecture, Goldston, Pintz, and Yildirim were still able to show the highly non-trivial result that there were infinitely many pairs {p_{n+1},p_n} of consecutive primes with {(p_{n+1}-p_n) / \log p_n \rightarrow 0} (actually they showed more than this; see e.g. this survey of Soundararajan for details).

Actually, the full strength of the Elliott-Halberstam conjecture is not needed for these results. There is a technical specialisation of the Elliott-Halberstam conjecture which does not presently have a commonly accepted name; I will call it the Motohashi-Pintz-Zhang conjecture {MPZ[\varpi]} in this post, where {0 < \varpi < 1/4} is a parameter. We will define this conjecture more precisely later, but let us remark for now that {MPZ[\varpi]} is a consequence of {EH[\frac{1}{2}+2\varpi]}.

We then have the following two theorems. Firstly, we have the following strengthening of Theorem 4:

Theorem 5 (Motohashi-Pintz-Zhang) Suppose that {MPZ[\varpi]} is true for some {0 < \varpi < 1/4}. Then {DHL[k_0,2]} is true for some {k_0}.

A version of this result (with a slightly different formulation of {MPZ[\varpi]}) appears in this paper of Motohashi and Pintz, and in the paper of Zhang, Theorem 5 is proven for the concrete values {\varpi = 1/1168} and {k_0 = 3,500,000}. We will supply a self-contained proof of Theorem 5 below the fold, the constants upon those in Zhang’s paper (in particular, for {\varpi = 1/1168}, we can take {k_0} as low as {341,640}, with further improvements on the way). As with Theorem 4, we have an inverse quadratic relationship {k_0 \sim \varpi^{-2}}.

In his paper, Zhang obtained for the first time an unconditional advance on {MPZ[\varpi]}:

Theorem 6 (Zhang) {MPZ[\varpi]} is true for all {0 < \varpi \leq 1/1168}.

This is a deep result, building upon the work of Fouvry-Iwaniec, Friedlander-Iwaniec and Bombieri-Friedlander-Iwaniec which established results of a similar nature to {MPZ[\varpi]} but simpler in some key respects. We will not discuss this result further here, except to say that they rely on the (higher-dimensional case of the) Weil conjectures, which were famously proven by Deligne using methods from l-adic cohomology. Also, it was believed among at least some experts that the methods of Bombieri, Fouvry, Friedlander, and Iwaniec were not quite strong enough to obtain results of the form {MPZ[\varpi]}, making Theorem 6 a particularly impressive achievement.

Combining Theorem 6 with Theorem 5 we obtain {DHL[k_0,2]} for some finite {k_0}; Zhang obtains this for {k_0 = 3,500,000} but as detailed below, this can be lowered to {k_0 = 341,640}. This in turn gives infinitely many pairs of consecutive primes of separation at most {H(k_0)}. Zhang gives a simple argument that bounds {H(3,500,000)} by {70,000,000}, giving his famous result that there are infinitely many pairs of primes of separation at most {70,000,000}; by being a bit more careful (as discussed in this post) one can lower the upper bound on {H(3,500,000)} to {57,554,086}, and if one instead uses the newer value {k_0 = 341,640} for {k_0} one can instead use the bound {H(341,640) \leq 4,982,086}. (Many thanks to Scott Morrison for these numerics.) UPDATE: These values are now obsolete; see this web page for the latest bounds.

In this post we would like to give a self-contained proof of both Theorem 4 and Theorem 5, which are both sieve-theoretic results that are mainly elementary in nature. (But, as stated earlier, we will not discuss the deepest new result in Zhang’s paper, namely Theorem 6.) Our presentation will deviate a little bit from the traditional sieve-theoretic approach in a few places. Firstly, there is a portion of the argument that is traditionally handled using contour integration and properties of the Riemann zeta function; we will present a “cheaper” approach (which Ben Green and I used in our papers, e.g. in this one) using Fourier analysis, with the only property used about the zeta function {\zeta(s)} being the elementary fact that blows up like {\frac{1}{s-1}} as one approaches {1} from the right. To deal with the contribution of small primes (which is the source of the singular series {{\mathfrak G}}), it will be convenient to use the “{W}-trick” (introduced in this paper of mine with Ben), passing to a single residue class mod {W} (where {W} is the product of all the small primes) to end up in a situation in which all small primes have been “turned off” which leads to better pseudorandomness properties (for instance, once one eliminates all multiples of small primes, almost all pairs of remaining numbers will be coprime).

Read the rest of this entry »

A finite group {G=(G,\cdot)} is said to be a Frobenius group if there is a non-trivial subgroup {H} of {G} (known as the Frobenius complement of {G}) such that the conjugates {gHg^{-1}} of {H} are “disjoint as possible” in the sense that {H \cap gHg^{-1} = \{1\}} whenever {g \not \in H}. This gives a decomposition

\displaystyle  G = \bigcup_{gH \in G/H} (gHg^{-1} \backslash \{1\}) \cup K \ \ \ \ \ (1)

where the Frobenius kernel {K} of {G} is defined as the identity element {1} together with all the non-identity elements that are not conjugate to any element of {H}. Taking cardinalities, we conclude that

\displaystyle  |G| = \frac{|G|}{|H|} (|H| - 1) + |K|

and hence

\displaystyle  |H| |K| = |G|. \ \ \ \ \ (2)

A remarkable theorem of Frobenius gives an unexpected amount of structure on {K} and hence on {G}:

Theorem 1 (Frobenius’ theorem) Let {G} be a Frobenius group with Frobenius complement {H} and Frobenius kernel {K}. Then {K} is a normal subgroup of {G}, and hence (by (2) and the disjointness of {H} and {K} outside the identity) {G} is the semidirect product {K \rtimes H} of {H} and {K}.

I discussed Frobenius’ theorem and its proof in this recent blog post. This proof uses the theory of characters on a finite group {G}, in particular relying on the fact that a character on a subgroup {H} can induce a character on {G}, which can then be decomposed into irreducible characters with natural number coefficients. Remarkably, even though a century has passed since Frobenius’ original argument, there is no proof known of this theorem which avoids character theory entirely; there are elementary proofs known when the complement {H} has even order or when {H} is solvable (we review both of these cases below the fold), which by the Feit-Thompson theorem does cover all the cases, but the proof of the Feit-Thompson theorem involves plenty of character theory (and also relies on Theorem 1). (The answers to this MathOverflow question give a good overview of the current state of affairs.)

I have been playing around recently with the problem of finding a character-free proof of Frobenius’ theorem. I didn’t succeed in obtaining a completely elementary proof, but I did find an argument which replaces character theory (which can be viewed as coming from the representation theory of the non-commutative group algebra {{\bf C} G \equiv L^2(G)}) with the Fourier analysis of class functions (i.e. the representation theory of the centre {Z({\bf C} G) \equiv L^2(G)^G} of the group algebra), thus replacing non-commutative representation theory by commutative representation theory. This is not a particularly radical depature from the existing proofs of Frobenius’ theorem, but it did seem to be a new proof which was technically “character-free” (even if it was not all that far from character-based in spirit), so I thought I would record it here.

The main ideas are as follows. The space {L^2(G)^G} of class functions can be viewed as a commutative algebra with respect to the convolution operation {*}; as the regular representation is unitary and faithful, this algebra contains no nilpotent elements. As such, (Gelfand-style) Fourier analysis suggests that one can analyse this algebra through the idempotents: class functions {\phi} such that {\phi*\phi = \phi}. In terms of characters, idempotents are nothing more than sums of the form {\sum_{\chi \in \Sigma} \chi(1) \chi} for various collections {\Sigma} of characters, but we can perform a fair amount of analysis on idempotents directly without recourse to characters. In particular, it turns out that idempotents enjoy some important integrality properties that can be established without invoking characters: for instance, by taking traces one can check that {\phi(1)} is a natural number, and more generally we will show that {{\bf E}_{(a,b) \in S} {\bf E}_{x \in G} \phi( a x b^{-1} x^{-1} )} is a natural number whenever {S} is a subgroup of {G \times G} (see Corollary 4 below). For instance, the quantity

\displaystyle  \hbox{rank}(\phi) := {\bf E}_{a \in G} {\bf E}_{x \in G} \phi(a xa^{-1} x^{-1})

is a natural number which we will call the rank of {\phi} (as it is also the linear rank of the transformation {f \mapsto f*\phi} on {L^2(G)}).

In the case that {G} is a Frobenius group with kernel {K}, the above integrality properties can be used after some elementary manipulations to establish that for any idempotent {\phi}, the quantity

\displaystyle  \frac{1}{|G|} \sum_{a \in K} {\bf E}_{x \in G} \phi( axa^{-1}x^{-1} ) - \frac{1}{|G| |K|} \sum_{a,b \in K} \phi(ab^{-1}) \ \ \ \ \ (3)

is an integer. On the other hand, one can also show by elementary means that this quantity lies between {0} and {\hbox{rank}(\phi)}. These two facts are not strong enough on their own to impose much further structure on {\phi}, unless one restricts attention to minimal idempotents {\phi}. In this case spectral theory (or Gelfand theory, or the fundamental theorem of algebra) tells us that {\phi} has rank one, and then the integrality gap comes into play and forces the quantity (3) to always be either zero or one. This can be used to imply that the convolution action of every minimal idempotent {\phi} either preserves {\frac{|G|}{|K|} 1_K} or annihilates it, which makes {\frac{|G|}{|K|} 1_K} itself an idempotent, which makes {K} normal.

Read the rest of this entry »

Suppose that {G = (G,\cdot)} is a finite group of even order, thus {|G|} is a multiple of two. By Cauchy’s theorem, this implies that {G} contains an involution: an element {g} in {G} of order two. (Indeed, if no such involution existed, then {G} would be partitioned into doubletons {\{g,g^{-1}\}} together with the identity, so that {|G|} would be odd, a contradiction.) Of course, groups of odd order have no involutions {g}, thanks to Lagrange’s theorem (since {G} cannot split into doubletons {\{ h, hg \}}).

The classical Brauer-Fowler theorem asserts that if a group {G} has many involutions, then it must have a large non-trivial subgroup:

Theorem 1 (Brauer-Fowler theorem) Let {G} be a finite group with at least {|G|/n} involutions for some {n > 1}. Then {G} contains a proper subgroup {H} of index at most {n^2}.

This theorem (which is Theorem 2F in the original paper of Brauer and Fowler, who in fact manage to sharpen {n^2} slightly to {n(n+2)/2}) has a number of quick corollaries which are also referred to as “the” Brauer-Fowler theorem. For instance, if {g} is a an involution of a group {G}, and the centraliser {C_G(g) := \{ h \in G: gh = hg\}} has order {n}, then clearly {n \geq 2} (as {C_G(g)} contains {1} and {g}) and the conjugacy class {\{ aga^{-1}: a \in G \}} has order {|G|/n} (since the map {a \mapsto aga^{-1}} has preimages that are cosets of {C_G(g)}). Every conjugate of an involution is again an involution, so by the Brauer-Fowler theorem {G} contains a subgroup of order at least {\max( n, |G|/n^2)}. In particular, we can conclude that every group {G} of even order contains a proper subgroup of order at least {|G|^{1/3}}.

Another corollary is that the size of a simple group of even order can be controlled by the size of a centraliser of one of its involutions:

Corollary 2 (Brauer-Fowler theorem) Let {G} be a finite simple group with an involution {g}, and suppose that {C_G(g)} has order {n}. Then {G} has order at most {(n^2)!}.

Indeed, by the previous discussion {G} has a proper subgroup {H} of index less than {n^2}, which then gives a non-trivial permutation action of {G} on the coset space {G/H}. The kernel of this action is a proper normal subgroup of {G} and is thus trivial, so the action is faithful, and the claim follows.

If one assumes the Feit-Thompson theorem that all groups of odd order are solvable, then Corollary 2 suggests a strategy (first proposed by Brauer himself in 1954) to prove the classification of finite simple groups (CFSG) by induction on the order of the group. Namely, assume for contradiction that the CFSG failed, so that there is a counterexample {G} of minimal order {|G|} to the classification. This is a non-abelian finite simple group; by the Feit-Thompson theorem, it has even order and thus has at least one involution {g}. Take such an involution and consider its centraliser {C_G(g)}; this is a proper subgroup of {G} of some order {n < |G|}. As {G} is a minimal counterexample to the classification, one can in principle describe {C_G(g)} in terms of the CFSG by factoring the group into simple components (via a composition series) and applying the CFSG to each such component. Now, the “only” thing left to do is to verify, for each isomorphism class of {C_G(g)}, that all the possible simple groups {G} that could have this type of group as a centraliser of an involution obey the CFSG; Corollary 2 tells us that for each such isomorphism class for {C_G(g)}, there are only finitely many {G} that could generate this class for one of its centralisers, so this task should be doable in principle for any given isomorphism class for {C_G(g)}. That’s all one needs to do to prove the classification of finite simple groups!

Needless to say, this program turns out to be far more difficult than the above summary suggests, and the actual proof of the CFSG does not quite proceed along these lines. However, a significant portion of the argument is based on a generalisation of this strategy, in which the concept of a centraliser of an involution is replaced by the more general notion of a normaliser of a {p}-group, and one studies not just a single normaliser but rather the entire family of such normalisers and how they interact with each other (and in particular, which normalisers of {p}-groups commute with each other), motivated in part by the theory of Tits buildings for Lie groups which dictates a very specific type of interaction structure between these {p}-groups in the key case when {G} is a (sufficiently high rank) finite simple group of Lie type over a field of characteristic {p}. See the text of Aschbacher, Lyons, Smith, and Solomon for a more detailed description of this strategy.

The Brauer-Fowler theorem can be proven by a nice application of character theory, of the type discussed in this recent blog post, ultimately based on analysing the alternating tensor power of representations; I reproduce a version of this argument (taken from this text of Isaacs) below the fold. (The original argument of Brauer and Fowler is more combinatorial in nature.) However, I wanted to record a variant of the argument that relies not on the fine properties of characters, but on the cruder theory of quasirandomness for groups, the modern study of which was initiated by Gowers, and is discussed for instance in this previous post. It gives the following slightly weaker version of Corollary 2:

Corollary 3 (Weak Brauer-Fowler theorem) Let {G} be a finite simple group with an involution {g}, and suppose that {C_G(g)} has order {n}. Then {G} can be identified with a subgroup of the unitary group {U_{4n^3}({\bf C})}.

One can get an upper bound on {|G|} from this corollary using Jordan’s theorem, but the resulting bound is a bit weaker than that in Corollary 2 (and the best bounds on Jordan’s theorem require the CFSG!).

Proof: Let {A} be the set of all involutions in {G}, then as discussed above {|A| \geq |G|/n}. We may assume that {G} has no non-trivial unitary representation of dimension less than {4n^3} (since such representations are automatically faithful by the simplicity of {G}); thus, in the language of quasirandomness, {G} is {4n^3}-quasirandom, and is also non-abelian. We have the basic convolution estimate

\displaystyle  \|1_A * 1_A * 1_A - \frac{|A|^3}{|G|} \|_{\ell^\infty(G)} \leq (4n^3)^{-1/2} |G|^{1/2} |A|^{3/2}

(see Exercise 10 from this previous blog post). In particular,

\displaystyle  1_A * 1_A * 1_A(0) \geq \frac{|A|^3}{|G|} - (4n^3)^{-1/2} |G|^{1/2} |A|^{3/2} \geq \frac{1}{2n^3} |G|^2

and so there are at least {|G|^2/2n^3} pairs {(g,h) \in A \times A} such that {gh \in A^{-1} = A}, i.e. involutions {g,h} whose product is also an involution. But any such involutions necessarily commute, since

\displaystyle  g (gh) h = g^2 h^2 = 1 = (gh)^2 = g (hg) h.

Thus there are at least {|G|^2/2n^3} pairs {(g,h) \in G \times G} of non-identity elements that commute, so by the pigeonhole principle there is a non-identity {g \in G} whose centraliser {C_G(g)} has order at least {|G|/2n^3}. This centraliser cannot be all of {G} since this would make {g} central which contradicts the non-abelian simple nature of {G}. But then the quasiregular representation of {G} on {G/C_G(g)} has dimension at most {2n^3}, contradicting the quasirandomness. \Box

Read the rest of this entry »

An abstract finite-dimensional complex Lie algebra, or Lie algebra for short, is a finite-dimensional complex vector space {{\mathfrak g}} together with an anti-symmetric bilinear form {[,] = [,]_{\mathfrak g}: {\mathfrak g} \times {\mathfrak g} \rightarrow {\mathfrak g}} that obeys the Jacobi identity

\displaystyle  [[x,y],z] + [[y,z],x] + [[z,x],y] = 0 \ \ \ \ \ (1)

for all {x,y,z \in {\mathfrak g}}; by anti-symmetry one can also rewrite the Jacobi identity as

\displaystyle  [x,[y,z]] = [[x,y],z] + [y,[x,z]]. \ \ \ \ \ (2)

We will usually omit the subscript from the Lie bracket {[,]_{\mathfrak g}} when this will not cause ambiguity. A homomorphism {\phi: {\mathfrak g} \rightarrow {\mathfrak h}} between two Lie algebras {{\mathfrak g},{\mathfrak h}} is a linear map that respects the Lie bracket, thus {\phi([x,y]_{\mathfrak g}) =[\phi(x),\phi(y)]_{\mathfrak h}} for all {x,y \in {\mathfrak g}}. As with many other classes of mathematical objects, the class of Lie algebras together with their homomorphisms then form a category. One can of course also consider Lie algebras in infinite dimension or over other fields, but we will restrict attention throughout these notes to the finite-dimensional complex case. The trivial, zero-dimensional Lie algebra is denoted {0}; Lie algebras of positive dimension will be called non-trivial.

Lie algebras come up in many contexts in mathematics, in particular arising as the tangent space of complex Lie groups. It is thus very profitable to think of Lie algebras as being the infinitesimal component of a Lie group, and in particular almost all of the notation and concepts that are applicable to Lie groups (e.g. nilpotence, solvability, extensions, etc.) have infinitesimal counterparts in the category of Lie algebras (often with exactly the same terminology). See this previous blog post for more discussion about the connection between Lie algebras and Lie groups (that post was focused over the reals instead of the complexes, but much of the discussion carries over to the complex case).

A particular example of a Lie algebra is the general linear Lie algebra {{\mathfrak{gl}}(V)} of linear transformations {x: V \rightarrow V} on a finite-dimensional complex vector space (or vector space for short) {V}, with the commutator Lie bracket {[x,y] := xy-yx}; one easily verifies that this is indeed an abstract Lie algebra. We will define a concrete Lie algebra to be a Lie algebra that is a subalgebra of {{\mathfrak{gl}}(V)} for some vector space {V}, and similarly define a representation of a Lie algebra {{\mathfrak g}} to be a homomorphism {\rho: {\mathfrak g} \rightarrow {\mathfrak h}} into a concrete Lie algebra {{\mathfrak h}}. It is a deep theorem of Ado (discussed in this previous post) that every abstract Lie algebra is in fact isomorphic to a concrete one (or equivalently, that every abstract Lie algebra has a faithful representation), but we will not need or prove this fact here.

Even without Ado’s theorem, though, the structure of abstract Lie algebras is very well understood. As with many other objects in an algebraic category, a basic way to understand a Lie algebra {{\mathfrak g}} is to factor it into two simpler algebras {{\mathfrak h}, {\mathfrak k}} via a short exact sequence

\displaystyle  0 \rightarrow {\mathfrak h} \rightarrow {\mathfrak g} \rightarrow {\mathfrak k} \rightarrow 0, \ \ \ \ \ (3)

thus one has an injective homomorphism from {{\mathfrak h}} to {{\mathfrak g}} and a surjective homomorphism from {{\mathfrak g}} to {{\mathfrak k}} such that the image of the former homomorphism is the kernel of the latter. (To be pedantic, a short exact sequence in a general category requires these homomorphisms to be monomorphisms and epimorphisms respectively, but in the category of Lie algebras these turn out to reduce to the more familiar concepts of injectivity and surjectivity respectively.) Given such a sequence, one can (non-uniquely) identify {{\mathfrak g}} with the vector space {{\mathfrak h} \times {\mathfrak k}} equipped with a Lie bracket of the form

\displaystyle  [(t,x), (s,y)]_{\mathfrak g} = ([t,s]_{\mathfrak h} + A(t,y) - A(s,x) + B(x,y), [x,y]_{\mathfrak k}) \ \ \ \ \ (4)

for some bilinear maps {A: {\mathfrak h} \times {\mathfrak k} \rightarrow {\mathfrak h}} and {B: {\mathfrak k} \times {\mathfrak k} \rightarrow {\mathfrak h}} that obey some Jacobi-type identities which we will not record here. Understanding exactly what maps {A,B} are possible here (up to coordinate change) can be a difficult task (and is one of the key objectives of Lie algebra cohomology), but in principle at least, the problem of understanding {{\mathfrak g}} can be reduced to that of understanding that of its factors {{\mathfrak k}, {\mathfrak h}}. To emphasise this, I will (perhaps idiosyncratically) express the existence of a short exact sequence (3) by the ATLAS-type notation

\displaystyle  {\mathfrak g} = {\mathfrak h} . {\mathfrak k} \ \ \ \ \ (5)

although one should caution that for given {{\mathfrak h}} and {{\mathfrak k}}, there can be multiple non-isomorphic {{\mathfrak g}} that can form a short exact sequence with {{\mathfrak h},{\mathfrak k}}, so that {{\mathfrak h} . {\mathfrak k}} is not a uniquely defined combination of {{\mathfrak h}} and {{\mathfrak k}}; one could emphasise this by writing {{\mathfrak h} ._{A,B} {\mathfrak k}} instead of {{\mathfrak h} . {\mathfrak k}}, though we will not do so here. We will refer to {{\mathfrak g}} as an extension of {{\mathfrak k}} by {{\mathfrak h}}, and read the notation (5) as “ {{\mathfrak g}} is {{\mathfrak h}}-by-{{\mathfrak k}}“; confusingly, these two notations reverse the subject and object of “by”, but unfortunately both notations are well entrenched in the literature. We caution that the operation {.} is not commutative, and it is only partly associative: every Lie algebra of the form {{\mathfrak k} . ({\mathfrak h} . \l)} is also of the form {({\mathfrak k} . {\mathfrak h}) . {\mathfrak l}}, but the converse is not true (see this previous blog post for some related discussion). As we are working in the infinitesimal world of Lie algebras (which have an additive group operation) rather than Lie groups (in which the group operation is usually written multiplicatively), it may help to think of {{\mathfrak h} . {\mathfrak k}} as a (twisted) “sum” of {{\mathfrak h}} and {{\mathfrak k}} rather than a “product”; for instance, we have {{\mathfrak g} = 0 . {\mathfrak g}} and {{\mathfrak g} = {\mathfrak g} . 0}, and also {\dim {\mathfrak h} . {\mathfrak k} = \dim {\mathfrak h} + \dim {\mathfrak k}}.

Special examples of extensions {{\mathfrak h} .{\mathfrak k}} of {{\mathfrak k}} by {{\mathfrak h}} include the direct sum (or direct product) {{\mathfrak h} \oplus {\mathfrak k}} (also denoted {{\mathfrak h} \times {\mathfrak k}},) which is given by the construction (4) with {A} and {B} both vanishing, and the split extension (or semidirect product) {{\mathfrak h} : {\mathfrak k} = {\mathfrak h} :_\rho {\mathfrak k}} (also denoted {{\mathfrak h} \ltimes {\mathfrak k} = {\mathfrak h} \ltimes_\rho {\mathfrak k}}), which is given by the construction (4) with {B} vanishing and the bilinear map {A: {\mathfrak h} \times {\mathfrak k} \rightarrow {\mathfrak h}} taking the form

\displaystyle  A( t, x ) = \rho(x)(t)

for some representation {\rho: {\mathfrak k} \rightarrow \hbox{Der} {\mathfrak h}} of {{\mathfrak k}} in the concrete Lie algebra of derivations {\hbox{Der} {\mathfrak h} \subset {\mathfrak{gl}}({\mathfrak h})} of {{\mathfrak h}}, that is to say the algebra of linear maps {D: {\mathfrak h} \rightarrow {\mathfrak h}} that obey the Leibniz rule

\displaystyle  D[s,t]_{\mathfrak h} = [Ds,t]_{\mathfrak h} + [s,Dt]_{\mathfrak h}

for all {s,t \in {\mathfrak h}}. (The derivation algebra {\hbox{Der} {\mathfrak g}} of a Lie algebra {{\mathfrak g}} is analogous to the automorphism group {\hbox{Aut}(G)} of a Lie group {G}, with the two concepts being intertwined by the tangent space functor {G \mapsto {\mathfrak g}} from Lie groups to Lie algebras (i.e. the derivation algebra is the infinitesimal version of the automorphism group). Of course, this functor also intertwines the Lie algebra and Lie group versions of most of the other concepts discussed here, such as extensions, semidirect products, etc.)

There are two general ways to factor a Lie algebra {{\mathfrak g}} as an extension {{\mathfrak h} . {\mathfrak k}} of a smaller Lie algebra {{\mathfrak k}} by another smaller Lie algebra {{\mathfrak h}}. One is to locate a Lie algebra ideal (or ideal for short) {{\mathfrak h}} in {{\mathfrak g}}, thus {[{\mathfrak h},{\mathfrak g}] \subset {\mathfrak h}}, where {[{\mathfrak h},{\mathfrak g}]} denotes the Lie algebra generated by {\{ [x,y]: x \in {\mathfrak h}, y \in {\mathfrak g} \}}, and then take {{\mathfrak k}} to be the quotient space {{\mathfrak g}/{\mathfrak h}} in the usual manner; one can check that {{\mathfrak h}}, {{\mathfrak k}} are also Lie algebras and that we do indeed have a short exact sequence

\displaystyle  {\mathfrak g} = {\mathfrak h} . ({\mathfrak g}/{\mathfrak h}).

Conversely, whenever one has a factorisation {{\mathfrak g} = {\mathfrak h} . {\mathfrak k}}, one can identify {{\mathfrak h}} with an ideal in {{\mathfrak g}}, and {{\mathfrak k}} with the quotient of {{\mathfrak g}} by {{\mathfrak h}}.

The other general way to obtain such a factorisation is is to start with a homomorphism {\rho: {\mathfrak g} \rightarrow {\mathfrak m}} of {{\mathfrak g}} into another Lie algebra {{\mathfrak m}}, take {{\mathfrak k}} to be the image {\rho({\mathfrak g})} of {{\mathfrak g}}, and {{\mathfrak h}} to be the kernel {\hbox{ker} \rho := \{ x \in {\mathfrak g}: \rho(x) = 0 \}}. Again, it is easy to see that this does indeed create a short exact sequence:

\displaystyle  {\mathfrak g} = \hbox{ker} \rho . \rho({\mathfrak g}).

Conversely, whenever one has a factorisation {{\mathfrak g} = {\mathfrak h} . {\mathfrak k}}, one can identify {{\mathfrak k}} with the image of {{\mathfrak g}} under some homomorphism, and {{\mathfrak h}} with the kernel of that homomorphism. Note that if a representation {\rho: {\mathfrak g} \rightarrow {\mathfrak m}} is faithful (i.e. injective), then the kernel is trivial and {{\mathfrak g}} is isomorphic to {\rho({\mathfrak g})}.

Now we consider some examples of factoring some class of Lie algebras into simpler Lie algebras. The easiest examples of Lie algebras to understand are the abelian Lie algebras {{\mathfrak g}}, in which the Lie bracket identically vanishes. Every one-dimensional Lie algebra is automatically abelian, and thus isomorphic to the scalar algebra {{\bf C}}. Conversely, by using an arbitrary linear basis of {{\mathfrak g}}, we see that an abelian Lie algebra is isomorphic to the direct sum of one-dimensional algebras. Thus, a Lie algebra is abelian if and only if it is isomorphic to the direct sum of finitely many copies of {{\bf C}}.

Now consider a Lie algebra {{\mathfrak g}} that is not necessarily abelian. We then form the derived algebra {[{\mathfrak g},{\mathfrak g}]}; this algebra is trivial if and only if {{\mathfrak g}} is abelian. It is easy to see that {[{\mathfrak h},{\mathfrak k}]} is an ideal whenever {{\mathfrak h},{\mathfrak k}} are ideals, so in particular the derived algebra {[{\mathfrak g},{\mathfrak g}]} is an ideal and we thus have the short exact sequence

\displaystyle  {\mathfrak g} = [{\mathfrak g},{\mathfrak g}] . ({\mathfrak g}/[{\mathfrak g},{\mathfrak g}]).

The algebra {{\mathfrak g}/[{\mathfrak g},{\mathfrak g}]} is the maximal abelian quotient of {{\mathfrak g}}, and is known as the abelianisation of {{\mathfrak g}}. If it is trivial, we call the Lie algebra perfect. If instead it is non-trivial, then the derived algebra has strictly smaller dimension than {{\mathfrak g}}. From this, it is natural to associate two series to any Lie algebra {{\mathfrak g}}, the lower central series

\displaystyle  {\mathfrak g}_1 = {\mathfrak g}; {\mathfrak g}_2 := [{\mathfrak g}, {\mathfrak g}_1]; {\mathfrak g}_3 := [{\mathfrak g}, {\mathfrak g}_2]; \ldots

and the derived series

\displaystyle  {\mathfrak g}^{(1)} := {\mathfrak g}; {\mathfrak g}^{(2)} := [{\mathfrak g}^{(1)}, {\mathfrak g}^{(1)}]; {\mathfrak g}^{(3)} := [{\mathfrak g}^{(2)}, {\mathfrak g}^{(2)}]; \ldots.

By induction we see that these are both decreasing series of ideals of {{\mathfrak g}}, with the derived series being slightly smaller ({{\mathfrak g}^{(k)} \subseteq {\mathfrak g}_k} for all {k}). We say that a Lie algebra is nilpotent if its lower central series is eventually trivial, and solvable if its derived series eventually becomes trivial. Thus, abelian Lie algebras are nilpotent, and nilpotent Lie algebras are solvable, but the converses are not necessarily true. For instance, in the general linear group {{\mathfrak{gl}}_n = {\mathfrak{gl}}({\bf C}^n)}, which can be identified with the Lie algebra of {n \times n} complex matrices, the subalgebra {{\mathfrak n}} of strictly upper triangular matrices is nilpotent (but not abelian for {n \geq 3}), while the subalgebra {{\mathfrak n}} of upper triangular matrices is solvable (but not nilpotent for {n \geq 2}). It is also clear that any subalgebra of a nilpotent algebra is nilpotent, and similarly for solvable or abelian algebras.

From the above discussion we see that a Lie algebra is solvable if and only if it can be represented by a tower of abelian extensions, thus

\displaystyle  {\mathfrak g} = {\mathfrak a}_1 . ({\mathfrak a}_2 . \ldots ({\mathfrak a}_{k-1} . {\mathfrak a}_k) \ldots )

for some abelian {{\mathfrak a}_1,\ldots,{\mathfrak a}_k}. Similarly, a Lie algebra {{\mathfrak g}} is nilpotent if it is expressible as a tower of central extensions (so that in all the extensions {{\mathfrak h} . {\mathfrak k}} in the above factorisation, {{\mathfrak h}} is central in {{\mathfrak h} . {\mathfrak k}}, where we say that {{\mathfrak h}} is central in {{\mathfrak g}} if {[{\mathfrak h},{\mathfrak g}]=0}). We also see that an extension {{\mathfrak h} . {\mathfrak k}} is solvable if and only of both factors {{\mathfrak h}, {\mathfrak k}} are solvable.

For our next fundamental example of using short exact sequences to split a general Lie algebra into simpler objects, we observe that every abstract Lie algebra {{\mathfrak g}} has an adjoint representation {\hbox{ad}: {\mathfrak g} \rightarrow \hbox{ad} {\mathfrak g} \subset {\mathfrak{gl}}({\mathfrak g})}, where for each {x \in {\mathfrak g}}, {\hbox{ad} x \in {\mathfrak{gl}}({\mathfrak g})} is the linear map {(\hbox{ad} x)(y) := [x,y]}; one easily verifies that this is indeed a representation (indeed, (2) is equivalent to the assertion that {\hbox{ad} [x,y] = [\hbox{ad} x, \hbox{ad} y]} for all {x,y \in {\mathfrak g}}). The kernel of this representation is the center {Z({\mathfrak g}) := \{ x \in {\mathfrak g}: [x,{\mathfrak g}] = 0\}}, which the maximal central subalgebra of {{\mathfrak g}}. We thus have the short exact sequence

\displaystyle  {\mathfrak g} = Z({\mathfrak g}) . \hbox{ad} g \ \ \ \ \ (6)

which, among other things, shows that every abstract Lie algebra is a central extension of a concrete Lie algebra (which can serve as a cheap substitute for Ado’s theorem mentioned earlier).

For our next fundamental decomposition of Lie algebras, we need some more definitions. A Lie algebra {{\mathfrak g}} is simple if it is non-abelian and has no ideals other than {0} and {{\mathfrak g}}; thus simple Lie algebras cannot be factored {{\mathfrak g} = {\mathfrak h} . {\mathfrak k}} into strictly smaller algebras {{\mathfrak h},{\mathfrak k}}. In particular, simple Lie algebras are automatically perfect and centerless. We have the following fundamental theorem:

Theorem 1 (Equivalent definitions of semisimplicity) Let {{\mathfrak g}} be a Lie algebra. Then the following are equivalent:

  • (i) {{\mathfrak g}} does not contain any non-trivial solvable ideal.
  • (ii) {{\mathfrak g}} does not contain any non-trivial abelian ideal.
  • (iii) The Killing form {K: {\mathfrak g} \times {\mathfrak g} \rightarrow {\bf C}}, defined as the bilinear form {K(x,y) := \hbox{tr}_{\mathfrak g}( (\hbox{ad} x) (\hbox{ad} y) )}, is non-degenerate on {{\mathfrak g}}.
  • (iv) {{\mathfrak g}} is isomorphic to the direct sum of finitely many non-abelian simple Lie algebras.

We review the proof of this theorem later in these notes. A Lie algebra obeying any (and hence all) of the properties (i)-(iv) is known as a semisimple Lie algebra. The statement (iv) is usually taken as the definition of semisimplicity; the equivalence of (iv) and (i) is then known as Weyl’s complete reducibility theorem, and the equivalence of (iv) and (iii) is known as the Cartan semisimplicity criterion. (The equivalence of (i) and (ii) is easy.)

If {{\mathfrak h}} and {{\mathfrak k}} are solvable ideals of a Lie algebra {{\mathfrak g}}, then it is not difficult to see that the vector sum {{\mathfrak h}+{\mathfrak k}} is also a solvable ideal (because on quotienting by {{\mathfrak h}} we see that the derived series of {{\mathfrak h}+{\mathfrak k}} must eventually fall inside {{\mathfrak h}}, and thence must eventually become trivial by the solvability of {{\mathfrak h}}). As our Lie algebras are finite dimensional, we conclude that {{\mathfrak g}} has a unique maximal solvable ideal, known as the radical {\hbox{rad} {\mathfrak g}} of {{\mathfrak g}}. The quotient {{\mathfrak g}/\hbox{rad} {\mathfrak g}} is then a Lie algebra with trivial radical, and is thus semisimple by the above theorem, giving the Levi decomposition

\displaystyle  {\mathfrak g} = \hbox{rad} {\mathfrak g} . ({\mathfrak g} / \hbox{rad} {\mathfrak g})

expressing an arbitrary Lie algebra as an extension of a semisimple Lie algebra {{\mathfrak g}/\hbox{rad}{\mathfrak g}} by a solvable algebra {\hbox{rad} {\mathfrak g}} (and it is not hard to see that this is the only possible such extension up to isomorphism). Indeed, a deep theorem of Levi allows one to upgrade this decomposition to a split extension

\displaystyle  {\mathfrak g} = \hbox{rad} {\mathfrak g} : ({\mathfrak g} / \hbox{rad} {\mathfrak g})

although we will not need or prove this result here.

In view of the above decompositions, we see that we can factor any Lie algebra (using a suitable combination of direct sums and extensions) into a finite number of simple Lie algebras and the scalar algebra {{\bf C}}. In principle, this means that one can understand an arbitrary Lie algebra once one understands all the simple Lie algebras (which, being defined over {{\bf C}}, are somewhat confusingly referred to as simple complex Lie algebras in the literature). Amazingly, this latter class of algebras are completely classified:

Theorem 2 (Classification of simple Lie algebras) Up to isomorphism, every simple Lie algebra is of one of the following forms:

  • {A_n = \mathfrak{sl}_{n+1}} for some {n \geq 1}.
  • {B_n = \mathfrak{so}_{2n+1}} for some {n \geq 2}.
  • {C_n = \mathfrak{sp}_{2n}} for some {n \geq 3}.
  • {D_n = \mathfrak{so}_{2n}} for some {n \geq 4}.
  • {E_6, E_7}, or {E_8}.
  • {F_4}.
  • {G_2}.

(The precise definition of the classical Lie algebras {A_n,B_n,C_n,D_n} and the exceptional Lie algebras {E_6,E_7,E_8,F_4,G_2} will be recalled later.)

(One can extend the families {A_n,B_n,C_n,D_n} of classical Lie algebras a little bit to smaller values of {n}, but the resulting algebras are either isomorphic to other algebras on this list, or cease to be simple; see this previous post for further discussion.)

This classification is a basic starting point for the classification of many other related objects, including Lie algebras and Lie groups over more general fields (e.g. the reals {{\bf R}}), as well as finite simple groups. Being so fundamental to the subject, this classification is covered in almost every basic textbook in Lie algebras, and I myself learned it many years ago in an honours undergraduate course back in Australia. The proof is rather lengthy, though, and I have always had difficulty keeping it straight in my head. So I have decided to write some notes on the classification in this blog post, aiming to be self-contained (though moving rapidly). There is no new material in this post, though; it is all drawn from standard reference texts (I relied particularly on Fulton and Harris’s text, which I highly recommend). In fact it seems remarkably hard to deviate from the standard routes given in the literature to the classification; I would be interested in knowing about other ways to reach the classification (or substeps in that classification) that are genuinely different from the orthodox route.

Read the rest of this entry »

The classification of finite simple groups (CFSG), first announced in 1983 but only fully completed in 2004, is one of the monumental achievements of twentieth century mathematics. Spanning hundreds of papers and tens of thousands of pages, it has been called the “enormous theorem”. A “second generation” proof of the theorem is nearly completed which is a little shorter (estimated at about five thousand pages in length), but currently there is no reasonably sized proof of the classification.

An important precursor of the CFSG is the Feit-Thompson theorem from 1962-1963, which asserts that every finite group of odd order is solvable, or equivalently that every non-abelian finite simple group has even order. This is an immediate consequence of CFSG, and conversely the Feit-Thompson theorem is an essential starting point in the proof of the classification, since it allows one to reduce matters to groups of even order for which key additional tools (such as the Brauer-Fowler theorem) become available. The original proof of the Feit-Thompson theorem is 255 pages long, which is significantly shorter than the proof of the CFSG, but still far from short. While parts of the proof of the Feit-Thompson theorem have been simplified (and it has recently been converted, after six years of effort, into an argument that has been verified by the proof assistant Coq), the available proofs of this theorem are still extremely lengthy by any reasonable standard.

However, there is a significantly simpler special case of the Feit-Thompson theorem that was established previously by Suzuki in 1957, which was influential in the proof of the more general Feit-Thompson theorem (and thus indirectly to the proof of CFSG). Define a CA-group to be a group {G} with the property that the centraliser {C_G(x) := \{ g \in G: gx=xg \}} of any non-identity element {x \in G} is abelian; equivalently, the commuting relation {x \sim y} (defined as the relation that holds when {x} commutes with {y}, thus {xy=yx}) is an equivalence relation on the non-identity elements {G \backslash \{1\}} of {G}. Trivially, every abelian group is CA. A non-abelian example of a CA-group is the {ax+b} group of invertible affine transformations {x \mapsto ax+b} on a field {F}. A little less obviously, the special linear group {SL_2(F_q)} over a finite field {F_q} is a CA-group when {q} is a power of two. The finite simple groups of Lie type are not, in general, CA-groups, but when the rank is bounded they tend to behave as if they were “almost CA”; the centraliser of a generic element in {SL_d(F_q)}, for instance, when {d} is bounded and {q} is large), is typically a maximal torus (because most elements in {SL_d(F_q)} are regular semisimple) which is certainly abelian. In view of the CFSG, we thus see that CA or nearly CA groups form an important subclass of the simple groups, and it is thus of interest to study them separately. To this end, we have

Theorem 1 (Suzuki’s theorem on CA-groups) Every finite CA-group of odd order is solvable.

Of course, this theorem is superceded by the more general Feit-Thompson theorem, but Suzuki’s proof is substantially shorter (the original proof is nine pages) and will be given in this post. (See this survey of Solomon for some discussion of the link between Suzuki’s argument and the Feit-Thompson argument.) Suzuki’s analysis can be pushed further to give an essentially complete classification of all the finite CA-groups (of either odd or even order), but we will not pursue these matters here.

Moving even further down the ladder of simple precursors of CSFG is the following theorem of Frobenius from 1901. Define a Frobenius group to be a finite group {G} which has a subgroup {H} (called the Frobenius complement) with the property that all the non-trivial conjugates {gHg^{-1}} of {H} for {g \in G \backslash H}, intersect {H} only at the origin. For instance the {ax+b} group is also a Frobenius group (take {H} to be the affine transformations that fix a specified point {x_0 \in F}, e.g. the origin). This example suggests that there is some overlap between the notions of a Frobenius group and a CA group. Indeed, note that if {G} is a CA-group and {H} is a maximal abelian subgroup of {G}, then any conjugate {gHg^{-1}} of {H} that is not identical to {H} will intersect {H} only at the origin (because {H} and each of its conjugates consist of equivalence classes under the commuting relation {\sim}, together with the identity). So if a maximal abelian subgroup {H} of a CA-group is its own normaliser (thus {N(H) := \{ g \in G: gH=Hg\}} is equal to {H}), then the group is a Frobenius group.

Frobenius’ theorem places an unexpectedly strong amount of structure on a Frobenius group:

Theorem 2 (Frobenius’ theorem) Let {G} be a Frobenius group with Frobenius complement {H}. Then there exists a normal subgroup {K} of {G} (called the Frobenius kernel of {G}) such that {G} is the semi-direct product {H \ltimes K} of {H} and {K}.

Roughly speaking, this theorem indicates that all Frobenius groups “behave” like the {ax+b} example (which is a quintessential example of a semi-direct product).

Note that if every CA-group of odd order was either Frobenius or abelian, then Theorem 2 would imply Theorem 1 by an induction on the order of {G}, since any subgroup of a CA-group is clearly again a CA-group. Indeed, the proof of Suzuki’s theorem does basically proceed by this route (Suzuki’s arguments do indeed imply that CA-groups of odd order are Frobenius or abelian, although we will not quite establish that fact here).

Frobenius’ theorem can be reformulated in the following concrete combinatorial form:

Theorem 3 (Frobenius’ theorem, equivalent version) Let {G} be a group of permutations acting transitively on a finite set {X}, with the property that any non-identity permutation in {G} fixes at most one point in {X}. Then the set of permutations in {G} that fix no points in {X}, together with the identity, is closed under composition.

Again, a good example to keep in mind for this theorem is when {G} is the group of affine permutations on a field {F} (i.e. the {ax+b} group for that field), and {X} is the set of points on that field. In that case, the set of permutations in {G} that do not fix any points are the non-trivial translations.

To deduce Theorem 3 from Theorem 2, one applies Theorem 2 to the stabiliser of a single point in {X}. Conversely, to deduce Theorem 2 from Theorem 3, set {X := G/H = \{ gH: g \in G \}} to be the space of left-cosets of {H}, with the obvious left {G}-action; one easily verifies that this action is faithful, transitive, and each non-identity element {g} of {G} fixes at most one left-coset of {H} (basically because it lies in at most one conjugate of {H}). If we let {K} be the elements of {G} that do not fix any point in {X}, plus the identity, then by Theorem 3 {K} is closed under composition; it is also clearly closed under inverse and conjugation, and is hence a normal subgroup of {G}. From construction {K} is the identity plus the complement of all the {|G|/|H|} conjugates of {H}, which are all disjoint except at the identity, so by counting elements we see that

\displaystyle |K| = |G| - \frac{|G|}{|H|}(|H|-1) = |G|/|H|.

As {H} normalises {K} and is disjoint from {K}, we thus see that {KH = H \ltimes K} is all of {G}, giving Theorem 2.

Despite the appealingly concrete and elementary form of Theorem 3, the only known proofs of that theorem (or equivalently, Theorem 2) in its full generality proceed via the machinery of group characters (which one can think of as a version of Fourier analysis for nonabelian groups). On the other hand, once one establishes the basic theory of these characters (reviewed below the fold), the proof of Frobenius’ theorem is very short, which gives quite a striking example of the power of character theory. The proof of Suzuki’s theorem also proceeds via character theory, and is basically a more involved version of the Frobenius argument; again, no character-free proof of Suzuki’s theorem is currently known. (The proofs of Feit-Thompson and CFSG also involve characters, but those proofs also contain many other arguments of much greater complexity than the character-based portions of the proof.)

It seems to me that the above four theorems (Frobenius, Suzuki, Feit-Thompson, and CFSG) provide a ladder of sorts (with exponentially increasing complexity at each step) to the full classification, and that any new approach to the classification might first begin by revisiting the earlier theorems on this ladder and finding new proofs of these results first (in particular, if one had a “robust” proof of Suzuki’s theorem that also gave non-trivial control on “almost CA-groups” – whatever that means – then this might lead to a new route to classifying the finite simple groups of Lie type and bounded rank). But even for the simplest two results on this ladder – Frobenius and Suzuki – it seems remarkably difficult to find any proof that is not essentially the character-based proof. (Even trying to replace character theory by its close cousin, representation theory, doesn’t seem to work unless one gives in to the temptation to take traces everywhere and put the characters back in; it seems that rather than abandon characters altogether, one needs to find some sort of “robust” generalisation of existing character-based methods.) In any case, I am recording here the standard character-based proofs of the theorems of Frobenius and Suzuki below the fold. There is nothing particularly novel here, but I wanted to collect all the relevant material in one place, largely for my own benefit.

Read the rest of this entry »

One of the basic objects of study in combinatorics are finite strings {(a_n)_{n=0}^N} or infinite strings {(a_n)_{n=0}^\infty} of symbols {a_n} from some given alphabet {{\mathcal A}}, which could be either finite or infinite (but which we shall usually take to be compact). For instance, a set {A} of natural numbers can be identified with the infinite string {(1_A(n))_{n=0}^\infty} of {0}s and {1}s formed by the indicator of {A}, e.g. the even numbers can be identified with the string {1010101\ldots} from the alphabet {\{0,1\}}, the multiples of three can be identified with the string {100100100\ldots}, and so forth. One can also consider doubly infinite strings {(a_n)_{n \in {\bf Z}}}, which among other things can be used to describe arbitrary subsets of integers.

On the other hand, the basic object of study in dynamics (and in related fields, such as ergodic theory) is that of a dynamical system {(X,T)}, that is to say a space {X} together with a shift map {T: X \rightarrow X} (which is often assumed to be invertible, although one can certainly study non-invertible dynamical systems as well). One often adds additional structure to this dynamical system, such as topological structure (giving rise topological dynamics), measure-theoretic structure (giving rise to ergodic theory), complex structure (giving rise to complex dynamics), and so forth. A dynamical system gives rise to an action of the natural numbers {{\bf N}} on the space {X} by using the iterates {T^n: X \rightarrow X} of {T} for {n=0,1,2,\ldots}; if {T} is invertible, we can extend this action to an action of the integers {{\bf Z}} on the same space. One can certainly also consider dynamical systems whose underlying group (or semi-group) is something other than {{\bf N}} or {{\bf Z}} (e.g. one can consider continuous dynamical systems in which the evolution group is {{\bf R}}), but we will restrict attention to the classical situation of {{\bf N}} or {{\bf Z}} actions here.

There is a fundamental correspondence principle connecting the study of strings (or subsets of natural numbers or integers) with the study of dynamical systems. In one direction, given a dynamical system {(X,T)}, an observable {c: X \rightarrow {\mathcal A}} taking values in some alphabet {{\mathcal A}}, and some initial datum {x_0 \in X}, we can first form the forward orbit {(T^n x_0)_{n=0}^\infty} of {x_0}, and then observe this orbit using {c} to obtain an infinite string {(c(T^n x_0))_{n=0}^\infty}. If the shift {T} in this system is invertible, one can extend this infinite string into a doubly infinite string {(c(T^n x_0))_{n \in {\bf Z}}}. Thus we see that every quadruplet {(X,T,c,x_0)} consisting of a dynamical system {(X,T)}, an observable {c}, and an initial datum {x_0} creates an infinite string.

Example 1 If {X} is the three-element set {X = {\bf Z}/3{\bf Z}} with the shift map {Tx := x+1}, {c: {\bf Z}/3{\bf Z} \rightarrow \{0,1\}} is the observable that takes the value {1} at the residue class {0 \hbox{ mod } 3} and zero at the other two classes, and one starts with the initial datum {x_0 = 0 \hbox{ mod } 3}, then the observed string {(c(T^n x_0))_{n=0}^\infty} becomes the indicator {100100100\ldots} of the multiples of three.

In the converse direction, every infinite string {(a_n){n=0}^\infty} in some alphabet {{\mathcal A}} arises (in a decidedly non-unique fashion) from a quadruple {(X,T,c,x_0)} in the above fashion. This can be easily seen by the following “universal” construction: take {X} to be the set {X:= {\mathcal A}^{\bf N}} of infinite strings {(b_i)_{n=0}^\infty} in the alphabet {{\mathcal A}}, let {T: X \rightarrow X} be the shift map

\displaystyle T(b_i)_{n=0}^\infty := (b_{i+1})_{n=0}^\infty,

let {c: X \rightarrow {\mathcal A}} be the observable

\displaystyle c((b_i)_{n=0}^\infty) := b_0,

and let {x_0 \in X} be the initial point

\displaystyle x_0 := (a_i)_{n=0}^\infty.

Then one easily sees that the observed string {(c(T^n x_0))_{n=0}^\infty} is nothing more than the original string {(b_i)_{n=0}^\infty}. Note also that this construction can easily be adapted to doubly infinite strings by using {{\mathcal A}^{\bf Z}} instead of {{\mathcal A}^{\bf N}}, at which point the shift map {T} now becomes invertible. An important variant of this construction also attaches an invariant probability measure to {X} that is associated to the limiting density of various sets associated to the string {(a_i)_{n=0}^\infty}, and leads to the Furstenberg correspondence principle, discussed for instance in these previous blog posts. Such principles allow one to rigorously pass back and forth between the combinatorics of strings and the dynamics of systems; for instance, Furstenberg famously used his correspondence principle to demonstrate the equivalence of Szemerédi’s theorem on arithmetic progressions with what is now known as the Furstenberg multiple recurrence theorem in ergodic theory.

In the case when the alphabet {{\mathcal A}} is the binary alphabet {\{0,1\}}, and (for technical reasons related to the infamous non-injectivity {0.999\ldots = 1.00\ldots} of the decimal representation system) the string {(a_n)_{n=0}^\infty} does not end with an infinite string of {1}s, then one can reformulate the above universal construction by taking {X} to be the interval {[0,1)}, {T} to be the doubling map {Tx := 2x \hbox{ mod } 1}, {c: X \rightarrow \{0,1\}} to be the observable that takes the value {1} on {[1/2,1)} and {0} on {[0,1/2)} (that is, {c(x)} is the first binary digit of {x}), and {x_0} is the real number {x_0 := \sum_{n=0}^\infty a_n 2^{-n-1}} (that is, {x_0 = 0.a_0a_1\ldots} in binary).

The above universal construction is very easy to describe, and is well suited for “generic” strings {(a_n)_{n=0}^\infty} that have no further obvious structure to them, but it often leads to dynamical systems that are much larger and more complicated than is actually needed to produce the desired string {(a_n)_{n=0}^\infty}, and also often obscures some of the key dynamical features associated to that sequence. For instance, to generate the indicator {100100100\ldots} of the multiples of three that were mentioned previously, the above universal construction requires an uncountable space {X} and a dynamics which does not obviously reflect the key features of the sequence such as its periodicity. (Using the unit interval model, the dynamics arise from the orbit of {2/7} under the doubling map, which is a rather artificial way to describe the indicator function of the multiples of three.)

A related aesthetic objection to the universal construction is that of the four components {X,T,c,x_0} of the quadruplet {(X,T,c,x_0)} used to generate the sequence {(a_n)_{n=0}^\infty}, three of the components {X,T,c} are completely universal (in that they do not depend at all on the sequence {(a_n)_{n=0}^\infty}), leaving only the initial datum {x_0} to carry all the distinctive features of the original sequence. While there is nothing wrong with this mathematically, from a conceptual point of view it would make sense to make all four components of the quadruplet to be adapted to the sequence, in order to take advantage of the accumulated intuition about various special dynamical systems (and special observables), not just special initial data.

One step in this direction can be made by restricting {X} to the orbit {\{ T^n x_0: n \in {\bf N} \}} of the initial datum {x_0} (actually for technical reasons it is better to restrict to the topological closure {\overline{\{ T^n x_0: n \in {\bf N} \}}} of this orbit, in order to keep {X} compact). For instance, starting with the sequence {100100100\ldots}, the orbit now consists of just three points {100100100\ldots}, {010010010\ldots}, {001001001\ldots}, bringing the system more in line with the example in Example 1. Technically, this is the “optimal” representation of the sequence by a quadruplet {(X,T,c,x_0)}, because any other such representation {(X',T',c',x'_0)} is a factor of this representation (in the sense that there is a unique map {\pi: X \rightarrow X'} with {T' \circ \pi = \pi \circ T}, {c' \circ \pi = c}, and {x'_0 = \pi(x_0)}). However, from a conceptual point of view this representation is still somewhat unsatisfactory, given that the elements of the system {X} are interpreted as infinite strings rather than elements of a more geometrically or algebraically rich object (e.g. points in a circle, torus, or other homogeneous space).

For general sequences {(a_n)_{n=0}^\infty}, locating relevant geometric or algebraic structure in a dynamical system generating that sequence is an important but very difficult task (see e.g. this paper of Host and Kra, which is more or less devoted to precisely this task in the context of working out what component of a dynamical system controls the multiple recurrence behaviour of that system). However, for specific examples of sequences {(a_n)_{n=0}^\infty}, one can use an informal procedure of educated guesswork in order to produce a more natural-looking quadruple {(X,T,c,x_0)} that generates that sequence. This is not a particularly difficult or deep operation, but I found it very helpful in internalising the intuition behind the correspondence principle. Being non-rigorous, this procedure does not seem to be emphasised in most presentations of the correspondence principle, so I thought I would describe it here.

Read the rest of this entry »

The rectification principle in arithmetic combinatorics asserts, roughly speaking, that very small subsets (or, alternatively, small structured subsets) of an additive group or a field of large characteristic can be modeled (for the purposes of arithmetic combinatorics) by subsets of a group or field of zero characteristic, such as the integers {{\bf Z}} or the complex numbers {{\bf C}}. The additive form of this principle is known as the Freiman rectification principle; it has several formulations, going back of course to the original work of Freiman. Here is one formulation as given by Bilu, Lev, and Ruzsa:

Proposition 1 (Additive rectification) Let {A} be a subset of the additive group {{\bf Z}/p{\bf Z}} for some prime {p}, and let {s \geq 1} be an integer. Suppose that {|A| \leq \log_{2s} p}. Then there exists a map {\phi: A \rightarrow A'} into a subset {A'} of the integers which is a Freiman isomorphism of order {s} in the sense that for any {x_1,\ldots,x_s,y_1,\ldots,y_s \in A}, one has

\displaystyle  x_1+\ldots+x_s = y_1+\ldots+y_s

if and only if

\displaystyle  \phi(x_1)+\ldots+\phi(x_s) = \phi(y_1)+\ldots+\phi(y_s).

Furthermore {\phi} is a right-inverse of the obvious projection homomorphism from {{\bf Z}} to {{\bf Z}/p{\bf Z}}.

The original version of the rectification principle allowed the sets involved to be substantially larger in size (cardinality up to a small constant multiple of {p}), but with the additional hypothesis of bounded doubling involved; see the above-mentioned papers, as well as this later paper of Green and Ruzsa, for further discussion.

The proof of Proposition 1 is quite short (see Theorem 3.1 of Bilu-Lev-Ruzsa); the main idea is to use Minkowski’s theorem to find a non-trivial dilate {aA} of {A} that is contained in a small neighbourhood of the origin in {{\bf Z}/p{\bf Z}}, at which point the rectification map {\phi} can be constructed by hand.

Very recently, Codrut Grosu obtained an arithmetic analogue of the above theorem, in which the rectification map {\phi} preserves both additive and multiplicative structure:

Theorem 2 (Arithmetic rectification) Let {A} be a subset of the finite field {{\bf F}_p} for some prime {p \geq 3}, and let {s \geq 1} be an integer. Suppose that {|A| < \log_2 \log_{2s} \log_{2s^2} p - 1}. Then there exists a map {\phi: A \rightarrow A'} into a subset {A'} of the complex numbers which is a Freiman field isomorphism of order {s} in the sense that for any {x_1,\ldots,x_n \in A} and any polynomial {P(x_1,\ldots,x_n)} of degree at most {s} and integer coefficients of magnitude summing to at most {s}, one has

\displaystyle  P(x_1,\ldots,x_n)=0

if and only if

\displaystyle  P(\phi(x_1),\ldots,\phi(x_n))=0.

Note that it is necessary to use an algebraically closed field such as {{\bf C}} for this theorem, in contrast to the integers used in Proposition 1, as {{\bf F}_p} can contain objects such as square roots of {-1} which can only map to {\pm i} in the complex numbers (once {s} is at least {2}).

Using Theorem 2, one can transfer results in arithmetic combinatorics (e.g. sum-product or Szemerédi-Trotter type theorems) regarding finite subsets of {{\bf C}} to analogous results regarding sufficiently small subsets of {{\bf F}_p}; see the paper of Grosu for several examples of this. This should be compared with the paper of Vu, Wood, and Wood, which introduces a converse principle that embeds finite subsets of {{\bf C}} (or more generally, a characteristic zero integral domain) in a Freiman field-isomorphic fashion into finite subsets of {{\bf F}_p} for arbitrarily large primes {p}, allowing one to transfer arithmetic combinatorical facts from the latter setting to the former.

Grosu’s argument uses some quantitative elimination theory, and in particular a quantitative variant of a lemma of Chang that was discussed previously on this blog. In that previous blog post, it was observed that (an ineffective version of) Chang’s theorem could be obtained using only qualitative algebraic geometry (as opposed to quantitative algebraic geometry tools such as elimination theory results with explicit bounds) by means of nonstandard analysis (or, in what amounts to essentially the same thing in this context, the use of ultraproducts). One can then ask whether one can similarly establish an ineffective version of Grosu’s result by nonstandard means. The purpose of this post is to record that this can indeed be done without much difficulty, though the result obtained, being ineffective, is somewhat weaker than that in Theorem 2. More precisely, we obtain

Theorem 3 (Ineffective arithmetic rectification) Let {s, n \geq 1}. Then if {{\bf F}} is a field of characteristic at least {C_{s,n}} for some {C_{s,n}} depending on {s,n}, and {A} is a subset of {{\bf F}} of cardinality {n}, then there exists a map {\phi: A \rightarrow A'} into a subset {A'} of the complex numbers which is a Freiman field isomorphism of order {s}.

Our arguments will not provide any effective bound on the quantity {C_{s,n}} (though one could in principle eventually extract such a bound by deconstructing the proof of Proposition 4 below), making this result weaker than Theorem 2 (save for the minor generalisation that it can handle fields of prime power order as well as fields of prime order as long as the characteristic remains large).

Following the principle that ultraproducts can be used as a bridge to connect quantitative and qualitative results (as discussed in these previous blog posts), we will deduce Theorem 3 from the following (well-known) qualitative version:

Proposition 4 (Baby Lefschetz principle) Let {k} be a field of characteristic zero that is finitely generated over the rationals. Then there is an isomorphism {\phi: k \rightarrow \phi(k)} from {k} to a subfield {\phi(k)} of {{\bf C}}.

This principle (first laid out in an appendix of Lefschetz’s book), among other things, often allows one to use the methods of complex analysis (e.g. Riemann surface theory) to study many other fields of characteristic zero. There are many variants and extensions of this principle; see for instance this MathOverflow post for some discussion of these. I used this baby version of the Lefschetz principle recently in a paper on expanding polynomial maps.

Proof: We give two proofs of this fact, one using transcendence bases and the other using Hilbert’s nullstellensatz.

We begin with the former proof. As {k} is finitely generated over {{\bf Q}}, it has finite transcendence degree, thus one can find algebraically independent elements {x_1,\ldots,x_m} of {k} over {{\bf Q}} such that {k} is a finite extension of {{\bf Q}(x_1,\ldots,x_m)}, and in particular by the primitive element theorem {k} is generated by {{\bf Q}(x_1,\ldots,x_m)} and an element {\alpha} which is algebraic over {{\bf Q}(x_1,\ldots,x_m)}. (Here we use the fact that characteristic zero fields are separable.) If we then define {\phi} by first mapping {x_1,\ldots,x_m} to generic (and thus algebraically independent) complex numbers {z_1,\ldots,z_m}, and then setting {\phi(\alpha)} to be a complex root of of the minimal polynomial for {\alpha} over {{\bf Q}(x_1,\ldots,x_m)} after replacing each {x_i} with the complex number {z_i}, we obtain a field isomorphism {\phi: k \rightarrow \phi(k)} with the required properties.

Now we give the latter proof. Let {x_1,\ldots,x_m} be elements of {k} that generate that field over {{\bf Q}}, but which are not necessarily algebraically independent. Our task is then equivalent to that of finding complex numbers {z_1,\ldots,z_m} with the property that, for any polynomial {P(x_1,\ldots,x_m)} with rational coefficients, one has

\displaystyle  P(x_1,\ldots,x_m) = 0

if and only if

\displaystyle  P(z_1,\ldots,z_m) = 0.

Let {{\mathcal P}} be the collection of all polynomials {P} with rational coefficients with {P(x_1,\ldots,x_m)=0}, and {{\mathcal Q}} be the collection of all polynomials {P} with rational coefficients with {P(x_1,\ldots,x_m) \neq 0}. The set

\displaystyle  S := \{ (z_1,\ldots,z_m) \in {\bf C}^m: P(z_1,\ldots,z_m)=0 \hbox{ for all } P \in {\mathcal P} \}

is the intersection of countably many algebraic sets and is thus also an algebraic set (by the Hilbert basis theorem or the Noetherian property of algebraic sets). If the desired claim failed, then {S} could be covered by the algebraic sets {\{ (z_1,\ldots,z_m) \in {\bf C}^m: Q(z_1,\ldots,z_m) = 0 \}} for {Q \in {\mathcal Q}}. By decomposing into irreducible varieties and observing (e.g. from the Baire category theorem) that a variety of a given dimension over {{\bf C}} cannot be covered by countably many varieties of smaller dimension, we conclude that {S} must in fact be covered by a finite number of such sets, thus

\displaystyle  S \subset \bigcup_{i=1}^n \{ (z_1,\ldots,z_m) \in {\bf C}^m: Q_i(z_1,\ldots,z_m) = 0 \}

for some {Q_1,\ldots,Q_n \in {\bf C}^m}. By the nullstellensatz, we thus have an identity of the form

\displaystyle  (Q_1 \ldots Q_n)^l = P_1 R_1 + \ldots + P_r R_r

for some natural numbers {l,r \geq 1}, polynomials {P_1,\ldots,P_r \in {\mathcal P}}, and polynomials {R_1,\ldots,R_r} with coefficients in {\overline{{\bf Q}}}. In particular, this identity also holds in the algebraic closure {\overline{k}} of {k}. Evaluating this identity at {(x_1,\ldots,x_m)} we see that the right-hand side is zero but the left-hand side is non-zero, a contradiction, and the claim follows. \Box

From Proposition 4 one can now deduce Theorem 3 by a routine ultraproduct argument (the same one used in these previous blog posts). Suppose for contradiction that Theorem 3 fails. Then there exists natural numbers {s,n \geq 1}, a sequence of finite fields {{\bf F}_i} of characteristic at least {i}, and subsets {A_i=\{a_{i,1},\ldots,a_{i,n}\}} of {{\bf F}_i} of cardinality {n} such that for each {i}, there does not exist a Freiman field isomorphism of order {s} from {A_i} to the complex numbers. Now we select a non-principal ultrafilter {\alpha \in \beta {\bf N} \backslash {\bf N}}, and construct the ultraproduct {{\bf F} := \prod_{i \rightarrow \alpha} {\bf F}_i} of the finite fields {{\bf F}_i}. This is again a field (and is a basic example of what is known as a pseudo-finite field); because the characteristic of {{\bf F}_i} goes to infinity as {i \rightarrow \infty}, it is easy to see (using Los’s theorem) that {{\bf F}} has characteristic zero and can thus be viewed as an extension of the rationals {{\bf Q}}.

Now let {a_j := \lim_{i \rightarrow \alpha} a_{i,j}} be the ultralimit of the {a_{i,j}}, so that {A := \{a_1,\ldots,a_n\}} is the ultraproduct of the {A_i}, then {A} is a subset of {{\bf F}} of cardinality {n}. In particular, if {k} is the field generated by {{\bf Q}} and {A}, then {k} is a finitely generated extension of the rationals and thus, by Proposition 4 there is an isomorphism {\phi: k \rightarrow \phi(k)} from {k} to a subfield {\phi(k)} of the complex numbers. In particular, {\phi(a_1),\ldots,\phi(a_n)} are complex numbers, and for any polynomial {P(x_1,\ldots,x_n)} with integer coefficients, one has

\displaystyle  P(a_1,\ldots,a_n) = 0

if and only if

\displaystyle  P(\phi(a_1),\ldots,\phi(a_n)) = 0.

By Los’s theorem, we then conclude that for all {i} sufficiently close to {\alpha}, one has for all polynomials {P(x_1,\ldots,x_n)} of degree at most {s} and whose coefficients are integers whose magnitude sums up to {s}, one has

\displaystyle  P(a_{i,1},\ldots,a_{i,n}) = 0

if and only if

\displaystyle  P(\phi(a_1),\ldots,\phi(a_n)) = 0.

But this gives a Freiman field isomorphism of order {s} between {A_i} and {\phi(A)}, contradicting the construction of {A_i}, and Theorem 3 follows.

The following result is due independently to Furstenberg and to Sarkozy:

Theorem 1 (Furstenberg-Sarkozy theorem) Let {\delta > 0}, and suppose that {N} is sufficiently large depending on {\delta}. Then every subset {A} of {[N] := \{1,\ldots,N\}} of density {|A|/N} at least {\delta} contains a pair {n, n+r^2} for some natural numbers {n, r} with {r \neq 0}.

This theorem is of course similar in spirit to results such as Roth’s theorem or Szemerédi’s theorem, in which the pattern {n,n+r^2} is replaced by {n,n+r,n+2r} or {n,n+r,\ldots,n+(k-1)r} for some fixed {k} respectively. There are by now many proofs of this theorem (see this recent paper of Lyall for a survey), but most proofs involve some form of Fourier analysis (or spectral theory). This may be compared with the standard proof of Roth’s theorem, which combines some Fourier analysis with what is now known as the density increment argument.

A few years ago, Ben Green, Tamar Ziegler, and myself observed that it is possible to prove the Furstenberg-Sarkozy theorem by just using the Cauchy-Schwarz inequality (or van der Corput lemma) and the density increment argument, removing all invocations of Fourier analysis, and instead relying on Cauchy-Schwarz to linearise the quadratic shift {r^2}. As such, this theorem can be considered as even more elementary than Roth’s theorem (and its proof can be viewed as a toy model for the proof of Roth’s theorem). We ended up not doing too much with this observation, so decided to share it here.

The first step is to use the density increment argument that goes back to Roth. For any {\delta > 0}, let {P(\delta)} denote the assertion that for {N} sufficiently large, all sets {A \subset [N]} of density at least {\delta} contain a pair {n,n+r^2} with {r} non-zero. Note that {P(\delta)} is vacuously true for {\delta > 1}. We will show that for any {0 < \delta_0 \leq 1}, one has the implication

\displaystyle  P(\delta_0 + c \delta_0^3) \implies P(\delta_0) \ \ \ \ \ (1)

for some absolute constant {c>0}. This implies that {P(\delta)} is true for any {\delta>0} (as can be seen by considering the infimum of all {\delta>0} for which {P(\delta)} holds), which gives Theorem 1.

It remains to establish the implication (1). Suppose for sake of contradiction that we can find {0 < \delta_0 \leq 1} for which {P(\delta_0+c\delta^3_0)} holds (for some sufficiently small absolute constant {c>0}), but {P(\delta_0)} fails. Thus, we can find arbitrarily large {N}, and subsets {A} of {[N]} of density at least {\delta_0}, such that {A} contains no patterns of the form {n,n+r^2} with {r} non-zero. In particular, we have

\displaystyle  \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} 1_A(n) 1_A(n+(r+h)^2) = 0.

(The exact ranges of {r} and {h} are not too important here, and could be replaced by various other small powers of {N} if desired.)

Let {\delta := |A|/N} be the density of {A}, so that {\delta_0 \leq \delta \leq 1}. Observe that

\displaystyle  \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} 1_A(n) \delta 1_{[N]}(n+(r+h)^2) = \delta^2 + O(N^{-1/3})

\displaystyle  \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} \delta 1_{[N]}(n) \delta 1_{[N]}(n+(r+h)^2) = \delta^2 + O(N^{-1/3})

and

\displaystyle  \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} \delta 1_{[N]}(n) 1_A(n+(r+h)^2) = \delta^2 + O( N^{-1/3} ).

If we thus set {f := 1_A - \delta 1_{[N]}}, then

\displaystyle  \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h \in [N^{1/100}]} f(n) f(n+(r+h)^2) = -\delta^2 + O( N^{-1/3} ).

In particular, for {N} large enough,

\displaystyle  \mathop{\bf E}_{n \in [N]} |f(n)| \mathop{\bf E}_{r \in [N^{1/3}]} |\mathop{\bf E}_{h \in [N^{1/100}]} f(n+(r+h)^2)| \gg \delta^2.

On the other hand, one easily sees that

\displaystyle  \mathop{\bf E}_{n \in [N]} |f(n)|^2 = O(\delta)

and hence by the Cauchy-Schwarz inequality

\displaystyle  \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} |\mathop{\bf E}_{h \in [N^{1/100}]} f(n+(r+h)^2)|^2 \gg \delta^3

which we can rearrange as

\displaystyle  |\mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h,h' \in [N^{1/100}]} \mathop{\bf E}_{n \in [N]} f(n+(r+h)^2) f(n+(r+h')^2)| \gg \delta^3.

Shifting {n} by {(r+h)^2} we obtain (again for {N} large enough)

\displaystyle  |\mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{h,h' \in [N^{1/100}]} \mathop{\bf E}_{n \in [N]} f(n) f(n+(h'-h)(2r+h'+h))| \gg \delta^3.

In particular, by the pigeonhole principle (and deleting the diagonal case {h=h'}, which we can do for {N} large enough) we can find distinct {h,h' \in [N^{1/100}]} such that

\displaystyle  |\mathop{\bf E}_{r \in [N^{1/3}]} \mathop{\bf E}_{n \in [N]} f(n) f(n+(h'-h)(2r+h'+h))| \gg \delta^3,

so in particular

\displaystyle  \mathop{\bf E}_{n \in [N]} |\mathop{\bf E}_{r \in [N^{1/3}]} f(n+(h'-h)(2r+h'+h))| \gg \delta^3.

If we set {d := 2(h'-h)} and shift {n} by {(h'-h) (h'+h)}, we can simplify this (again for {N} large enough) as

\displaystyle  \mathop{\bf E}_{n \in [N]} |\mathop{\bf E}_{r \in [N^{1/3}]} f(n+dr)| \gg \delta^3. \ \ \ \ \ (2)

On the other hand, since

\displaystyle  \mathop{\bf E}_{n \in [N]} f(n) = 0

we have

\displaystyle  \mathop{\bf E}_{n \in [N]} f(n+dr) = O( N^{-2/3+1/100})

for any {r \in [N^{1/3}]}, and thus

\displaystyle  \mathop{\bf E}_{n \in [N]} \mathop{\bf E}_{r \in [N^{1/3}]} f(n+dr) = O( N^{-2/3+1/100}).

Averaging this with (2) we conclude that

\displaystyle  \mathop{\bf E}_{n \in [N]} \max( \mathop{\bf E}_{r \in [N^{1/3}]} f(n+dr), 0 ) \gg \delta^3.

In particular, by the pigeonhole principle we can find {n \in [N]} such that

\displaystyle  \mathop{\bf E}_{r \in [N^{1/3}]} f(n+dr) \gg \delta^3,

or equivalently {A} has density at least {\delta+c'\delta^3} on the arithmetic progression {\{ n+dr: r \in [N^{1/3}]\}}, which has length {\lfloor N^{1/3}\rfloor } and spacing {d}, for some absolute constant {c'>0}. By partitioning this progression into subprogressions of spacing {d^2} and length {\lfloor N^{1/4}\rfloor} (plus an error set of size {O(N^{1/4})}, we see from the pigeonhole principle that we can find a progression {\{ n' + d^2 r': r' \in [N^{1/4}]\}} of length {\lfloor N^{1/4}\rfloor} and spacing {d^2} on which {A} has density at least {\delta + c\delta^3} (and hence at least {\delta_0+c\delta_0^3}) for some absolute constant {c>0}. If we then apply the induction hypothesis to the set

\displaystyle  A' := \{ r' \in [N^{1/4}]: n' + d^2 r' \in A \}

we conclude (for {N} large enough) that {A'} contains a pair {m, m+s^2} for some natural numbers {m,s} with {s} non-zero. This implies that {(n'+d^2 m), (n'+d^2 m) + (|d|s)^2} lie in {A}, a contradiction, establishing the implication (1).

A more careful analysis of the above argument reveals a more quantitative version of Theorem 1: for {N \geq 100} (say), any subset of {[N]} of density at least {C/(\log\log N)^{1/2}} for some sufficiently large absolute constant {C} contains a pair {n,n+r^2} with {r} non-zero. This is not the best bound known; a (difficult) result of Pintz, Steiger, and Szemeredi allows the density to be as low as {C / (\log N)^{\frac{1}{4} \log\log\log\log N}}. On the other hand, this already improves on the (simpler) Fourier-analytic argument of Green that works for densities at least {C/(\log\log N)^{1/11}} (although the original argument of Sarkozy, which is a little more intricate, works up to {C (\log\log N)^{2/3}/(\log N)^{1/3}}). In the other direction, a construction of Rusza gives a set of density {\frac{1}{65} N^{-0.267}} without any pairs {n,n+r^2}.

Remark 1 A similar argument also applies with {n,n+r^2} replaced by {n,n+r^k} for fixed {k}, because this sort of pattern is preserved by affine dilations {r' \mapsto n'+d^k r'} into arithmetic progressions whose spacing {d^k} is a {k^{th}} power. By re-introducing Fourier analysis, one can also perform an argument of this type for {n,n+d,n+2d} where {d} is the sum of two squares; see the above-mentioned paper of Green for details. However there seems to be some technical difficulty in extending it to patterns of the form {n,n+P(r)} for polynomials {P} that consist of more than a single monomial (and with the normalisation {P(0)=0}, to avoid local obstructions), because one no longer has this preservation property.

The fundamental notions of calculus, namely differentiation and integration, are often viewed as being the quintessential concepts in mathematical analysis, as their standard definitions involve the concept of a limit. However, it is possible to capture most of the essence of these notions by purely algebraic means (almost completely avoiding the use of limits, Riemann sums, and similar devices), which turns out to be useful when trying to generalise these concepts to more abstract situations in which it becomes convenient to permit the underlying number systems involved to be something other than the real or complex numbers, even if this makes many standard analysis constructions unavailable. For instance, the algebraic notion of a derivation often serves as a substitute for the analytic notion of a derivative in such cases, by abstracting out the key algebraic properties of differentiation, namely linearity and the Leibniz rule (also known as the product rule).

Abstract algebraic analogues of integration are less well known, but can still be developed. To motivate such an abstraction, consider the integration functional {I: {\mathcal S}({\bf R} \rightarrow {\bf C}) \rightarrow {\bf C}} from the space {{\mathcal S}({\bf R} \rightarrow {\bf C})} of complex-valued Schwarz functions {f: {\bf R} \rightarrow {\bf C}} to the complex numbers, defined by

\displaystyle  I(f) := \int_{\bf R} f(x)\ dx

where the integration on the right is the usual Lebesgue integral (or improper Riemann integral) from analysis. This functional obeys two obvious algebraic properties. Firstly, it is linear over {{\bf C}}, thus

\displaystyle  I(cf) = c I(f) \ \ \ \ \ (1)

and

\displaystyle  I(f+g) = I(f) + I(g) \ \ \ \ \ (2)

for all {f,g \in {\mathcal S}({\bf R} \rightarrow {\bf C})} and {c \in {\bf C}}. Secondly, it is translation invariant, thus

\displaystyle  I(\tau_h f) = I(f) \ \ \ \ \ (3)

for all {h \in {\bf C}}, where {\tau_h f(x) := f(x-h)} is the translation of {f} by {h}. Motivated by the uniqueness theory of Haar measure, one might expect that these two axioms already uniquely determine {I} after one sets a normalisation, for instance by requiring that

\displaystyle  I( x \mapsto e^{-\pi x^2} ) = 1. \ \ \ \ \ (4)

This is not quite true as stated (one can modify the proof of the Hahn-Banach theorem, after first applying a Fourier transform, to create pathological translation-invariant linear functionals on {{\mathcal S}({\bf R} \rightarrow {\bf C})} that are not multiples of the standard Fourier transform), but if one adds a mild analytical axiom, such as continuity of {I} (using the usual Schwartz topology on {{\mathcal S}({\bf R} \rightarrow {\bf C})}), then the above axioms are enough to uniquely pin down the notion of integration. Indeed, if {I: {\mathcal S}({\bf R} \rightarrow {\bf C}) \rightarrow {\bf C}} is a continuous linear functional that is translation invariant, then from the linearity and translation invariance axioms one has

\displaystyle  I( \frac{\tau_h f - f}{h} ) = 0

for all {f \in {\mathcal S}({\bf R} \rightarrow {\bf C})} and non-zero reals {h}. If {f} is Schwartz, then as {h \rightarrow 0}, one can verify that the Newton quotients {\frac{\tau_h f - f}{h}} converge in the Schwartz topology to the derivative {f'} of {f}, so by the continuity axiom one has

\displaystyle  I(f') = 0.

Next, note that any Schwartz function of integral zero has an antiderivative which is also Schwartz, and so {I} annihilates all zero-integral Schwartz functions, and thus must be a scalar multiple of the usual integration functional. Using the normalisation (4), we see that {I} must therefore be the usual integration functional, giving the claimed uniqueness.

Motivated by the above discussion, we can define the notion of an abstract integration functional {I: X \rightarrow R} taking values in some vector space {R}, and applied to inputs {f} in some other vector space {X} that enjoys a linear action {h \mapsto \tau_h} (the “translation action”) of some group {V}, as being a functional which is both linear and translation invariant, thus one has the axioms (1), (2), (3) for all {f,g \in X}, scalars {c}, and {h \in V}. The previous discussion then considered the special case when {R = {\bf C}}, {X = {\mathcal S}({\bf R} \rightarrow {\bf C})}, {V = {\bf R}}, and {\tau} was the usual translation action.

Once we have performed this abstraction, we can now present analogues of classical integration which bear very little analytic resemblance to the classical concept, but which still have much of the algebraic structure of integration. Consider for instance the situation in which we keep the complex range {R = {\bf C}}, the translation group {V = {\bf R}}, and the usual translation action {h \mapsto \tau_h}, but we replace the space {{\mathcal S}({\bf R} \rightarrow {\bf C})} of Schwartz functions by the space {Poly_{\leq d}({\bf R} \rightarrow {\bf C})} of polynomials {x \mapsto a_0 + a_1 x + \ldots + a_d x^d} of degree at most {d} with complex coefficients, where {d} is a fixed natural number; note that this space is translation invariant, so it makes sense to talk about an abstract integration functional {I: Poly_{\leq d}({\bf R} \rightarrow {\bf C}) \rightarrow {\bf C}}. Of course, one cannot apply traditional integration concepts to non-zero polynomials, as they are not absolutely integrable. But one can repeat the previous arguments to show that any abstract integration functional must annihilate derivatives of polynomials of degree at most {d}:

\displaystyle  I(f') = 0 \hbox{ for all } f \in Poly_{\leq d}({\bf R} \rightarrow {\bf C}). \ \ \ \ \ (5)

Clearly, every polynomial of degree at most {d-1} is thus annihilated by {I}, which makes {I} a scalar multiple of the functional that extracts the top coefficient {a_d} of a polynomial, thus if one sets a normalisation

\displaystyle  I( x \mapsto x^d ) = c

for some constant {c}, then one has

\displaystyle  I( x \mapsto a_0 + a_1 x + \ldots + a_d x^d ) = c a_d \ \ \ \ \ (6)

for any polynomial {x \mapsto a_0 + a_1 x + \ldots + a_d x^d}. So we see that up to a normalising constant, the operation of extracting the top order coefficient of a polynomial of fixed degree serves as the analogue of integration. In particular, despite the fact that integration is supposed to be the “opposite” of differentiation (as indicated for instance by (5)), we see in this case that integration is basically ({d}-fold) differentiation; indeed, compare (6) with the identity

\displaystyle  (\frac{d}{dx})^d ( a_0 + a_1 x + \ldots + a_d x^d ) = d! a_d.

In particular, we see, in contrast to the usual Lebesgue integral, the integration functional (6) can be localised to an arbitrary location: one only needs to know the germ of the polynomial {x \mapsto a_0 + a_1 x + \ldots + a_d x^d} at a single point {x_0} in order to determine the value of the functional (6). This localisation property may initially seem at odds with the translation invariance, but the two can be reconciled thanks to the extremely rigid nature of the class {Poly_{\leq d}({\bf R} \rightarrow {\bf C})}, in contrast to the Schwartz class {{\mathcal S}({\bf R} \rightarrow {\bf C})} which admits bump functions and so can generate local phenomena that can only be detected in small regions of the underlying spatial domain, and which therefore forces any translation-invariant integration functional on such function classes to measure the function at every single point in space.

The reversal of the relationship between integration and differentiation is also reflected in the fact that the abstract integration operation on polynomials interacts with the scaling operation {\delta_\lambda f(x) := f(x/\lambda)} in essentially the opposite way from the classical integration operation. Indeed, for classical integration on {{\bf R}^d}, one has

\displaystyle  \int_{{\bf R}^d} f(x/\lambda)\ dx = \lambda^d \int f(x)\ dx

for Schwartz functions {f \in {\mathcal S}({\bf R}^d \rightarrow {\bf C})}, and so in this case the integration functional {I(f) := \int_{{\bf R}^d} f(x)\ dx} obeys the scaling law

\displaystyle  I( \delta_\lambda f ) = \lambda^d I(f).

In contrast, the abstract integration operation defined in (6) obeys the opposite scaling law

\displaystyle  I( \delta_\lambda f ) = \lambda^{-d} I(f). \ \ \ \ \ (7)

Remark 1 One way to interpret what is going on is to view the integration operation (6) as a renormalised version of integration. A polynomial {x \mapsto a_0 + a_1 + \ldots + a_d x^d} is, in general, not absolutely integrable, and the partial integrals

\displaystyle  \int_0^N a_0 + a_1 + \ldots + a_d x^d\ dx

diverge as {N \rightarrow \infty}. But if one renormalises these integrals by the factor {\frac{1}{N^{d+1}}}, then one recovers convergence,

\displaystyle  \lim_{N \rightarrow \infty} \frac{1}{N^{d+1}} \int_0^N a_0 + a_1 + \ldots + a_d x^d\ dx = \frac{1}{d+1} a_d

thus giving an interpretation of (6) as a renormalised classical integral, with the renormalisation being responsible for the unusual scaling relationship in (7). However, this interpretation is a little artificial, and it seems that it is best to view functionals such as (6) from an abstract algebraic perspective, rather than to try to force an analytic interpretation on them.

Now we return to the classical Lebesgue integral

\displaystyle  I(f) := \int_{\bf R} f(x)\ dx. \ \ \ \ \ (8)

As noted earlier, this integration functional has a translation invariance associated to translations along the real line {{\bf R}}, as well as a dilation invariance by real dilation parameters {\lambda>0}. However, if we refine the class {{\mathcal S}({\bf R} \rightarrow {\bf C})} of functions somewhat, we can obtain a stronger family of invariances, in which we allow complex translations and dilations. More precisely, let {\mathcal{SE}({\bf C} \rightarrow {\bf C})} denote the space of all functions {f: {\bf C} \rightarrow {\bf C}} which are entire (or equivalently, are given by a Taylor series with an infinite radius of convergence around the origin) and also admit rapid decay in a sectorial neighbourhood of the real line, or more precisely there exists an {\epsilon>0} such that for every {A > 0} there exists {C_A > 0} such that one has the bound

\displaystyle  |f(z)| \leq C_A (1+|z|)^{-A}

whenever {|\hbox{Im}(z)| \leq A + \epsilon |\hbox{Re}(z)|}. For want of a better name, we shall call elements of this space Schwartz entire functions. This is clearly a complex vector space. A typical example of a Schwartz entire function are the complex gaussians

\displaystyle  f(z) := e^{-\pi (az^2 + 2bz + c)}

where {a,b,c} are complex numbers with {\hbox{Re}(a) > 0}. From the Cauchy integral formula (and its derivatives) we see that if {f} lies in {\mathcal{SE}({\bf C} \rightarrow {\bf C})}, then the restriction of {f} to the real line lies in {{\mathcal S}({\bf R} \rightarrow {\bf C})}; conversely, from analytic continuation we see that every function in {{\mathcal S}({\bf R} \rightarrow {\bf C})} has at most one extension in {\mathcal{SE}({\bf C} \rightarrow {\bf C})}. Thus one can identify {\mathcal{SE}({\bf C} \rightarrow {\bf C})} with a subspace of {{\mathcal S}({\bf R} \rightarrow {\bf C})}, and in particular the integration functional (8) is inherited by {\mathcal{SE}({\bf C} \rightarrow {\bf C})}, and by abuse of notation we denote the resulting functional {I: \mathcal{SE}({\bf C} \rightarrow {\bf C}) \rightarrow {\bf C}} as {I} also. Note, in analogy with the situation with polynomials, that this abstract integration functional is somewhat localised; one only needs to evaluate the function {f} on the real line, rather than the entire complex plane, in order to compute {I(f)}. This is consistent with the rigid nature of Schwartz entire functions, as one can uniquely recover the entire function from its values on the real line by analytic continuation.

Of course, the functional {I: \mathcal{SE}({\bf C} \rightarrow {\bf C}) \rightarrow {\bf C}} remains translation invariant with respect to real translation:

\displaystyle  I(\tau_h f) = I(f) \hbox{ for all } h \in {\bf R}.

However, thanks to contour shifting, we now also have translation invariance with respect to complex translation:

\displaystyle  I(\tau_h f) = I(f) \hbox{ for all } h \in {\bf C},

where of course we continue to define the translation operator {\tau_h} for complex {h} by the usual formula {\tau_h f(x) := f(x-h)}. In a similar vein, we also have the scaling law

\displaystyle  I(\delta_\lambda f) = \lambda I(f)

for any {f \in \mathcal{SE}({\bf C} \rightarrow {\bf C})}, if {\lambda} is a complex number sufficiently close to {1} (where “sufficiently close” depends on {f}, and more precisely depends on the sectoral aperture parameter {\epsilon} associated to {f}); again, one can verify that {\delta_\lambda f} lies in {\mathcal{SE}({\bf C} \rightarrow {\bf C})} for {\lambda} sufficiently close to {1}. These invariances (which relocalise the integration functional {I} onto other contours than the real line {{\bf R}}) are very useful for computing integrals, and in particular for computing gaussian integrals. For instance, the complex translation invariance tells us (after shifting by {b/a}) that

\displaystyle  I( z \mapsto e^{-\pi (az^2 + 2bz + c) } ) = e^{-\pi (c-b^2/a)} I( z \mapsto e^{-\pi a z^2} )

when {a,b,c \in {\bf C}} with {\hbox{Re}(a) > 0}, and then an application of the complex scaling law (and a continuity argument, observing that there is a compact path connecting {a} to {1} in the right half plane) gives

\displaystyle  I( z \mapsto e^{-\pi (az^2 + 2bz + c) } ) = a^{-1/2} e^{-\pi (c-b^2/a)} I( z \mapsto e^{-\pi z^2} )

using the branch of {a^{-1/2}} on the right half-plane for which {1^{-1/2} = 1}. Using the normalisation (4) we thus have

\displaystyle  I( z \mapsto e^{-\pi (az^2 + 2bz + c) } ) = a^{-1/2} e^{-\pi (c-b^2/a)}

giving the usual gaussian integral formula

\displaystyle  \int_{\bf R} e^{-\pi (ax^2 + 2bx + c)}\ dx = a^{-1/2} e^{-\pi (c-b^2/a)}. \ \ \ \ \ (9)

This is a basic illustration of the power that a large symmetry group (in this case, the complex homothety group) can bring to bear on the task of computing integrals.

One can extend this sort of analysis to higher dimensions. For any natural number {n \geq 1}, let {\mathcal{SE}({\bf C}^n \rightarrow {\bf C})} denote the space of all functions {f: {\bf C}^n \rightarrow {\bf C}} which is jointly entire in the sense that {f(z_1,\ldots,z_n)} can be expressed as a Taylor series in {z_1,\ldots,z_n} which is absolutely convergent for all choices of {z_1,\ldots,z_n}, and such that there exists an {\epsilon > 0} such that for any {A>0} there is {C_A>0} for which one has the bound

\displaystyle  |f(z)| \leq C_A (1+|z|)^{-A}

whenever {|\hbox{Im}(z_j)| \leq A + \epsilon |\hbox{Re}(z_j)|} for all {1 \leq j \leq n}, where {z = \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix}} and {|z| := (|z_1|^2+\ldots+|z_n|^2)^{1/2}}. Again, we call such functions Schwartz entire functions; a typical example is the function

\displaystyle  f(z) := e^{-\pi (z^T A z + 2b^T z + c)}

where {A} is an {n \times n} complex symmetric matrix with positive definite real part, {b} is a vector in {{\bf C}^n}, and {c} is a complex number. We can then define an abstract integration functional {I: \mathcal{SE}({\bf C}^n \rightarrow {\bf C}) \rightarrow {\bf C}} by integration on the real slice {{\bf R}^n}:

\displaystyle  I(f) := \int_{{\bf R}^n} f(x)\ dx

where {dx} is the usual Lebesgue measure on {{\bf R}^n}. By contour shifting in each of the {n} variables {z_1,\ldots,z_n} separately, we see that {I} is invariant with respect to complex translations of each of the {z_j} variables, and is thus invariant under translating the joint variable {z} by {{\bf C}^n}. One can also verify the scaling law

\displaystyle  I(\delta_A f) = \hbox{det}(A) I(f)

for {n \times n} complex matrices {A} sufficiently close to the origin, where {\delta_A f(z) := f(A^{-1} z)}. This can be seen for shear transformations {A} by Fubini’s theorem and the aforementioned translation invariance, while for diagonal transformations near the origin this can be seen from {n} applications of one-dimensional scaling law, and the general case then follows by composition. Among other things, these laws then easily lead to the higher-dimensional generalisation

\displaystyle  \int_{{\bf R}^n} e^{-\pi (x^T A x + 2 b^T x + c)}\ dx = \hbox{det}(A)^{-1/2} e^{-\pi (c-b^T A^{-1} b)} \ \ \ \ \ (10)

whenever {A} is a complex symmetric matrix with positive definite real part, {b} is a vector in {{\bf C}^n}, and {c} is a complex number, basically by repeating the one-dimensional argument sketched earlier. Here, we choose the branch of {\hbox{det}(A)^{-1/2}} for all matrices {A} in the indicated class for which {\hbox{det}(1)^{-1/2} = 1}.

Now we turn to an integration functional suitable for computing complex gaussian integrals such as

\displaystyle  \int_{{\bf C}^n} e^{-2\pi (z^\dagger A z + b^\dagger z + z^\dagger \tilde b + c)}\ dz d\overline{z}, \ \ \ \ \ (11)

where {z} is now a complex variable

\displaystyle  z = \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix},

{z^\dagger} is the adjoint

\displaystyle  z^\dagger := (\overline{z_1},\ldots, \overline{z_n}),

{A} is a complex {n \times n} matrix with positive definite Hermitian part, {b, \tilde b} are column vectors in {{\bf C}^n}, {c} is a complex number, and {dz d\overline{z} = \prod_{j=1}^n 2 d\hbox{Re}(z_j) d\hbox{Im}(z_j)} is {2^n} times Lebesgue measure on {{\bf C}^n}. (The factors of two here turn out to be a natural normalisation, but they can be ignored on a first reading.) As we shall see later, such integrals are relevant when performing computations on the Gaussian Unitary Ensemble (GUE) in random matrix theory. Note that the integrand here is not complex analytic due to the presence of the complex conjugates. However, this can be dealt with by the trick of replacing the complex conjugate {\overline{z}} by a variable {z^*} which is formally conjugate to {z}, but which is allowed to vary independently of {z}. More precisely, let {\mathcal{SA}({\bf C}^n \times {\bf C}^n \rightarrow {\bf C})} be the space of all functions {f: (z,z^*) \mapsto f(z,z^*)} of two independent {n}-tuples

\displaystyle  z = \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix}, z^* = \begin{pmatrix} z_1^* \\ \vdots \\ z_n^* \end{pmatrix}

of complex variables, which is jointly entire in all {2n} variables (in the sense defined previously, i.e. there is a joint Taylor series that is absolutely convergent for all independent choices of {z, z^* \in {\bf C}^n}), and such that there is an {\epsilon>0} such that for every {A>0} there is {C_A>0} such that one has the bound

\displaystyle  |f(z,z^*)| \leq C_A (1 + |z|)^{-A}

whenever {|z^* - \overline{z}| \leq A + \epsilon |z|}. We will call such functions Schwartz analytic. Note that the integrand in (11) is Schwartz analytic when {A} has positive definite Hermitian part, if we reinterpret {z^\dagger} as the transpose of {z^*} rather than as the adjoint of {z} in order to make the integrand entire in {z} and {z^*}. We can then define an abstract integration functional {I: \mathcal{SA}({\bf C}^n \times {\bf C}^n \rightarrow {\bf C}) \rightarrow {\bf C}} by the formula

\displaystyle  I(f) := \int_{{\bf C}^n} f(z,\overline{z})\ dz d\overline{z}, \ \ \ \ \ (12)

thus {I} can be localised to the slice {\{ (z,\overline{z}): z \in {\bf C}^n\}} of {{\bf C}^n \times {\bf C}^n} (though, as with previous functionals, one can use contour shifting to relocalise {I} to other slices also.) One can also write this integral as

\displaystyle  I(f) = 2^n \int_{{\bf R}^n \times {\bf R}^n} f(x+iy, x-iy)\ dx dy

and note that the integrand here is a Schwartz entire function on {{\bf C}^n \times {\bf C}^n}, thus linking the Schwartz analytic integral with the Schwartz entire integral. Using this connection, one can verify that this functional {I} is invariant with respect to translating {z} and {z^*} by independent shifts in {{\bf C}^n} (thus giving a {{\bf C}^n \times {\bf C}^n} translation symmetry), and one also has the independent dilation symmetry

\displaystyle  I(\delta_{A,B} f) = \hbox{det}(A) \hbox{det}(B) I(f)

for {n \times n} complex matrices {A,B} that are sufficiently close to the identity, where {\delta_{A,B} f(z,z^*) := f(A^{-1} z, B^{-1} z^*)}. Arguing as before, we can then compute (11) as

\displaystyle  \int_{{\bf C}^n} e^{-2\pi (z^\dagger A z + b^\dagger z + z^\dagger \tilde b + c)}\ dz d\overline{z} = \hbox{det}(A)^{-1} e^{-2\pi (c - b^\dagger A^{-1} \tilde b)}. \ \ \ \ \ (13)

In particular, this gives an integral representation for the determinant-reciprocal {\hbox{det}(A)^{-1}} of a complex {n \times n} matrix with positive definite Hermitian part, in terms of gaussian expressions in which {A} only appears linearly in the exponential:

\displaystyle  \hbox{det}(A)^{-1} = \int_{{\bf C}^n} e^{-2\pi z^\dagger A z}\ dz d\overline{z}.

This formula is then convenient for computing statistics such as

\displaystyle  \mathop{\bf E} \hbox{det}(W_n-E-i\eta)^{-1}

for random matrices {W_n} drawn from the Gaussian Unitary Ensemble (GUE), and some choice of spectral parameter {E+i\eta} with {\eta>0}; we review this computation later in this post. By the trick of matrix differentiation of the determinant (as reviewed in this recent blog post), one can also use this method to compute matrix-valued statistics such as

\displaystyle  \mathop{\bf E} \hbox{det}(W_n-E-i\eta)^{-1} (W_n-E-i\eta)^{-1}.

However, if one restricts attention to classical integrals over real or complex (and in particular, commuting or bosonic) variables, it does not seem possible to easily eradicate the negative determinant factors in such calculations, which is unfortunate because many statistics of interest in random matrix theory, such as the expected Stieltjes transform

\displaystyle  \mathop{\bf E} \frac{1}{n} \hbox{tr} (W_n-E-i\eta)^{-1},

which is the Stieltjes transform of the density of states. However, it turns out (as I learned recently from Peter Sarnak and Tom Spencer) that it is possible to cancel out these negative determinant factors by balancing the bosonic gaussian integrals with an equal number of fermionic gaussian integrals, in which one integrates over a family of anticommuting variables. These fermionic integrals are closer in spirit to the polynomial integral (6) than to Lebesgue type integrals, and in particular obey a scaling law which is inverse to the Lebesgue scaling (in particular, a linear change of fermionic variables {\zeta \mapsto A \zeta} ends up transforming a fermionic integral by {\hbox{det}(A)} rather than {\hbox{det}(A)^{-1}}), which conveniently cancels out the reciprocal determinants in the previous calculations. Furthermore, one can combine the bosonic and fermionic integrals into a unified integration concept, known as the Berezin integral (or Grassmann integral), in which one integrates functions of supervectors (vectors with both bosonic and fermionic components), and is of particular importance in the theory of supersymmetry in physics. (The prefix “super” in physics means, roughly speaking, that the object or concept that the prefix is attached to contains both bosonic and fermionic aspects.) When one applies this unified integration concept to gaussians, this can lead to quite compact and efficient calculations (provided that one is willing to work with “super”-analogues of various concepts in classical linear algebra, such as the supertrace or superdeterminant).

Abstract integrals of the flavour of (6) arose in quantum field theory, when physicists sought to formally compute integrals of the form

\displaystyle  \int F( x_1, \ldots, x_n, \xi_1, \ldots, \xi_m )\ dx_1 \ldots dx_n d\xi_1 \ldots d\xi_m \ \ \ \ \ (14)

where {x_1,\ldots,x_n} are familiar commuting (or bosonic) variables (which, in particular, can often be localised to be scalar variables taking values in {{\bf R}} or {{\bf C}}), while {\xi_1,\ldots,\xi_m} were more exotic anticommuting (or fermionic) variables, taking values in some vector space of fermions. (As we shall see shortly, one can formalise these concepts by working in a supercommutative algebra.) The integrand {F(x_1,\ldots,x_n,\xi_1,\ldots,\xi_m)} was a formally analytic function of {x_1,\ldots,x_n,\xi_1,\ldots,\xi_m}, in that it could be expanded as a (formal, noncommutative) power series in the variables {x_1,\ldots,x_n,\xi_1,\ldots,\xi_m}. For functions {F(x_1,\ldots,x_n)} that depend only on bosonic variables, it is certainly possible for such analytic functions to be in the Schwartz class and thus fall under the scope of the classical integral, as discussed previously. However, functions {F(\xi_1,\ldots,\xi_m)} that depend on fermionic variables {\xi_1,\ldots,\xi_m} behave rather differently. Indeed, a fermonic variable {\xi} must anticommute with itself, so that {\xi^2 = 0}. In particular, any power series in {\xi} terminates after the linear term in {\xi}, so that a function {F(\xi)} can only be analytic in {\xi} if it is a polynomial of degree at most {1} in {\xi}; more generally, an analytic function {F(\xi_1,\ldots,\xi_m)} of {m} fermionic variables {\xi_1,\ldots,\xi_m} must be a polynomial of degree at most {m}, and an analytic function {F(x_1,\ldots,x_n,\xi_1,\ldots,\xi_m)} of {n} bosonic and {m} fermionic variables can be Schwartz in the bosonic variables but will be polynomial in the fermonic variables. As such, to interpret the integral (14), one can use classical (Lebesgue) integration (or the variants discussed above for integrating Schwartz entire or Schwartz analytic functions) for the bosonic variables, but must use abstract integrals such as (6) for the fermonic variables, leading to the concept of Berezin integration mentioned earlier.

In this post I would like to set out some of the basic algebraic formalism of Berezin integration, particularly with regards to integration of gaussian-type expressions, and then show how this formalism can be used to perform computations involving GUE (for instance, one can compute the density of states of GUE by this machinery without recourse to the theory of orthogonal polynomials). The use of supersymmetric gaussian integrals to analyse ensembles such as GUE appears in the work of Efetov (and was also proposed in the slightly earlier works of Parisi-Sourlas and McKane, with a related approach also appearing in the work of Wegner); the material here is adapted from this survey of Mirlin, as well as the later papers of Disertori-Pinson-Spencer and of Disertori.

Read the rest of this entry »

RSS Google+ feed

  • An error has occurred; the feed is probably down. Try again later.
Follow

Get every new post delivered to your Inbox.

Join 2,425 other followers