This is my final Milliman lecture, in which I talk about the sum-product phenomenon in arithmetic combinatorics, and some selected recent applications of this phenomenon to uniform distribution of exponentials, expander graphs, randomness extractors, and detecting (sieving) almost primes in group orbits, particularly as developed by Bourgain and his co-authors.
In the previous two lectures we had concentrated on additive combinatorics – the study of additive operations and patterns on sets. Now we will look at arithmetic combinatorics – the simultaneous study of additive and multiplicative operations on sets; this is sort of a combinatorial analogue of commutative algebra.

There are many questions to study here, but the most basic is the sum-product problem, which we can state as follows. Let A be a finite non-empty set of elements of a ring R (e.g. finite sets of integers, or elements of a cyclic group {\Bbb Z}/q{\Bbb Z}, or sets of matrices over some ring). Then we can form the sum set

A+A := \{ a + b: a, b \in A \}

and the product set

A \cdot A := \{ a \cdot b: a, b \in A \}

To avoid degeneracies, let us assume that none (or very few) of the elements in A are zero divisors (as this may cause A \cdot A to become very small). Then it is easy to see that A+A and A \cdot A will be at least as large as A itself.

Typically, both of these sets will be much larger than A itself, indeed, if we select A at random, we generically expect A+A and A \cdot A to have cardinality comparable to |A|^2. But when A enjoys additive or multiplicative structure, the sets A+A or A \cdot A can be of size comparable to A. For instance, if A is an arithmetic progression \{a, a+r, a+2r, \ldots, a+(k-1)r\} or an additive subgroup in the ring R (modulo zero divisors, such as 0), then |A+A| \sim |A|. Similarly, if A is a geometric progression \{a, ar, ar^2, \ldots, ar^{k-1} \} or a multiplicative subgroup in the ring R, then |A \cdot A| \sim |A|. And of course, if A is both an additive and a multiplicative subgroup of R (modulo zero divisors), i.e. if A is a subring of R, then |A+A| and |A \cdot A| are both comparable in size to |A|. These examples are robust with respect to small perturbations; for instance, if A is a dense subset of an arithmetic progression or additive subgroup, then it is still the case that A+A is comparable in size to A. There are also slightly more complicated examples of interest, such as generalised arithmetic progressions, but we will not discuss these here.

Now let us work in the ring of integers {\Bbb Z}. This ring has no non-trivial finite additive subgroups or multiplicative subgroups (and it certainly has no non-trivial finite subrings), but it of course has plenty of arithmetic progressions and geometric progressions. But observe that it is rather difficult for a finite set A of integers to resemble both an arithmetic progression and a geometric progression simultaneously (unless A is very small). So one expects at least one of A+A and A \cdot A to be significantly larger than A itself. This claim was made precise by Erdős and Szemerédi, who showed that

\max( |A+A|, |A \cdot A| ) \gg |A|^{1+\varepsilon} (1)

for some absolute constant \varepsilon > 0. The value of this constant as improved steadily over the years; the best result currently is due to Solymosi, who showed that one can take \varepsilon arbitrarily close to 3/11. Erdős and Szemerédi in fact conjectured that one can take \varepsilon arbitrarily close to 1 (i.e. for any finite set of integers A, either the sum set or product set has to be very close to its maximal size of |A|^2), but this conjecture seems out of reach at present. Nevertheless, even just the epsilon improvement over the trivial bound of |A| is actually quite useful. It is the first example of what is now called the sum-product phenomenon: if a finite set A is not close to an actual subring, then either the sum set A+A or the product set A \cdot A must be significantly larger than A itself. One can view (1) as a “robust” version of the assertion that the integers contain no non-trivial finite subrings; (1) is asserting that in fact the integers contain no non-trivial finite sets which even come close to behaving like a subring.

In 1999, Tom Wolff posed the question of whether the sum-product phenomenon held true in finite fields {\Bbb F}_p of prime order (note that such fields have no non-trivial subrings), and in particular whether (1) was true when A \subset {\Bbb F}_p, and A was not close to being all of {\Bbb F}_p, in the sense that |A| \leq p^{1-\delta} for some \delta > 0; of course one would need \varepsilon to depend on \delta. (Actually, Tom only posed the question for |A| \sim p^{1/2}, being motivated by finite field analogues of the Kakeya problem, but the question was clearly of interest for other ranges of A as well.) This question was solved in the affirmative by Bourgain, Katz, and myself (in the range p^\delta \leq |A| \leq p^{1-\delta}) and then by Bourgain, Glibichuk, and Konyagin (in the full range 1 \leq |A| \leq p^{1-\delta}); the result is now known as the sum-product theorem for {\Bbb F}_p (and there have since been several further proofs and refinements of this theorem). The fact that the field has prime order is key; if for instance we were working in a field of order p^2, then by taking A to be the subfield of order p we see that both A+A and A \cdot A have exactly the same size as A. So any proof of the sum-product theorem must use at some point the fact that the field has prime order.

As in the integers, one can view the sum-product theorem as a robust assertion of the obvious statement that the field {\Bbb F}_p contains no non-trivial subrings. So the main difficulty in the proof is to find a proof of this latter fact which is robust enough to generalise to this combinatorial setting. The standard way to classify subrings is to use Lagrange’s theorem that the order of a subgroup divides the order of the whole group, which is proven by partitioning the whole group into cosets of the subgroup, but this argument is very unstable and does not extend to the combinatorial setting. But there are other ways to proceed. The argument of Bourgain, Katz, and myself (which is based on an earlier argument of Edgar and Miller), roughly speaking, proceeds by investigating the “dimension” of {\Bbb F}_p relative to A, or in other words the least number of elements v_1, \ldots, v_d in {\Bbb F}_p such that every element of {\Bbb F} can be expressed in the form a_1 v_1 + \ldots + a_d v_d. Note that the number of such representations is equal to |A|^d. The key observation is that as |{\Bbb F}_p| is prime, it cannot equal |A|^d if d > 1, and so by the pigeonhole principle some element must have more than one representation. One can use this “linear dependence” to reduce the dimension by 1 (assuming that A behaves a lot like a subring), and so can eventually reduce to the d=1 case, which is prohibited by our assumption A < p^{1-\delta}. (The hypothesis |A| > p^{\delta} is needed to ensure that the initial dimension d is bounded, so that the iteration only requires a bounded number of steps.) The argument of Bourgain, Glibichuk, and Konyagin uses a more algebraic method (a variant of the polynomial method of Stepanov), using the basic observation that the number of zeroes of a polynomial (counting multiplicity) is bounded by the degree of that polynomial to obtain upper bounds for various sets (such as the number of parallelograms in A). More recently, a short argument of Garaev proceeds using the simple observation that if A is any non-trivial subset of {\Bbb F}_p, then there must exist a \in A such that a+1 \not \in A; applying this to the “fraction field” Q[A] := \{ (a-b)/(c-d): a,b,c,d \in A, c \neq d \} of A one can conclude that Q[A] does not in fact behave like a field, and hence A does not behave like a ring.

The sum-product phenomenon implies that if a set A \subset {\Bbb F}_p of medium size p^\delta \leq |A| \leq p^{1-\delta} is multiplicatively structured (e.g. it is a geometric progression or a multiplicative subgroup) then it cannot be additively structured: A+A is significantly larger than A. It turns out that with a little bit of extra work, this observation can be iterated: A+A+A+A is even larger than A, and so on and so forth, and in fact one can show that kA = {\Bbb F}_p for some bounded k depending only on \delta, where kA := A + \ldots + A is the k-fold sumset of A. (The key to this iteration essentially lies in the inclusion (kA) \cdot (kA) \subset k^2 (A^2), which is a consequence of the distributive law. The use of this law unfortunately breaks the symmetry between multiplication and addition that one sees in the sum-product estimates.) Thus any multiplicatively structured subset A of {\Bbb F}_p of medium size must eventually additively generate the whole field. As a consequence of this, one can show that A is an additive expander, which roughly speaking means that A+B is spread out on a significantly larger set than B for any medium-sized set B. (In more probabilistic language, if one considered the random walk whose steps were drawn randomly from A, then this walk would converge extremely rapidly to the uniform distribution.) From that observation (and some more combinatorial effort), one can in fact conclude that multiplicatively structured sets must be distributed uniformly in an additive sense; if they concentrated too much in, say, a subinterval of {\Bbb F}_p, then this could be used to contradict the additive expansion property.

Let me note one cute application of this technology, due to Bourgain, to the Diffie-Hellman key exchange protocol and its relatives in cryptography. Suppose we have two people, Alice and Bob, who want to communicate privately and securely, but have never met each other before and can only contact each other via an unsecured network (e.g. the internet, or physical mail), in which anyone can eavesdrop. How can Alice and Bob achieve this?

If one was sending a physical object (e.g. a physical letter) by physical mail (which could be opened by third parties), one could proceed as follows.

  1. Alice places the object in a box, and locks the box with her own padlock, keeping the key. She then mails the locked box to Bob. Anyone who intercepts the box cannot open it, since they don’t have Alice’s key.
  2. Of course, Bob can’t open the box either. But what he can do instead is put his own padlock on the box (keeping the key), and sends the doubly locked box back to Alice.
  3. Alice can’t unlock Bob’s padlock… but she can unlock her own. So she removes her lock, and sends the singly locked box back to Bob.
  4. Bob can unlock his own padlock, and so retreives the object safely. At no point was the object available to any interceptor.

A similar procedure (a slight variant of the Diffie-Hellman protocol, essentially the Massey-Omura cryptosystem) can be used to transmit a digital message g (which one should think of as just being a number) from Alice to Bob over an unsecured network, as follows:

  1. Alice and Bob agree (over the unsecured network) on some large prime p (larger than the maximum size of the message g).
  2. Alice “locks” the message g by raising it to a power a mod p, where Alice generates the “key” a randomly and keeps it secret. She then sends the locked message g^a \hbox{ mod } p to Bob.
  3. Bob can’t decode this message (he doesn’t know a), but he doubly locks the message by raising the message to his own power b, and returns the doubly locked message g^{ab} \hbox{ mod } p back to Alice.
  4. Alice then “unlocks” her part of the message by taking the a^{th} root (which can be done by Cauchy’s theorem) and sends g^b \hbox{ mod } p back to Bob.
  5. Bob then takes the b^{th} root of the message and recovers g.

An eavesdropper (let’s call her Eve) could intercept p, as well as the three “locked” values g^a, g^b, g^{ab} \hbox{ mod } p, but does not directly recover g. Now, it is possible that one could use this information to reconstruct g (indeed, if one could quickly take discrete logarithms, then this would be a fairly easy task) but no feasible algorithm for this is known (if p is large, e.g. 500+ digits); the problem is generally believed to be roughly comparable in difficulty to that of factoring large numbers. But no-one knows how to rigorously prove that the Diffie-Hellman reconstruction problem is hard (e.g. non-polynomial time); indeed, this would imply P \neq NP, since this reconstruction problem is easily seen to be in NP (though it is not believed to be NP-complete).

Using the sum-product technology, Bourgain was at least able to show that the Diffie-Hellman protocol was secure (for sufficiently large p) if Eve was only able to see the high bits of g^a, g^b, g^{ab} \hbox{ mod } p, thus pinning down g^a, g^b, g^{ab} to intervals. The reason for this is that the set \{ (g^a, g^b, g^{ab}) \in {\Bbb F}_p^3: a,b \in {\Bbb Z} \} has a lot of multiplicative structure (indeed, it is a multiplicative subgroup of the ring {\Bbb F}_p^3) and so should be uniformly distributed in an additive sense (by adapting the above sum-product technology to {\Bbb F}_p^3).

Another application of sum-product technology was to build efficient randomness extractors – deterministic algorithms that can create high-quality (very uniform) random bits from several independent low-quality (non-uniform) random sources; such extractors are of importance in computer science and cryptography. Basically, the sum-product estimate implies that if A, B, C \subset {\Bbb F}_p are sets of medium size, then the set A+B \cdot C is significantly larger than A, B, or C. As a consequence, if X, Y, Z are independent random variables in {\Bbb F}_p which are not too narrowly distributed(in particular, they are not deterministic, and thus distributed only on a single value), one can show (with the assistance of some additive combinatorics) that the random variable X+YZ is significantly more uniformly distributed than X, Y, or Z. Iterating this leads to some surprisingly good randomness extractors, as was first observed by Barak, Impagliazzo, and Wigderson.

Another application of the above sum-product technology was to get a product estimate in matrix groups, such as SL_2({\Bbb F}_p). Indeed, Helfgott was able to show that if A was a subset of SL_2({\Bbb F}_p) of medium or small size, and it was not trapped inside a proper subgroup of SL_2({\Bbb F}_p), then A \cdot A \cdot A was significantly larger than A itself. (One needs to work with triple products here instead of double products for a rather trivial reason: if A was the union of a subgroup and some external element, then A \cdot A is still comparable in size to A, but A \cdot A \cdot A will be much larger. This result may not immediately look like a sum-product estimate, because there is no obvious addition, but it is concealed within the matrix multiplication law for SL_2({\Bbb F}_p). The key observation in Helfgott’s argument, which relies crucially on the sum-product estimate, is that if V is a collection of diagonal matrices in SL_2({\Bbb F}_p) of medium size, and g is a non-diagonal matrix element, then the set \hbox{tr}(V g V g^{-1}) is significantly larger than V itself. If one works out explicitly what this trace is, one sees a sum-product type of result emerging. Conversely, if the trace \hbox{tr}(A) of a group-like set A is large, then the conjugacy classes in A are fairly small (since trace is conjugation-invariant), which forces many pairs in A to commute, which creates large sets V of simultaneously commuting (and hence simultaneously diagonalisable) elements, due to the fact (specific to SL_2) that if two elements commute with a third, then they are quite likely to commute with each other. The tension between these two implications is what underlies Helfgott’s results.

The estimate of Helfgott shows that multiplication by medium-size sets in SL_2({\Bbb F}_p) expands rapidly across the group (unless it is trapped in a subgroup). As a consequence of Helfgott’s estimate, Bourgain and Gamburd were able to show that if S was any finite symmetric set of matrices in SL_2({\Bbb Z}) which generated a sufficiently large (or more precisely, Zariski dense) subgroup of SL_2({\Bbb Z}), and S_p was the projection of S to SL_2({\Bbb Z}_p), then the random walk using S_p on SL_2({\Bbb Z}_p) was very rapidly mixing, so that after about O(\log p) steps, the walk was very close to uniform. (The precise statement was that the Cayley graph associated to S_p for each p formed an expander family.) Quite recently, Bourgain, Gamburd, and Sarnak have applied these results (and generalisations thereof) to the problem of detecting (or sieving) almost primes in thin algebraically generated sets. To motivate the problem, we observe that many classical questions in prime number theory can be rephrased as one of detecting prime points (p_1,\ldots,p_d) \in {\mathcal P}^d in algebraic subsets {\mathcal O} of a lattice {\Bbb Z}^d. For instance, the twin prime problem asks whether the line {\mathcal O} = \{ (n,n+2) \in {\Bbb Z}^2 \} contains infinitely many prime points. In general, these problems are very difficult, especially once one considers sets described by polynomials rather than linear functions; even the one-dimensional problem of determining whether the set {\mathcal O} = \{ n^2+1: n \in {\Bbb Z} \} contains infinitely many primes has been open for quite a long time (though it is worth mentioning the celebrated result of Friedlander and Iwaniec that the somewhat larger set {\mathcal O} = \{ n^2 + m^4: n,m \in {\Bbb Z} \} is known to have infinitely many primes).

So prime points are hard to detect. However, by using methods from sieve theory, one can often detect almost prime points in various sets {\mathcal O} – points whose coordinates are the products of only a few primes. For instance, a famous theorem of Chen shows that the line {\mathcal O} = \{ (n,n+2) \in {\Bbb Z}^2 \} contains infinitely many points which are almost prime in the sense that the first coordinate is prime, and the second coordinate is the product of at most two primes. The basic idea of sieve theory is to sift out primes and almost primes by removing all points whose coordinates are divisible by small factors (and then, due to various generalisations of the inclusion-exclusion principle, one has to add back in points which are divisible by multiple small factors, and so forth). See my post on sieve theory and the parity problem for further discussion. In order for sieve theory to work well, one needs to be able to accurately count the size of the original set {\mathcal O} (or more precisely, the size of this set restricted to a ball or a similar object), and also need to count how many points in that set have a certain residue class modulo q, for various values of q. (For instance, to sieve out twin primes or twin almost primes in the interval {1,..,N}, one needs to count how many elements n in that interval are such that n and n+2 are both invertible modulo q (i.e. coprime to q) for various values of q.)

For arbitrary algebraic sets {\mathcal O}, these tasks are very difficult. For instance, even the basic task of determining whether a set {\mathcal O} described by several polynomials is non-empty is essentially Hilbert’s tenth problem, which is undecidable in general. But if the set {\mathcal O} is generated by a group \Lambda acting on {\Bbb Z}^d (in some polynomial fashion), thus {\mathcal O} = \Lambda b for some point b \in {\Bbb Z}^d, then the problems become much more tractable. If the group \Lambda is generated by some finite set S, and we restrict attention to group elements with some given word length, the problem of understanding how {\mathcal O} is distributed modulo q is equivalent to asking how random walks on S of a given length distribute themselves on ({\Bbb Z}/q{\Bbb Z})^d. This latter problem is very close to the problem solved by the mixing results of Bourgain and Gamburd mentioned earlier, which is where the link to sum-product estimates arises from. Indeed, Bourgain, Gamburd, and Sarnak have now shown that rather general classes of algebraic sets generated by subgroups of SL_2({\Bbb Z}) will contain infinitely many almost primes, as long as there are no obvious algebraic obstructions; the methods should hopefully extend to more general groups, such as subgroups of SL_n({\Bbb Z}).

[Update, Dec 7: terminology and typos corrected.]