You are currently browsing the tag archive for the ‘additive combinatorics’ tag.

In 1964, Kemperman established the following result:

Theorem 1 Let ${G}$ be a compact connected group, with a Haar probability measure ${\mu}$. Let ${A, B}$ be compact subsets of ${G}$. Then

$\displaystyle \mu(AB) \geq \min( \mu(A) + \mu(B), 1 ).$

Remark 1 The estimate is sharp, as can be seen by considering the case when ${G}$ is a unit circle, and ${A, B}$ are arcs; similarly if ${G}$ is any compact connected group that projects onto the circle. The connectedness hypothesis is essential, as can be seen by considering what happens if ${A}$ and ${B}$ are a non-trivial open subgroup of ${G}$. For locally compact connected groups which are unimodular but not compact, there is an analogous statement, but with ${\mu}$ now a Haar measure instead of a Haar probability measure, and the right-hand side ${\min(\mu(A)+\mu(B),1)}$ replaced simply by ${\mu(A)+\mu(B)}$. The case when ${G}$ is a torus is due to Macbeath, and the case when ${G}$ is a circle is due to Raikov. The theorem is closely related to the Cauchy-Davenport inequality; indeed, it is not difficult to use that inequality to establish the circle case, and the circle case can be used to deduce the torus case by considering increasingly dense circle subgroups of the torus (alternatively, one can also use Kneser’s theorem).

By inner regularity, the hypothesis that ${A,B}$ are compact can be replaced with Borel measurability, so long as one then adds the additional hypothesis that the product set ${AB}$ is also Borel measurable.

A short proof of Kemperman’s theorem was given by Ruzsa. In this post I wanted to record how this argument can be used to establish the following more “robust” version of Kemperman’s theorem, which not only lower bounds ${\mu(AB)}$, but gives many elements of ${AB}$ some multiplicity:

Theorem 2 Let ${G}$ be a compact connected group, with a Haar probability measure ${\mu}$. Let ${A, B}$ be compact subsets of ${G}$. Then for any ${0 \leq t \leq \min(\mu(A),\mu(B))}$, one has

$\displaystyle \int_G \min(1_A*1_B, t)\ d\mu \geq t \min(\mu(A)+\mu(B) - t,1). \ \ \ \ \ (1)$

Indeed, Theorem 1 can be deduced from Theorem 2 by dividing (1) by ${t}$ and then taking limits as ${t \rightarrow 0}$. The bound in (1) is sharp, as can again be seen by considering the case when ${A,B}$ are arcs in a circle. The analogous claim for cyclic groups of prime order was established by Pollard, and for general abelian groups by Green and Ruzsa.
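The discrete analogue in ${{\bf Z}/p{\bf Z}}$ (Pollard's inequality) is easy to test numerically. Here is a small sanity check, a sketch not drawn from the post itself, which also exhibits the sharpness of the bound for arcs (intervals):

```python
def pollard_lhs(A, B, t, p):
    # r[x] = #{(a,b) in A x B : a + b = x mod p}, i.e. the (unnormalised)
    # convolution 1_A * 1_B; return the truncated sum of min(r[x], t)
    r = [0] * p
    for a in A:
        for b in B:
            r[(a + b) % p] += 1
    return sum(min(rx, t) for rx in r)

p = 13
A, B = set(range(4)), set(range(5))    # two "arcs" (intervals) in Z/13Z
for t in range(1, min(len(A), len(B)) + 1):
    lhs = pollard_lhs(A, B, t, p)
    rhs = t * min(len(A) + len(B) - t, p)
    assert lhs >= rhs    # Pollard's inequality
    assert lhs == rhs    # equality for arcs: the bound is sharp
```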

Let us now prove Theorem 2. It uses a submodularity argument related to one discussed in this previous post. We fix ${B}$ and ${t}$ with ${0 \leq t \leq \mu(B)}$, and define the quantity

$\displaystyle c(A) := \int_G \min(1_A*1_B, t)\ d\mu - t (\mu(A)+\mu(B)-t)$

for any compact set ${A}$. Our task is to establish that ${c(A) \geq 0}$ whenever ${t \leq \mu(A) \leq 1-\mu(B)+t}$.

We first verify the extreme cases. If ${\mu(A) = t}$, then ${1_A*1_B \leq t}$, and so ${c(A)=0}$ in this case (since ${\int_G 1_A*1_B = \mu(A)\mu(B) = t \mu(B)}$). At the other extreme, if ${\mu(A) = 1-\mu(B)+t}$, then from the inclusion-exclusion principle we see that ${1_A * 1_B \geq t}$, and so again ${c(A)=0}$ in this case.

To handle the intermediate regime when ${\mu(A)}$ lies between ${t}$ and ${1-\mu(B)+t}$, we rely on the submodularity inequality

$\displaystyle c(A_1) + c(A_2) \geq c(A_1 \cap A_2) + c(A_1 \cup A_2) \ \ \ \ \ (2)$

for arbitrary compact ${A_1,A_2}$. This inequality comes from the obvious pointwise identity

$\displaystyle 1_{A_1} + 1_{A_2} = 1_{A_1 \cap A_2} + 1_{A_1 \cup A_2}$

whence

$\displaystyle 1_{A_1}*1_B + 1_{A_2}*1_B = 1_{A_1 \cap A_2}*1_B + 1_{A_1 \cup A_2}*1_B$

and thus (noting that the quantities on the left are closer to each other than the quantities on the right)

$\displaystyle \min(1_{A_1}*1_B,t) + \min(1_{A_2}*1_B,t)$

$\displaystyle \geq \min(1_{A_1 \cap A_2}*1_B,t) + \min(1_{A_1 \cup A_2}*1_B,t)$

at which point (2) follows by integrating over ${G}$ and then using the inclusion-exclusion principle.
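The parenthetical step (“the quantities on the left are closer to each other than the quantities on the right”) rests on the elementary fact that ${s \mapsto \min(s,t)}$ is concave: if ${c \leq a, b \leq d}$ and ${a+b = c+d}$, then ${\min(a,t)+\min(b,t) \geq \min(c,t)+\min(d,t)}$. A quick randomized sanity check of this scalar fact (purely illustrative):

```python
import random

# if c <= a, b <= d and a + b = c + d, then
# min(a,t) + min(b,t) >= min(c,t) + min(d,t)   (concavity of s -> min(s,t))
random.seed(0)
for _ in range(10000):
    c, d = sorted(random.uniform(0, 2) for _ in range(2))
    a = random.uniform(c, d)
    b = c + d - a          # then automatically c <= b <= d as well
    t = random.uniform(0, 2)
    assert min(a, t) + min(b, t) >= min(c, t) + min(d, t) - 1e-12
```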

Now introduce the function

$\displaystyle f(a) := \inf \{ c(A) : \mu(A) = a \}$

for ${t \leq a \leq 1-\mu(B)+t}$. From the preceding discussion ${f(a)}$ vanishes at the endpoints ${a = t, 1-\mu(B)+t}$; our task is to show that ${f(a)}$ is non-negative in the interior region ${t < a < 1-\mu(B)+t}$. Suppose for contradiction that this was not the case. It is easy to see that ${f}$ is continuous (indeed, it is even Lipschitz continuous), so there must be ${t < a < 1-\mu(B)+t}$ at which ${f}$ is a local minimum and not locally constant. In particular, ${0 < a < 1}$. But for any ${A}$ with ${\mu(A) = a}$, we have the translation-invariance

$\displaystyle c(gA) = c(A) \ \ \ \ \ (3)$

for any ${g \in G}$, and hence by (2)

$\displaystyle c(A) \geq \frac{1}{2} c(A \cap gA) + \frac{1}{2} c(A \cup gA ).$

Note that ${\mu(A \cap gA)}$ depends continuously on ${g}$, equals ${a}$ when ${g}$ is the identity, and has an average value of ${a^2}$. As ${G}$ is connected, we thus see from the intermediate value theorem that for any ${0 < \epsilon < a-a^2}$, we can find ${g}$ such that ${\mu(A \cap gA) = a-\epsilon}$, and thus by inclusion-exclusion ${\mu(A \cup gA) = a+\epsilon}$. By definition of ${f}$, we thus have

$\displaystyle c(A) \geq \frac{1}{2} f(a-\epsilon) + \frac{1}{2} f(a+\epsilon).$

Taking infima in ${A}$ (and noting that the hypotheses on ${\epsilon}$ are independent of ${A}$) we conclude that

$\displaystyle f(a) \geq \frac{1}{2} f(a-\epsilon) + \frac{1}{2} f(a+\epsilon)$

for all ${0 < \epsilon < a-a^2}$. As ${f}$ is a local minimum and ${\epsilon}$ is arbitrarily small, this implies that ${f}$ is locally constant, a contradiction. This establishes Theorem 2.

We observe the following corollary:

Corollary 3 Let ${G}$ be a compact connected group, with a Haar probability measure ${\mu}$. Let ${A, B, C}$ be compact subsets of ${G}$, and let ${\delta := \min(\mu(A),\mu(B),\mu(C))}$. Then one has the pointwise estimate

$\displaystyle 1_A * 1_B * 1_C \geq \frac{1}{4} (\mu(A)+\mu(B)+\mu(C)-1)_+^2$

if ${\mu(A)+\mu(B)+\mu(C)-1 \leq 2 \delta}$, and

$\displaystyle 1_A * 1_B * 1_C \geq \delta (\mu(A)+\mu(B)+\mu(C)-1-\delta)$

if ${\mu(A)+\mu(B)+\mu(C)-1 \geq 2 \delta}$.

Once again, the bounds are completely sharp, as can be seen by computing ${1_A*1_B*1_C}$ when ${A,B,C}$ are arcs of a circle. For quasirandom ${G}$, one can do much better than these bounds, as discussed in this recent blog post; thus, the abelian case is morally the worst case here, although it seems difficult to convert this intuition into a rigorous reduction.

Proof: By cyclic permutation we may take ${\delta = \mu(C)}$. For any

$\displaystyle (\mu(A)+\mu(B)-1)_+ \leq t \leq \min(\mu(A),\mu(B)),$

we can bound

$\displaystyle 1_A*1_B*1_C \geq \min(1_A*1_B,t)*1_C$

$\displaystyle \geq \int_G \min(1_A*1_B,t)\ d\mu - t (1-\mu(C))$

$\displaystyle \geq t (\mu(A)+\mu(B)-t) - t (1-\mu(C))$

$\displaystyle = t ( \mu(A)+\mu(B)+\mu(C)-1-t )$

where we used Theorem 2 to obtain the third line. Optimising in ${t}$, we obtain the claim. $\Box$
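For the record, the optimisation in ${t}$ here is elementary: writing ${\sigma := \mu(A)+\mu(B)+\mu(C)-1}$ (so that ${\mu(A)+\mu(B)-1 = \sigma - \delta}$), one is maximising the downward parabola

$\displaystyle t \mapsto t(\sigma - t)$

over the admissible range ${(\sigma-\delta)_+ \leq t \leq \min(\mu(A),\mu(B))}$. When ${\sigma \leq 2\delta}$, the unconstrained maximum ${t = \sigma/2}$ is admissible and yields ${\sigma^2/4}$, giving the first bound of Corollary 3; when ${\sigma \geq 2\delta}$, the admissible point closest to the vertex is the endpoint ${t = \sigma-\delta}$, yielding ${\delta(\sigma-\delta)}$, which is the second bound.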

A few days ago, I received the sad news that Yahya Ould Hamidoune had recently died. Hamidoune worked in additive combinatorics, and had recently solved a question on noncommutative Freiman-Kneser theorems posed by myself on this blog last year. Namely, Hamidoune showed

Theorem 1 (Noncommutative Freiman-Kneser theorem for small doubling) Let ${0 < \epsilon \leq 1}$, and let ${S \subset G}$ be a finite non-empty subset of a multiplicative group ${G}$ such that ${|A \cdot S| \leq (2-\epsilon) |S|}$ for some finite set ${A}$ of cardinality at least ${|S|}$, where ${A \cdot S := \{ as: a \in A, s \in S \}}$ is the product set of ${A}$ and ${S}$. Then there exists a finite subgroup ${H}$ of ${G}$ with cardinality ${|H| \leq C(\epsilon) |S|}$, such that ${S}$ is covered by at most ${C'(\epsilon)}$ right-cosets ${H \cdot x}$ of ${H}$, where ${C(\epsilon), C'(\epsilon) > 0}$ depend only on ${\epsilon}$.

One can of course specialise here to the case ${A=S}$, and view this theorem as a classification of those sets ${S}$ of doubling constant at most ${2-\epsilon}$.

In fact Hamidoune’s argument, which is completely elementary, gives the very nice explicit constants ${C(\epsilon) := \frac{2}{\epsilon}}$ and ${C'(\epsilon) := \frac{2}{\epsilon} - 1}$, which are essentially optimal except for factors of ${2}$ (as can be seen by considering an arithmetic progression in an additive group). This result was also independently established (in the ${A=S}$ case) by Tom Sanders (unpublished) by a more Fourier-analytic method, in particular drawing on Sanders’ deep results on the Wiener algebra ${A(G)}$ on arbitrary non-commutative groups ${G}$.

This type of result had previously been known when ${2-\epsilon}$ was less than the golden ratio ${\frac{1+\sqrt{5}}{2}}$, as first observed by Freiman; see my previous blog post for more discussion.

Theorem 1 is not, strictly speaking, contained in Hamidoune’s paper, but can be extracted from his arguments, which share some similarity with the recent simple proof of the Ruzsa-Plünnecke inequality by Petridis (as discussed by Tim Gowers here), and this is what I would like to do below the fold. I also include (with permission) Sanders’ unpublished argument, which proceeds instead by Fourier-analytic methods.

In Notes 3, we saw that the number of additive patterns in a given set was (in principle, at least) controlled by the Gowers uniformity norms of functions associated to that set.

Such norms can be defined on any finite additive group (and also on some other types of domains, though we will not discuss this point here). In particular, they can be defined on the finite-dimensional vector spaces ${V}$ over a finite field ${{\bf F}}$.

In this case, the Gowers norms ${U^{d+1}(V)}$ are closely tied to the space ${\hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$ of polynomials of degree at most ${d}$. Indeed, as noted in Exercise 20 of Notes 4, a function ${f: V \rightarrow {\bf C}}$ of ${L^\infty(V)}$ norm ${1}$ has ${U^{d+1}(V)}$ norm equal to ${1}$ if and only if ${f = e(\phi)}$ for some ${\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$; thus polynomials solve the “${100\%}$ inverse problem” for the trivial inequality ${\|f\|_{U^{d+1}(V)} \leq \|f\|_{L^\infty(V)}}$. They are also a crucial component of the solution to the “${99\%}$ inverse problem” and “${1\%}$ inverse problem”. For the former, we will soon show:

Proposition 1 (${99\%}$ inverse theorem for ${U^{d+1}(V)}$) Let ${f: V \rightarrow {\bf C}}$ be such that ${\|f\|_{L^\infty(V)} \leq 1}$ and ${\|f\|_{U^{d+1}(V)} \geq 1-\epsilon}$ for some ${\epsilon > 0}$. Then there exists ${\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$ such that ${\| f - e(\phi)\|_{L^1(V)} = O_{d, {\bf F}}( \epsilon^c )}$, where ${c = c_d > 0}$ is a constant depending only on ${d}$.

Thus, for the Gowers norm to be almost completely saturated, one must be very close to a polynomial. The converse assertion is easily established:

Exercise 1 (Converse to ${99\%}$ inverse theorem for ${U^{d+1}(V)}$) If ${\|f\|_{L^\infty(V)} \leq 1}$ and ${\|f-e(\phi)\|_{L^1(V)} \leq \epsilon}$ for some ${\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$, then ${\|f\|_{U^{d+1}(V)} \geq 1 - O_{d,{\bf F}}( \epsilon^c )}$, where ${c = c_d > 0}$ is a constant depending only on ${d}$.

In the ${1\%}$ world, one no longer expects to be close to a polynomial. Instead, one expects to correlate with a polynomial. Indeed, one has

Lemma 2 (Converse to the ${1\%}$ inverse theorem for ${U^{d+1}(V)}$) If ${f: V \rightarrow {\bf C}}$ and ${\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$ are such that ${|\langle f, e(\phi) \rangle_{L^2(V)}| \geq \epsilon}$, where ${\langle f, g \rangle_{L^2(V)} := {\bf E}_{x \in V} f(x) \overline{g(x)}}$, then ${\|f\|_{U^{d+1}(V)} \geq \epsilon}$.

Proof: From the definition of the ${U^1}$ norm (equation (18) from Notes 3), the monotonicity of the Gowers norms (Exercise 19 of Notes 3), and the polynomial phase modulation invariance of the Gowers norms (Exercise 21 of Notes 3), one has

$\displaystyle |\langle f, e(\phi) \rangle| = \| f e(-\phi) \|_{U^1(V)}$

$\displaystyle \leq \|f e(-\phi) \|_{U^{d+1}(V)}$

$\displaystyle = \|f\|_{U^{d+1}(V)}$

and the claim follows. $\Box$

In the high characteristic case ${\hbox{char}({\bf F}) > d}$ at least, this can be reversed:

Theorem 3 (${1\%}$ inverse theorem for ${U^{d+1}(V)}$) Suppose that ${\hbox{char}({\bf F}) > d \geq 0}$. If ${f: V \rightarrow {\bf C}}$ is such that ${\|f\|_{L^\infty(V)} \leq 1}$ and ${\|f\|_{U^{d+1}(V)} \geq \epsilon}$, then there exists ${\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$ such that ${|\langle f, e(\phi) \rangle_{L^2(V)}| \gg_{\epsilon,d,{\bf F}} 1}$.

This result is sometimes referred to as the inverse conjecture for the Gowers norm (in high, but bounded, characteristic). For small ${d}$, the claim is easy:

Exercise 2 Verify the cases ${d=0,1}$ of this theorem. (Hint: to verify the ${d=1}$ case, use the Fourier-analytic identities ${\|f\|_{U^2(V)} = (\sum_{\xi \in \hat V} |\hat f(\xi)|^4)^{1/4}}$ and ${\|f\|_{L^2(V)} = (\sum_{\xi \in \hat V} |\hat f(\xi)|^2)^{1/2}}$, where ${\hat V}$ is the space of all homomorphisms ${\xi: x \mapsto \xi \cdot x}$ from ${V}$ to ${{\bf R}/{\bf Z}}$, and ${\hat f(\xi) := \mathop{\bf E}_{x \in V} f(x) e(-\xi \cdot x)}$ are the Fourier coefficients of ${f}$.)
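The ${d=1}$ identity in the hint is easy to confirm numerically in a one-dimensional example ${V = {\bf F}_5}$; the following sketch (illustrative only, with an arbitrary made-up test function) compares the ${U^2}$ norm computed from its definition against the ${\ell^4}$ sum of Fourier coefficients:

```python
import cmath

p = 5  # work in V = F_5, viewed as a one-dimensional vector space
f = [1.0, -0.5 + 0.3j, 0.2j, 0.7, -0.1 - 0.4j]  # arbitrary test function f: V -> C

def e(theta):
    # the standard character e(theta) = exp(2 pi i theta)
    return cmath.exp(2j * cmath.pi * theta)

# ||f||_{U^2}^4 = E_{x,h,k} f(x) conj(f(x+h)) conj(f(x+k)) f(x+h+k)
u2_4 = sum(f[x] * f[(x + h) % p].conjugate() * f[(x + k) % p].conjugate()
           * f[(x + h + k) % p]
           for x in range(p) for h in range(p) for k in range(p)) / p**3

# Fourier coefficients hat f(xi) = E_x f(x) e(-xi x / p), and their l^4 sum
fhat = [sum(f[x] * e(-xi * x / p) for x in range(p)) / p for xi in range(p)]
l4_sum = sum(abs(c) ** 4 for c in fhat)

assert abs(u2_4 - l4_sum) < 1e-9   # the two sides of the identity agree
```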

This conjecture becomes more difficult to establish for larger values of ${d}$. The ${d=2}$ case of the theorem was established by Ben Green and myself in the high characteristic case ${\hbox{char}({\bf F}) > 2}$; the low characteristic case ${\hbox{char}({\bf F}) = d = 2}$ was independently and simultaneously established by Samorodnitsky. The cases ${d>2}$ in the high characteristic case were established in two stages: firstly, by using a modification of the Furstenberg correspondence principle, due to Ziegler and myself, to convert the problem to an ergodic theory counterpart, and then by using a modification of the methods of Host-Kra and Ziegler to solve that counterpart, as done in this paper of Bergelson, Ziegler, and myself.

The situation with the low characteristic case in general is still unclear. In the high characteristic case, we saw from Notes 4 that one could replace the space of non-classical polynomials ${\hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}$ in the above conjecture with the essentially equivalent space of classical polynomials ${\hbox{Poly}_{\leq d}(V \rightarrow {\bf F})}$. However, as we shall see below, this turns out not to be the case in certain low characteristic cases (a fact first observed by Lovett, Meshulam, and Samorodnitsky, and independently by Ben Green and myself), for instance if ${\hbox{char}({\bf F}) = 2}$ and ${d \geq 3}$; this is ultimately due to the existence in those cases of non-classical polynomials which exhibit no significant correlation with classical polynomials of equal or lesser degree. This distinction between classical and non-classical polynomials appears to be a rather non-trivial obstruction to understanding the low characteristic setting; it may be necessary to obtain a more complete theory of non-classical polynomials in order to fully settle this issue.

The inverse conjecture has a number of consequences. For instance, it can be used to establish the analogue of Szemerédi’s theorem in this setting:

Theorem 4 (Szemerédi’s theorem for finite fields) Let ${{\bf F} = {\bf F}_p}$ be a finite field, let ${\delta > 0}$, and let ${A \subset {\bf F}^n}$ be such that ${|A| \geq \delta |{\bf F}^n|}$. If ${n}$ is sufficiently large depending on ${p,\delta}$, then ${A}$ contains an (affine) line ${\{ x, x+r, \ldots, x+(p-1)r\}}$ for some ${x,r \in {\bf F}^n}$ with ${ r\neq 0}$.
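The smallest non-trivial instance of this theorem can be confirmed by brute force: in ${{\bf F}_3^2}$, every ${5}$-point subset contains an affine line, while some ${4}$-point subsets do not. A quick exhaustive check (illustrative only, not part of any proof):

```python
from itertools import product, combinations

# the 9 points of F_3^2, and the 12 affine lines {x, x+r, x+2r} with r != 0
pts = list(product(range(3), repeat=2))
def add(u, v):
    return ((u[0] + v[0]) % 3, (u[1] + v[1]) % 3)
lines = {frozenset({x, add(x, r), add(x, add(r, r))})
         for x in pts for r in pts if r != (0, 0)}

# every 5-point subset contains a line, but some 4-point subsets do not
assert all(any(l <= set(S) for l in lines) for S in combinations(pts, 5))
assert not all(any(l <= set(S) for l in lines) for S in combinations(pts, 4))
```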

Exercise 3 Use Theorem 4 to establish the following generalisation: with the notation as above, if ${k \geq 1}$ and ${n}$ is sufficiently large depending on ${p,\delta}$, then ${A}$ contains an affine ${k}$-dimensional subspace.

We will prove this theorem in two different ways, one using a density increment method, and the other using an energy increment method. We discuss some other applications below the fold.

A handy inequality in additive combinatorics is the Plünnecke-Ruzsa inequality:

Theorem 1 (Plünnecke-Ruzsa inequality) Let ${A, B_1, \ldots, B_m}$ be finite non-empty subsets of an additive group ${G}$, such that ${|A+B_i| \leq K_i |A|}$ for all ${1 \leq i \leq m}$ and some scalars ${K_1,\ldots,K_m \geq 1}$. Then there exists a subset ${A'}$ of ${A}$ such that ${|A' + B_1 + \ldots + B_m| \leq K_1 \ldots K_m |A'|}$.

The proof uses graph-theoretic techniques. Setting ${A=B_1=\ldots=B_m}$, we obtain a useful corollary: if ${A}$ has small doubling in the sense that ${|A+A| \leq K|A|}$, then we have ${|mA| \leq K^m |A|}$ for all ${m \geq 1}$, where ${mA = A + \ldots + A}$ is the sum of ${m}$ copies of ${A}$.
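As a toy numerical illustration of this corollary (a sketch, not from the original text): an arithmetic progression has doubling constant just under ${2}$, and its iterated sumsets grow only linearly, comfortably within the ${K^m |A|}$ bound.

```python
def sumset(A, B):
    return {a + b for a in A for b in B}

A = set(range(10))                      # an arithmetic progression, |A| = 10
K = len(sumset(A, A)) / len(A)          # doubling constant |A+A|/|A| = 19/10
mA = A
for m in range(2, 7):
    mA = sumset(mA, A)                  # mA is now the m-fold sumset
    assert len(mA) == 9 * m + 1         # linear growth for progressions
    assert len(mA) <= K ** m * len(A)   # the Pluennecke-Ruzsa bound
```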

In a recent paper, I adapted a number of sum set estimates to the entropy setting, in which finite sets such as ${A}$ in ${G}$ are replaced with discrete random variables ${X}$ taking values in ${G}$, and the (logarithm of the) cardinality ${|A|}$ of a set ${A}$ is replaced by the Shannon entropy ${{\Bbb H}(X)}$ of a random variable ${X}$. (Throughout this note I assume all entropies to be finite.) However, at the time, I was unable to find an entropy analogue of the Plünnecke-Ruzsa inequality, because I did not know how to adapt the graph-theoretic argument to the entropy setting.

I recently discovered, however, that buried in a classic paper of Kaimanovich and Vershik (implicitly in Proposition 1.3, to be precise) there was the following analogue of Theorem 1:

Theorem 2 (Entropy Plünnecke-Ruzsa inequality) Let ${X, Y_1, \ldots, Y_m}$ be independent random variables of finite entropy taking values in an additive group ${G}$, such that ${{\Bbb H}(X+Y_i) \leq {\Bbb H}(X) + \log K_i}$ for all ${1 \leq i \leq m}$ and some scalars ${K_1,\ldots,K_m \geq 1}$. Then ${{\Bbb H}(X+Y_1+\ldots+Y_m) \leq {\Bbb H}(X) + \log K_1 \ldots K_m}$.

In fact Theorem 2 is a bit “better” than Theorem 1 in the sense that Theorem 1 needed to refine the original set ${A}$ to a subset ${A'}$, but no such refinement is needed in Theorem 2. One corollary of Theorem 2 is that if ${{\Bbb H}(X_1+X_2) \leq {\Bbb H}(X) + \log K}$, then ${{\Bbb H}(X_1+\ldots+X_m) \leq {\Bbb H}(X) + (m-1) \log K}$ for all ${m \geq 1}$, where ${X_1,\ldots,X_m}$ are independent copies of ${X}$; this improves slightly over the analogous combinatorial inequality. Indeed, the function ${m \mapsto {\Bbb H}(X_1+\ldots+X_m)}$ is concave (this can be seen by using the ${m=2}$ version of Theorem 2 (or (2) below) to show that the quantity ${{\Bbb H}(X_1+\ldots+X_{m+1})-{\Bbb H}(X_1+\ldots+X_m)}$ is decreasing in ${m}$).

Theorem 2 is actually a quick consequence of the submodularity inequality

$\displaystyle {\Bbb H}(W) + {\Bbb H}(X) \leq {\Bbb H}(Y) + {\Bbb H}(Z) \ \ \ \ \ (1)$

in information theory, which is valid whenever ${X,Y,Z,W}$ are discrete random variables such that ${Y}$ and ${Z}$ each determine ${X}$ (i.e. ${X}$ is a function of ${Y}$, and also a function of ${Z}$), and ${Y}$ and ${Z}$ jointly determine ${W}$ (i.e. ${W}$ is a function of ${Y}$ and ${Z}$). To apply this, let ${X, Y, Z}$ be independent discrete random variables taking values in ${G}$. Observe that the pairs ${(X,Y+Z)}$ and ${(X+Y,Z)}$ each determine ${X+Y+Z}$, and jointly determine ${(X,Y,Z)}$. Applying (1) we conclude that

$\displaystyle {\Bbb H}(X,Y,Z) + {\Bbb H}(X+Y+Z) \leq {\Bbb H}(X,Y+Z) + {\Bbb H}(X+Y,Z)$

which after using the independence of ${X,Y,Z}$ simplifies to the sumset submodularity inequality

$\displaystyle {\Bbb H}(X+Y+Z) + {\Bbb H}(Y) \leq {\Bbb H}(X+Y) + {\Bbb H}(Y+Z) \ \ \ \ \ (2)$

(this inequality was also recently observed by Madiman; it is the ${m=2}$ case of Theorem 2). As a corollary of this inequality, we see that if ${{\Bbb H}(X+Y_i) \leq {\Bbb H}(X) + \log K_i}$, then

$\displaystyle {\Bbb H}(X+Y_1+\ldots+Y_i) \leq {\Bbb H}(X+Y_1+\ldots+Y_{i-1}) + \log K_i,$

and Theorem 2 follows by telescoping series.
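The inequality (2) is also simple to test numerically. The following sketch (with arbitrary made-up distributions on ${{\Bbb Z}/6{\Bbb Z}}$, entropies in nats) checks it directly:

```python
import math

def convolve(p, q, n):
    # distribution of X + Y (mod n) for independent X ~ p, Y ~ q
    r = [0.0] * n
    for i in range(n):
        for j in range(n):
            r[(i + j) % n] += p[i] * q[j]
    return r

def H(p):
    # Shannon entropy, in nats
    return -sum(x * math.log(x) for x in p if x > 0)

# three arbitrary (made-up) distributions on Z/6Z
X = [0.5, 0.2, 0.1, 0.1, 0.05, 0.05]
Y = [0.3, 0.3, 0.2, 0.1, 0.05, 0.05]
Z = [0.25, 0.25, 0.2, 0.1, 0.1, 0.1]

n = 6
XY = convolve(X, Y, n)
YZ = convolve(Y, Z, n)
XYZ = convolve(XY, Z, n)

# the sumset submodularity inequality (2):
assert H(XYZ) + H(Y) <= H(XY) + H(YZ) + 1e-12
```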

The proof of Theorem 2 seems to be genuinely different from the graph-theoretic proof of Theorem 1. It would be interesting to see if the above argument can be somehow adapted to give a stronger version of Theorem 1. Note also that both Theorem 1 and Theorem 2 have extensions to more general combinations of ${X,Y_1,\ldots,Y_m}$ than ${X+Y_i}$; see this paper and this paper respectively.

Below the fold is a version of my talk “Recent progress on the Kakeya conjecture” that I gave at the Fefferman conference.

Title: Use basic examples to calibrate exponents

Motivation: In the more quantitative areas of mathematics, such as analysis and combinatorics, one has to frequently keep track of a large number of exponents in one’s identities, inequalities, and estimates.  For instance, if one is studying a set of N elements, then many expressions that one is faced with will often involve some power $N^p$ of N; if one is instead studying a function f on a measure space X, then perhaps it is an $L^p$ norm $\|f\|_{L^p(X)}$ which will appear instead.  The exponent $p$ involved will typically evolve slowly over the course of the argument, as various algebraic or analytic manipulations are applied.  In some cases, the exact value of this exponent is immaterial, but at other times it is crucial to have the correct value of $p$ at hand.   One can (and should) of course carefully go through one’s arguments line by line to work out the exponents correctly, but it is all too easy to make a sign error or other mis-step at one of the lines, causing all the exponents on subsequent lines to be incorrect.  However, one can guard against this (and avoid some tedious line-by-line exponent checking) by continually calibrating these exponents at key junctures of the arguments by using basic examples of the object of study (sets, functions, graphs, etc.) as test cases.  This is a simple trick, but it lets one avoid many unforced errors with exponents, and also lets one compute more rapidly.

Quick description: When trying to quickly work out what an exponent p in an estimate, identity, or inequality should be without deriving that statement line-by-line, test that statement with a simple example which has non-trivial behaviour with respect to that exponent p, but trivial behaviour with respect to as many other components of that statement as one is able to manage.   The “non-trivial” behaviour should be parametrised by some very large or very small parameter.  By matching the dependence on this parameter on both sides of the estimate, identity, or inequality, one should recover p (or at least a good prediction as to what p should be).

General discussion: The test examples should be as basic as possible; ideally they should have trivial behaviour in all aspects except for one feature that relates to the exponent p that one is trying to calibrate, thus being only “barely” non-trivial.   When the object of study is a function, then (appropriately rescaled, or otherwise modified) bump functions are very typical test objects, as are Dirac masses, constant functions, Gaussians, or other functions that are simple and easy to compute with.  In additive combinatorics, when the object of study is a subset of a group, then subgroups, arithmetic progressions, or random sets are typical test objects.  In graph theory, typical examples of test objects include complete graphs, complete bipartite graphs, and random graphs. And so forth.

This trick is closely related to that of using dimensional analysis to recover exponents; indeed, one can view dimensional analysis as the special case of exponent calibration when using test objects which are non-trivial in one dimensional aspect (e.g. they exist at a single very large or very small length scale) but are otherwise of a trivial or “featureless” nature.   But the calibration trick is more general, as it can involve parameters (such as probabilities, angles, or eccentricities) which are not commonly associated with the physical concept of a dimension.  And personally, I find example-based calibration to be a much more satisfying (and convincing) explanation of an exponent than a calibration arising from formal dimensional analysis.

When one is trying to calibrate an inequality or estimate, one should try to pick a basic example which one expects to saturate that inequality or estimate, i.e. an example for which the inequality is close to being an equality.  Otherwise, one would only expect to obtain some partial information on the desired exponent p (e.g. a lower bound or an upper bound only).  Knowing the examples that saturate an estimate that one is trying to prove is also useful for several other reasons – for instance, it strongly suggests that any technique which is not efficient when applied to the saturating example, is unlikely to be strong enough to prove the estimate in general, thus eliminating fruitless approaches to a problem and (hopefully) refocusing one’s attention on those strategies which actually have a chance of working.
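As a toy instance of the trick, one can calibrate the dual exponent in Hölder's inequality ${|\langle f,g \rangle| \leq \|f\|_{p} \|g\|_{q}}$ by testing with the saturating example ${f = g = 1_E}$ on a set of ${M}$ points with counting measure: matching ${M = M^{1/p} \cdot M^{1/q}}$ forces ${1/p + 1/q = 1}$. A sketch:

```python
# calibrate the dual exponent q in |<f,g>| <= ||f||_p ||g||_q using the
# saturating test case f = g = 1_E on a set of M points (counting measure)
p = 3.0
q = 1 / (1 - 1 / p)      # the calibration M^1 = M^{1/p} * M^{1/q} forces this
for M in (10, 1000, 100000):
    f = [1.0] * M                                 # f = g = indicator of M points
    lhs = sum(x * x for x in f)                   # <f, g> = M
    norm_p = sum(x ** p for x in f) ** (1 / p)    # M^{1/p}
    norm_q = sum(x ** q for x in f) ** (1 / q)    # M^{1/q}
    assert abs(lhs - norm_p * norm_q) < 1e-6 * lhs   # inequality saturated
```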

Calibration is best used for the type of quick-and-dirty calculations one uses when trying to rapidly map out an argument that one has roughly worked out already, but without precise details; in particular, I find it particularly useful when writing up a rapid prototype.  When the time comes to write out the paper in full detail, then of course one should instead carefully work things out line by line, but if all goes well, the exponents obtained in that process should match up with the preliminary guesses for those exponents obtained by calibration, which adds confidence that no exponent errors have been committed.

Additive combinatorics is largely focused on the additive properties of finite subsets A of an additive group $G = (G,+)$.  This group can be finite or infinite, but there is a very convenient trick, the Ruzsa projection trick, which allows one to reduce the latter case to the former.  For instance, consider the set $A = \{1,\ldots,N\}$ inside the integers ${\Bbb Z}$.  The integers of course form an infinite group, but if we are only interested in sums of at most two elements of A at a time, we can embed A inside the finite cyclic group ${\Bbb Z}/2N{\Bbb Z}$ without losing any combinatorial information.  More precisely, there is a Freiman isomorphism of order 2 between the set $\{1,\ldots,N\}$ in ${\Bbb Z}$ and the set $\{1,\ldots,N\}$ in ${\Bbb Z}/2N{\Bbb Z}$.  One can view the latter version of $\{1,\ldots,N\}$ as a model for the former version of $\{1,\ldots,N\}$. More generally, it turns out that any finite set A in an additive group can be modeled in the above sense by an equivalent set in a finite group, and in fact one can ensure that this ambient modeling group is not much larger than A itself if A has some additive structure; see this paper of Ruzsa (or Lemma 5.26 of my book with Van Vu) for a precise statement.  This projection trick has a number of important uses in additive combinatorics, most notably in Ruzsa’s simplified proof of Freiman’s theorem.
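The Freiman isomorphism of order 2 here just says that additive equalities $a+b=c+d$ are preserved in both directions when reducing mod 2N, which is trivial to confirm exhaustively for small N (an illustrative check):

```python
from itertools import product

N = 8
A = range(1, N + 1)
# for a,b,c,d in {1,...,N}, both a+b and c+d lie in [2, 2N], so equality
# in Z and equality mod 2N coincide:
for a, b, c, d in product(A, repeat=4):
    assert (a + b == c + d) == ((a + b) % (2 * N) == (c + d) % (2 * N))
```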

Given the interest in non-commutative analogues of Freiman’s theorem, it is natural to ask whether one can similarly model finite sets A in multiplicative (and non-commutative) groups $G = (G,\times)$ using finite models.  Unfortunately (as I learned recently from Akshay Venkatesh, via Ben Green), this turns out to be impossible in general, due to an old example of Higman.  More precisely, Higman shows:

Theorem 1. There exists an infinite group G generated by four distinct elements a,b,c,d that obey the relations

$ab = ba^2; bc = cb^2; cd = dc^2; da = ad^2$; (1)

in fact, a and c generate a free subgroup of G.  On the other hand, if G’ is a finite group containing four elements a,b,c,d obeying (1), then a,b,c,d are all trivial.

As a consequence, the finite set $A := \{ 1, a, b, c, d, ab, bc, cd, da \}$ in G has no model (in the sense of Freiman isomorphisms) in a finite group.

Theorem 1 is proven by a small amount of elementary group theory and number theory, and it was neat enough that I thought I would reproduce it here.
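One can at least confirm the finite-group half of the theorem in a small example by brute force: in $S_4$, the only quadruple obeying all four relations (1) is the trivial one. (This toy search is of course not Higman's argument, which handles all finite groups at once.)

```python
from itertools import permutations

# elements of S_4 as tuples p, with composition (p*q)(i) = p[q[i]]
elems = list(permutations(range(4)))
def mul(p, q):
    return tuple(p[q[i]] for i in range(4))

# draw an edge x -> y whenever the relation xy = y x^2 holds
succ = {x: [y for y in elems if mul(x, y) == mul(y, mul(x, x))] for x in elems}

# quadruples obeying ab = ba^2, bc = cb^2, cd = dc^2, da = ad^2 are exactly
# the directed 4-cycles a -> b -> c -> d -> a in this graph
solutions = [(a, b, c, d)
             for a in elems
             for b in succ[a]
             for c in succ[b]
             for d in succ[c]
             if a in succ[d]]

identity = tuple(range(4))
assert solutions == [(identity,) * 4]   # only the trivial quadruple survives
```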

I’ve uploaded a new paper to the arXiv entitled “The sum-product phenomenon in arbitrary rings“, and submitted to Contributions to Discrete Mathematics. The sum-product phenomenon asserts, very roughly speaking, that given a finite non-empty set A in a ring R, then either the sum set $A+A := \{ a+b: a, b \in A \}$ or the product set $A \cdot A := \{ ab: a, b \in A \}$ will be significantly larger than A, unless A is somehow very close to being a subring of R, or if A is highly degenerate (for instance, containing a lot of zero divisors). For instance, in the case of the integers $R = {\Bbb Z}$, which has no non-trivial finite subrings, a long-standing conjecture of Erdös and Szemerédi asserts that $|A+A| + |A \cdot A| \gg_\varepsilon |A|^{2-\varepsilon}$ for every finite non-empty $A \subset {\Bbb Z}$ and every $\varepsilon > 0$. (The current best result on this problem is a very recent result of Solymosi, who shows that the conjecture holds for any $\varepsilon$ greater than 2/3.) In recent years, many other special rings have been studied intensively, most notably finite fields and cyclic groups, but also the complex numbers, quaternions, and other division algebras, and continuous counterparts in which A is now (for instance) a collection of intervals on the real line. I will not try to summarise all the work on sum-product estimates and their applications (which range from number theory to graph theory to ergodic theory to computer science) here, but I discuss this in the introduction to my paper, which has over 50 references to this literature (and I am probably still missing out on a few).

I was recently asked the question as to what could be said about the sum-product phenomenon in an arbitrary ring R, which need not be commutative or contain a multiplicative identity. Once one makes some assumptions to avoid the degenerate case when A (or related sets, such as A-A) are full of zero-divisors, it turns out that there is in fact quite a bit one can say, using only elementary methods from additive combinatorics (in particular, the Plünnecke-Ruzsa sum set theory). Roughly speaking, the main results of the paper assert that in an arbitrary ring, a set A which is non-degenerate and has small sum set and product set must be mostly contained inside a subring of R of comparable size to A, or a dilate of such a subring, though in the absence of invertible elements one sometimes has to enlarge the ambient ring R slightly before one can find the subring. At the end of the paper I specialise these results to specific rings, such as division algebras or products of division algebras, cyclic groups, or finite-dimensional algebras over fields. Generally speaking, the methods here give very good results when the set of zero divisors is sparse and easily describable, but poor results otherwise. (In particular, the sum-product theory in cyclic groups, as worked out by Bourgain and coauthors, is only recovered for groups which are the product of a bounded number of primes; the case of cyclic groups whose order has many factors seems to require a more multi-scale analysis which I did not attempt to perform in this paper.)
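The two extreme examples driving the phenomenon are easy to play with numerically: an arithmetic progression has a minimal sum set but a large product set, while a geometric progression is the reverse. An illustrative check (in the integers, not from the paper):

```python
def sums(A):
    return {a + b for a in A for b in A}

def prods(A):
    return {a * b for a in A for b in A}

A_ap = set(range(1, 51))            # arithmetic progression: additive structure
A_gp = {2 ** k for k in range(50)}  # geometric progression: multiplicative structure

assert len(sums(A_ap)) == 2 * len(A_ap) - 1    # minimal sum set
assert len(prods(A_gp)) == 2 * len(A_gp) - 1   # minimal product set
assert len(prods(A_ap)) > 5 * len(A_ap)        # but an AP has a large product set
assert len(sums(A_gp)) > 5 * len(A_gp)         # and a GP has a large sum set
```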

This is my second Milliman lecture, in which I talk about recent applications of ideas from additive combinatorics (and in particular, from the inverse Littlewood-Offord problem) to the theory of discrete random matrices.

This week I am visiting the University of Washington in Seattle, giving the Milliman Lecture Series for 2007-2008. My chosen theme here is “Recent developments in arithmetic combinatorics“. In my first lecture, I will speak (once again) on how methods in additive combinatorics have allowed us to detect additive patterns in the prime numbers, in particular discussing my joint work with Ben Green. In the second lecture I will discuss how additive combinatorics has made it possible to study the invertibility and spectral behaviour of random discrete matrices, in particular discussing my joint work with Van Vu; and in the third lecture I will discuss how sum-product estimates have recently led to progress in the theory of expanders relating to Lie groups, as well as to sieving over orbits of such groups, in particular presenting work of Jean Bourgain and his coauthors.