It occurred to me recently that the mathematical blog medium may be a good venue not just for expository “short stories” on mathematical concepts or results, but also for more technical discussions of individual mathematical “tricks”, which would otherwise not be significant enough to warrant a publication-length (and publication-quality) article. So I thought today that I would discuss the amplification trick in harmonic analysis and combinatorics (and in particular, in the study of estimates); this trick takes an established estimate involving an arbitrary object (such as a function f), and obtains a stronger (or amplified) estimate by transforming the object in a well-chosen manner (often involving some new parameters) into a new object, applying the estimate to that new object, and seeing what that estimate says about the original object (after optimising the parameters or taking a limit). The amplification trick works particularly well for estimates which enjoy some sort of symmetry on one side of the estimate that is not represented on the other side; indeed, it can be viewed as a way to “arbitrage” differing amounts of symmetry between the left- and right-hand sides of an estimate. It can also be used in the contrapositive, amplifying a weak counterexample to an estimate into a strong counterexample. This trick also sheds some light as to why dimensional analysis works; an estimate which is not dimensionally consistent can often be amplified into a stronger estimate which is dimensionally consistent; in many cases, this new estimate is so strong that it cannot in fact be true, and thus dimensionally inconsistent inequalities tend to be either false or inefficient, which is why we rarely see them. (More generally, any inequality on which a group acts on either the left or right-hand side can often be “decomposed” into the “isotypic components” of the group action, either by the amplification trick or by other related tools, such as Fourier analysis.)

The amplification trick is a deceptively simple one, but it can become particularly powerful when one is arbitraging an unintuitive symmetry, such as symmetry under tensor powers. Indeed, the “tensor power trick”, which can eliminate constants and even logarithms in an almost magical manner, can lead to some interesting proofs of sharp inequalities, which are difficult to establish by more direct means.

The most familiar example of the amplification trick in action is probably the textbook proof of the Cauchy-Schwarz inequality

|\langle v, w \rangle| \leq \|v\| \|w\| (1)

for vectors v, w in a complex Hilbert space. To prove this inequality, one might start by exploiting the obvious inequality

\|v-w\|^2 \geq 0 (2)

but after expanding everything out, one only gets the weaker inequality

\hbox{Re} \langle v, w \rangle \leq \frac{1}{2} \|v\|^2 + \frac{1}{2} \|w\|^2. (3)

Now (3) is weaker than (1) for two reasons; the left-hand side is smaller, and the right-hand side is larger (thanks to the arithmetic mean-geometric mean inequality). However, we can amplify (3) by arbitraging some symmetry imbalances. Firstly, observe that the phase rotation symmetry v \mapsto e^{i\theta} v preserves the RHS of (3) but not the LHS. We exploit this by replacing v by e^{i\theta} v in (3) for some phase \theta to be chosen later, to obtain

\hbox{Re} e^{i\theta} \langle v, w \rangle \leq \frac{1}{2} \|v\|^2 + \frac{1}{2} \|w\|^2.

Now we are free to choose \theta at will (as long as it is real, of course), so it is natural to choose \theta to optimise the inequality, which in this case means to make the left-hand side as large as possible. This is achieved by choosing e^{i\theta} to cancel the phase of \langle v, w \rangle, and we obtain

|\langle v, w \rangle| \leq \frac{1}{2} \|v\|^2 + \frac{1}{2} \|w\|^2 (4)

This is closer to (1); we have fixed the left-hand side, but the right-hand side is still too weak. But we can amplify further, by exploiting an imbalance in a different symmetry, namely the homogenisation symmetry (v,w) \mapsto (\lambda v, \frac{1}{\lambda} w) for a scalar \lambda > 0, which preserves the left-hand side but not the right. Inserting this transform into (4) we conclude that

|\langle v, w \rangle| \leq \frac{\lambda^2}{2} \|v\|^2 + \frac{1}{2\lambda^2} \|w\|^2

where \lambda > 0 is at our disposal to choose. We can optimise in \lambda by minimising the right-hand side, and indeed one easily sees that the minimum (or infimum, if one of v and w vanishes) is \|v\| \|w\| (which is achieved when \lambda = \sqrt{\|w\|/\|v\|} when v,w are non-zero, or in an asymptotic limit \lambda \to 0 or \lambda \to \infty in the degenerate cases), and so we have amplified our way to the Cauchy-Schwarz inequality (1). [See also this discussion by Tim Gowers on the Cauchy-Schwarz inequality.]

— Amplification via phase, homogeneity, or dilation symmetry —

Many similar examples of amplification are used routinely to prove the basic inequalities in harmonic analysis. For instance to deduce the complex-valued triangle inequality

|\int_X f(x)\ d\mu(x)| \leq \int_X |f(x)|\ d\mu(x) (5)

(where (X,\mu) is a measure space and f is absolutely integrable) from its real-valued counterpart, we first apply the latter inequality to \hbox{Re}(f) to obtain

|\hbox{Re} \int_X f(x)\ d\mu(x)| \leq \int_X |\hbox{Re} f(x)|\ d\mu(x).

To make the right-hand side phase-rotation-invariant, we crudely bound |\hbox{Re} f(x)| by |f(x)|, obtaining

|\hbox{Re} \int_X f(x)\ d\mu(x)| \leq \int_X |f(x)|\ d\mu(x)

and then one can arbitrage the imbalance in phase rotation symmetry to obtain (5). For another well-known example, to prove Hölder’s inequality

\int_X f(x) g(x)\ d\mu(x) \leq \|f\|_{L^p(X,d\mu)} \|g\|_{L^q(X,d\mu)} (6)

for non-negative measurable f, g and dual exponents 1 \leq p,q \leq \infty, one can begin with the elementary (weighted) arithmetic mean-geometric mean inequality

ab \leq \frac{1}{p} a^p + \frac{1}{q} b^q (7)

for non-negative a,b (which follows from the convexity of the function \theta \mapsto a^\theta b^{1-\theta}, which in turn follows from the convexity of the exponential function) to obtain the inequality

\int_X f(x) g(x)\ d\mu(x) \leq \frac{1}{p} \|f\|_{L^p(X,d\mu)}^p + \frac{1}{q} \|g\|_{L^q(X,d\mu)}^q.

This inequality is weaker than (6) (because of (7)); but if one amplifies by arbitraging the imbalance in the homogenisation symmetry (f,g) \mapsto (\lambda f, \frac{1}{\lambda} g) one obtains (6). As a third example, the Sobolev embedding inequality

\|f\|_{L^q({\Bbb R}^d)} \leq C_{p,q,d} (\|f\|_{L^p({\Bbb R}^d)} + \| \nabla f \|_{L^p({\Bbb R}^d)}), (8)

which is valid for 1 < p < q \frac{1}{p} - \frac{1}{d} (and also valid in some endpoint cases) and all test functions (say) f on {\Bbb R}^d, can be amplified to obtain the Gagliardo-Nirenberg inequality

\|f\|_{L^q({\Bbb R}^d)} \leq C_{p,q,d} \|f\|_{L^p({\Bbb R}^d)}^{1-\theta} \| \nabla f \|_{L^p({\Bbb R}^d)}^\theta (9)

where 0 < \theta < 1 is the number such that \frac{1}{q} = \frac{1}{p} - \frac{\theta}{d}, by arbitraging the action of the dilation group f(x) \mapsto f(\lambda x). (In this case, the dilation action does not leave either the LHS or RHS of (8) invariant, but it affects the LHS in a well controlled manner, which can be normalised out by dividing by a suitable power of \lambda.) The same trick, incidentally, reveals why the Sobolev embedding inequality fails when q < p or when \frac{1}{q} < \frac{1}{p} - \frac{1}{d}, because in these cases it leads to an absurd version of the Gagliardo-Nirenberg inequality. Observe also that the Gagliardo-Nirenberg inequality (9) is dimensionally consistent; the dilation action affects both sides of the inequality in the same way. (The weight of the representation of the dilation action on an expression is the same thing as the exponent of the length unit that one assigns to the dimension of that expression.) More generally, arbitraging a dilation symmetry allows a dimensionally consistent inequality to emerge from a dimensionally inconsistent (or dimensionally inefficient) one.

— Amplification using linearity —

Another powerful source of amplification is linearity (the principle of superposition). A simple example of this is depolarisation. Suppose one has a symmetric bilinear form B(f,g): X \times X \to {\Bbb R} from a normed vector space X to the real numbers, and one has already proven the polarised inequality

|B(f,f)| \leq A \|f\|_X^2

for all f in X. One can amplify this by replacing f with f+cg for arbitrary f, g in X and a real parameter c, obtaining

|B(f,f)+2c B(f,g) + c^2 B(g,g)| \leq A (\|f\|_X+|c| \|g\|_X)^2.

Optimising this in c (e.g. taking c := \|f\|_X/\|g\|_X) and using the triangle inequality, one eventually obtains the amplified (depolarised) inequality

|B(f,g)| \leq C A \|f\|_X \|g\|_X

for some absolute constant C.

For a slightly more sophisticated example, suppose for instance that one has a linear operator T: L^p(X) \to L^p(Y) for some 0 < p < \infty and some measure spaces X,Y, and that one has established a scalar estimate of the form

\| Tf \|_{L^p(Y)} \leq A \|f\|_{L^p(X)} (10)

for arbitrary scalar functions f. Then by replacing f by a signed sum \sum_{n=1}^N \epsilon_n f_n, where f_1,\ldots ,f_N are arbitrary functions in L^p(X) and \epsilon_n \in \{-1,+1\} are signs, and using linearity, we obtain

\| \sum_{n=1}^N \epsilon_n Tf_n \|_{L^p(Y)} \leq A \| \sum_{n=1}^N \epsilon_n f_n \|_{L^p(X)}.

If we raise this to the p^{th} power, take the \epsilon_n to be random (Bernoulli) signs (in order to avoid unexpectedly large cancellations in the series), and then take expectations of both sides, we obtain

{\Bbb E} \| \sum_{n=1}^N \epsilon_n Tf_n \|_{L^p(Y)}^p \leq A^p {\Bbb E}\| \sum_{n=1}^N \epsilon_n f_n \|_{L^p(X)}^p.

If one then uses Khintchine’s inequality to compute the expectations, one ends up with the vector valued estimate

\| (\sum_{n=1}^N |Tf_n|^2)^{1/2} \|_{L^p(Y)} \leq C_p A \| (\sum_{n=1}^N |f_n|^2)^{1/2} \|_{L^p(X)}

for some constant C_p depending only on p (in particular, it is independent of N). We can then use the monotone convergence theorem to amplify the finite sum to an infinite sum, thus

\| (\sum_{n=1}^\infty |Tf_n|^2)^{1/2} \|_{L^p(Y)} \leq C_p A \| (\sum_{n=1}^\infty |f_n|^2)^{1/2} \|_{L^p(X)}.

Comparing this to (10) we see that we have amplified a scalar inequality (in which the unknown function f takes values in the real or complex numbers) to a vector-valued inequality (in which we have a sequence f = (f_n)_{n=1}^\infty taking values in the Hilbert space l^2({\Bbb N})). [This particular amplification was first observed by Marcinkiewicz and Zygmund.]

If the estimate one is studying involves “localised” operators or norms, then one can use linearity to amplify a global estimate into a more localised one. For instance, let us return to the Sobolev inequality (8). We can establish a partition of unity 1 = \sum_{n \in {\Bbb Z}^d} \psi(x-n) for some bump function \psi, then we see that

\| f\|_{L^q({\Bbb R}^d)} \leq C_{d,q,\psi} (\sum_{n \in {\Bbb Z}^d} \| \psi(\cdot-n) f \|_{L^q({\Bbb R}^d)}^q)^{1/q}.

Applying the Sobolev inequality (8) to each localised function \psi(\cdot-n) f and then summing up, one obtains the localised Sobolev inequality

\|f\|_{L^q({\Bbb R}^d)} \leq C'_{p,q,d} (\sum_{n \in {\Bbb Z}^d} (\|f\|_{L^p(Q_n)} + \| \nabla f \|_{L^p(Q_n)})^q)^{1/q},

where Q_n is the cube of sidelength 1 centred at n. This estimate is a little stronger than (8), because the l^q summation norm is smaller than the l^p summation norm.

— Amplification via translation invariance —

If T is a translation-invariant operator on {\Bbb R}^n which is not identically zero, one can automatically rule out a large variety of estimates concerning T due to their incompatibility with translation invariance (they would amplify themselves into an absurd estimate). For instance, it will not be possible to establish any weighted estimate involving power weights such as (1+|x|)^\alpha in which there is a higher exponent on the left. More precisely if \alpha > \beta are real numbers and 0 < p,q < \infty, then it is not possible for any estimate of the form

\| (1+|x|)^{\alpha} Tf \|_{L^q({\Bbb R}^n)} \leq C_{p,q,\alpha,\beta,n} \|(1+|x|)^\beta f\|_{L^p({\Bbb R}^n)}

to be true. Indeed, if such an estimate was true, then by using the translation invariance we can amplify the above estimate to

\| (1+|x-x_0|)^\alpha Tf \|_{L^q({\Bbb R}^n)} \leq C_{p,q,\alpha,\beta,n} \|(1+|x-x_0|)^\beta f\|_{L^p({\Bbb R}^n)}

for any x_0 \in {\Bbb R}^n. But if one fixes f and lets x_0 go to infinity, we see that the right-hand side grows like |x_0|^\beta while the left-hand side grows like |x_0|^\alpha (unless Tf vanishes entirely), leading to a contradiction.

[There is a Fourier dual to the above principle, familiar to experts in the analysis of PDEs, which asserts that a function space norm with a low number of derivatives (i.e. a low-regularity norm) cannot control a norm with a high number of derivatives. Here, the underlying symmetry that drives this principle is modulation invariance rather than translation invariance.]

One can obtain particularly powerful amplifications by combining translation-invariance with linearity, because one can now consider not just translates f(x-x_0) of a single function f, but also consider superpositions \sum_{n=1}^N c_n f(x-x_n) of such functions. For instance, we have the principle (which I believe was first articulated by Littlewood) that a non-trivial translation-invariant linear operator T can only map L^p({\Bbb R}^d) to L^q({\Bbb R}^d) when q \geq p. (Littlewood summarised this principle as “the higher exponents are always on the left”.) To see this, suppose that we had an estimate of the form

\| Tf \|_{L^q({\Bbb R}^d)} \leq A \|f\|_{L^p({\Bbb R}^d)}. (11)

We can amplify this estimate by replacing f(x) by
\sum_{n=1}^N f(x-x_n), where N is some integer and x_1,\ldots,x_N are widely separated points. If these points are sufficiently far apart, then the RHS of (11) is comparable to A N^{1/p} \|f\|_{L^p({\Bbb R}^d)}, whereas the LHS is comparable to
N^{1/q} \| Tf \|_{L^q({\Bbb R}^d)} (note how this uses both the translation-invariance and linearity of T). Thus in the limit we obtain

N^{1/q} \| Tf \|_{L^q({\Bbb R}^d)} \leq A N^{1/p} \|f\|_{L^p({\Bbb R}^d)}.

Letting N go to infinity, we obtain a contradiction unless q \geq p (or unless T is identically zero).

The combination of translation invariance and linearity is so strong that it can amplify even a very qualitative estimate into a quantitative one. A good example of this is Stein’s maximal principle. Suppose we have some maximal operator M f := \sup_n |T_n f| on some compact group G with normalised Haar measure dm, where the T_n are a sequence of translation-invariant operators which are uniformly bounded on some L^p(G) space for some 1 < p \leq 2. Suppose we are given the very weak information that Mf is finite almost everywhere for every f \in L^p(G). (This is for instance the case if we know that T_n f converge pointwise almost everywhere.) Miraculously, this qualitative hypothesis can be amplified into a much stronger quantitative one, namely that M is of weak type (p,p):

m(\{ x \in G: Mf(x) \geq \lambda \}) \leq\frac{C}{\lambda^p} \|f\|_{L^p(G)}^p. (12)

To see this, suppose for contradiction that (12) failed for any C; by homogeneity, it would also fail even when restricted to the case \lambda = 1. What this means is that for any \delta > 0, there exists f_\delta such that

\| f_\delta \|_{L^p(G)}^p < \delta m( E_\delta ), (13)

where E_\delta is the set where Mf_\delta > 1.

At present, E_\epsilon could be a very small subset of G, although we know that it has positive measure. But we can amplify this set to be very large by the following trick: pick an integer N comparable to 1/m(E_\delta), select N random shifts g_1,\ldots,g_N \in G and random signs \epsilon_1,\ldots,\epsilon_N \in \{-1,+1\} and replace f_\delta by the randomised sum F_\delta := \sum_{n=1}^N \epsilon_n f_\delta(g_n^{-1} \cdot). This sum will tend to be large (greater than or comparable to 1) on most of the union \bigcup_{n=1}^N g_n \cdot E_\epsilon; this can be made precise using Khintchine’s inequality. On the other hand, another application of Khintchine’s inequality using (13) shows that F_\delta has an L^p norm of O(\delta^{1/p}) on the average. Thus we have constructed functions f of arbitrarily small L^p norm whose maximal function Mf is bounded away from zero on a set of measure bounded away from zero. From this and some minor additional tricks it is not difficult to then construct a function f in L^p whose maximal function is infinite on a set of positive measure, leading to the desired contradiction.

— The tensor power trick —

We now turn to a particularly cute source of amplification, namely the tensor power operation f \mapsto f^{\otimes M} which takes a complex-valued function f: X \to {\Bbb C} on some set X and replaces it with a tensor power f^{\otimes M}: X^M \to {\Bbb C} defined by

f^{\otimes M}(x_1,\ldots,x_M) := f(x_1) \ldots f(x_M).

If one has an estimate for which only one of the sides behaves nicely under tensor powers, then there can be some opportunity for arbitrage. For instance, suppose we wanted to prove the Hausdorff-Young inequality

\| \hat f \|_{l^{p'}(\hat G)} \leq \|f\|_{L^p(G)} (14)

on arbitrary finite additive groups G and all 1 \leq p \leq 2, where p' = p/(p-1) is the dual exponent of p, \hat G is the Pontryagin dual of G (i.e. the group of characters on G), we give G normalised counting measure, and \hat f(\chi) := \frac{1}{|G|} \sum_{x \in G} f(x) \overline{\chi(x)} is the Fourier transform on G. If we had the Riesz-Thorin interpolation theorem, we could quickly deduce (14) from the trivial inequality

\| \hat f \|_{l^\infty(\hat G)} \leq \|f\|_{L^1(G)} (15)

and the Plancherel identity

\| \hat f \|_{l^2(\hat G)} \leq \|f\|_{L^2(G)} (16);

indeed, this is one of the textbook applications of that theorem. But suppose for some reason one did not wish to use the Riesz-Thorin theorem (perhaps in a desire to avoid “non-elementary” methods, such as complex analysis), and instead wished to use the more elementary Marcinkiewicz interpolation theorem. Then, at first glance, it appears that one can only conclude the weaker estimate

\| \hat f \|_{l^{p'}(\hat G)} \leq C_p \|f\|_{L^p(G)}

for some constant C_p > 1. However, we can exploit the fact that the Fourier transform commutes with tensor powers. Indeed, by applying the above inequality with f replaced by f^{\otimes M} (and G replaced by G^M) we see that

\| \hat f \|_{l^{p'}(\hat G)}^M \leq C_p \|f\|_{L^p(G)}^M

for every M \geq 1; taking M^{th} roots and then letting M go to infinity we obtain (14); the tensor power trick has “magically” deleted the constant C_p from the inequality. More generally, one can use the tensor power trick to deduce the Riesz-Thorin interpolation theorem from the Marcinkiewicz interpolation theorem (the key point being that the (L^p,L^q) operator norm of a tensor power T^{\otimes M} of a linear operator T is just the M^{th} power of the operator norm of the original operator T). This gives a proof of the Riesz-Thorin theorem that does not require complex analysis.

Actually, the tensor power trick does not just make constants disappear; it can also get rid of logarithms. Because of this, we can make the above argument even more elementary by using a very crude form of the Marcinkiewicz interpolation argument. Indeed, suppose that f is a quasi-step function, or more precisely that it is supported on some set E in G and takes values between A and 2A for some A > 0. Then from (15) and (16) we see that \|\hat f\|_{l^\infty(\hat G)} = O( A |E|/|G| ) and \|\hat f\|_{l^2(\hat G)} = O( A (|E|/|G|)^{1/2} ), and hence \|\hat f\|_{l^{p'}(\hat G)} =O( A (|E|/|G|)^{1/p} ). Now if f is not a quasi-step function, one can decompose it into O(1+\log |G|) such functions by the “wedding cake” decomposition (dividing the range of |f| into dyadic intervals from \|f\|_{L^\infty} to \|f\|_{L^\infty}/|G|^{100}; the portion of |f| which is less than \|f\|_{L^\infty}/|G|^{100} can be easily dealt with by crude methods). From the triangle inequality we then conclude the weak Hausdorff-Young inequality

\| \hat f \|_{l^{p'}(\hat G)} \leq C_p (1+\log |G|) \|f\|_{L^p(G)}.

If one runs the tensor power trick again, one can eliminate both the constant factor C_p and the logarithmic factor 1 + \log |G| and recover (14) (basically because M^{1/M} converges to 1 as M goes to infinity). [More generally, the tensor power trick can convert restricted or weak-type estimates into strong-type estimates.]

The deletion of the constant C_p may seem minor, but there are some things one can do with a sharp estimate that one cannot with a non-sharp one. For instance, by differentiating (14) at p=2 (where equality holds) one can obtain the entropy uncertainty principle

\sum_{\chi \in \hat G} |\hat f(\xi)|^2 \log \frac{1}{|\hat f(\xi)|^2} + \frac{1}{|G|} \sum_{x \in G} |f(x)|^2 \log \frac{1}{|f(x)|^2} \geq 2 \log |G|

whenever we have the normalisation \|f\|_{L^2(G)}=1. (More generally, estimates involving Shannon entropy tend to be rather amenable to the tensor power trick.)

[I should remark that in Euclidean space, the constant in Hausdorff-Young can be improved to below 1, but this requires some particularly Euclidean devices, such as the use of Gaussians, although this is not too dissimilar as there are certainly many connections between Gaussians and tensor products (cf. the central limit theorem). All of the above discussion also has an analogue for Young’s inequality.]

The tensor power trick also allows one to disprove certain estimates. Observe that if two functions f, g on a finite additive group G such that |f(x)| \leq g(x) for all x (i.e. g majorises f), then from Plancherel’s identity we have

\| \hat f \|_{l^2(\hat G)} \leq \| \hat g \|_{l^2(\hat G)}

and more generally (by using the fact that the Fourier transform intertwines convolution and multiplication) that

\| \hat f \|_{l^p(\hat G)} \leq \| \hat g \|_{l^p(\hat G)}

for all even integers p = 2, 4, 6, \ldots. Hardy and Littlewood conjectured that a similar bound held for all 2 \leq p < \infty, thus

\| \hat f \|_{l^p(\hat G)} \leq C_p \| \hat g \|_{l^p(\hat G)}.

But if such a bound held, then by the tensor power trick one could delete the constant C_p. But then a direct computation (for instance, inspecting what happens when f is infinitesimally close to g) shows that this amplified estimate fails, and so the Hardy-Littlewood majorant conjecture is false. (With a little more work, one can then transfer this failure from finite abelian groups G to other groups, such as the unit circle {\Bbb R}/{\Bbb Z} or cyclic groups {\Bbb Z}/N{\Bbb Z}, which do not obviously admit tensor product structure; this was first done by Bachelis, and with stronger quantitative estimates by Mockenhaupt-Schlag and by Green-Ruzsa.)

The tensor product trick is also widely used in additive combinatorics (I myself learned this trick from a survey paper of Ruzsa). Here, one deals with sets A rather than functions f, but the idea is still the same: replace A by the Cartesian power A^M, see what estimate one gets, and let M \to \infty. There are many instances of this trick in the literature, but I’ll just describe one representative one, due to Ruzsa. An important inequality of Plünnecke asserts, among other things, that for finite non-empty sets A, B of an additive group G, and any positive integer k, the iterated sumset kB = B + \ldots + B obeys the bound

|kB| \leq \frac{|A+B|^k}{|A|^{k-1}}. (17)

(This inequality, incidentally, is itself proven using a version of the tensor power trick, in conjunction with Hall’s marriage theorem, but never mind that here.) This inequality can be amplified to the more general inequality

|B_1 + \ldots + B_k| \leq \frac{|A+B_1| \ldots |A+B_k|}{|A|^{k-1}}

via the tensor power trick as follows. Applying (17) with B := B_1 \cup \ldots \cup B_k, we obtain

|B_1 + \ldots + B_k| \leq \frac{(|A+B_1| + \ldots + |A+B_k|)^k}{|A|^{k-1}}.

The right-hand side looks a bit too big, but this is the same problem we encountered with the Cauchy-Schwarz or Holder inequalities, and we can resolve it in a similar way (i.e. by arbitraging homogeneity). If we replace G with the larger group G \times {\Bbb Z}^k and replace each set B_i with the larger set B_i \times \{ e_i, 2e_i, \ldots, N_i e_i \}, where e_1,\ldots,e_k is the standard basis for {\Bbb Z}^k and N_i are arbitrary positive integers (and replacing A with A \times \{0\}), we obtain

N_1 \ldots N_k |B_1 + \ldots + B_k| \leq \frac{(N_1 |A+B_1| + \ldots + N_k |A+B_k|)^k}{|A|^{k-1}}.

Optimising this in N_1,\ldots,N_k (basically, by making the N_i |A+B_i| close to constant; this is a general rule in optimisation, namely that to optimise X+Y it makes sense to make X and Y comparable in magnitude) we obtain the amplified estimate

|B_1 + \ldots + B_k| \leq C_k \frac{|A+B_1| \ldots |A+B_k|}{|A|^{k-1}}.

for some constant C_k; but then if one replaces A, B_1, \ldots, B_k with their Cartesian powers A^M, B_1^M, \ldots, B_k^M, takes M^{th} roots, and then sends M to infinity, we can delete the constant C_k and recover the inequality.

[Update, Sep 5: Optimal value for \lambda in the proof of Cauchy-Schwarz fixed. (Thanks to furia_kucha for the correction.)]

[Update, Sep 10: Proof of Ruzsa’s inequality fixed. (Thanks to Van Vu for the correction.)]

[Update, Sep 11: Depolarisation example added, and then corrected. (Thanks to Andy Cotton-Clay for the correction.)]