If $\lambda > 0$, a Poisson random variable $\mathbf{Poisson}(\lambda)$ with mean $\lambda$ is a random variable taking values in the natural numbers with probability distribution
$$\mathbf{P}(\mathbf{Poisson}(\lambda) = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \qquad k = 0, 1, 2, \dots$$

One is often interested in bounding upper tail probabilities $\mathbf{P}(\mathbf{Poisson}(\lambda) \geq \lambda(1+u))$ for $u > 0$, or lower tail probabilities $\mathbf{P}(\mathbf{Poisson}(\lambda) \leq \lambda(1+u))$ for $-1 < u < 0$. A standard tool for this is Bennett’s inequality:

**Proposition 1 (Bennett’s inequality).** One has
$$\mathbf{P}(\mathbf{Poisson}(\lambda) \geq \lambda(1+u)) \leq e^{-\lambda h(u)}$$
for $u > 0$ and
$$\mathbf{P}(\mathbf{Poisson}(\lambda) \leq \lambda(1+u)) \leq e^{-\lambda h(u)}$$
for $-1 < u < 0$, where
$$h(u) := (1+u)\log(1+u) - u.$$

From the Taylor expansion $h(u) = \frac{u^2}{2} + O(|u|^3)$ for $|u| \leq 1$ we conclude Gaussian type tail bounds in the regime $u = o(1)$ (and in particular when $u = O(1/\sqrt{\lambda})$), in the spirit of the Chernoff, Bernstein, and Hoeffding inequalities. But in the regime where $u$ is large and positive one obtains a slight gain over these other classical bounds (of $\exp(-\lambda u \log u)$ type, rather than $\exp(-c\lambda u)$), since $h(u)$ grows like $u \log u$ rather than linearly in $u$.
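
As a quick numerical sanity check of Proposition 1 (an illustrative sketch; the helper routines and the parameter grid below are my own choices, not from the text), one can compare the exact Poisson upper tail against the Bennett bound $e^{-\lambda h(u)}$:

```python
import math

def poisson_pmf(lam: float, k: int) -> float:
    """P(Poisson(lam) = k), computed in log-space for numerical stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def poisson_upper_tail(lam: float, kmin: int) -> float:
    """P(Poisson(lam) >= kmin) by direct summation; terms decay fast once k > lam."""
    total, k = 0.0, kmin
    while True:
        t = poisson_pmf(lam, k)
        total += t
        if t < 1e-17 * total:
            return total
        k += 1

def bennett_bound(lam: float, u: float) -> float:
    """Bennett: P(Poisson(lam) >= lam*(1+u)) <= exp(-lam * h(u))."""
    h = (1 + u) * math.log(1 + u) - u
    return math.exp(-lam * h)

# the exact tail never exceeds the Bennett bound on this grid
for lam in (1.0, 5.0, 50.0):
    for u in (0.1, 1.0, 4.0):
        tail = poisson_upper_tail(lam, math.ceil(lam * (1 + u)))
        assert tail <= bennett_bound(lam, u)
```

Here `poisson_upper_tail` sums the series directly, which is adequate for these parameter ranges; for serious use one would reach for a library routine such as `scipy.stats.poisson.sf`.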

*Proof:* We use the exponential moment method. For any $t > 0$, we have from Markov’s inequality that
$$\mathbf{P}(\mathbf{Poisson}(\lambda) \geq \lambda(1+u)) \leq e^{-t\lambda(1+u)}\, \mathbf{E} e^{t \mathbf{Poisson}(\lambda)} = e^{-t\lambda(1+u)} e^{\lambda(e^t - 1)},$$
since the moment generating function of $\mathbf{Poisson}(\lambda)$ is $\mathbf{E} e^{t \mathbf{Poisson}(\lambda)} = e^{\lambda(e^t-1)}$. Setting $t = \log(1+u)$ (the optimal choice) gives the upper tail bound; the lower tail bound is proven similarly, now taking $t = \log(1+u) < 0$ for $-1 < u < 0$. $\Box$

**Remark 2.** Bennett’s inequality also applies for (suitably normalized) sums of bounded independent random variables. In some cases there are direct comparison inequalities available to relate those variables to the Poisson case. For instance, suppose $X = X_1 + \dots + X_n$ is the sum of independent Boolean variables $X_1,\dots,X_n \in \{0,1\}$ of total mean $\sum_{i=1}^n \mathbf{E} X_i = \mu$ and with $\mathbf{P}(X_i = 1) \leq \varepsilon$ for all $i$ and some $0 < \varepsilon < 1$. Then for any natural number $k$, we have
$$\mathbf{P}(X \geq k) \leq \mathbf{P}\left(\mathbf{Poisson}\left(\frac{\mu}{1-\varepsilon}\right) \geq k\right).$$
As such, for $\varepsilon$ small, one can efficiently control the tail probabilities of $X$ in terms of the tail probability of a Poisson random variable of mean close to $\mu$; this is of course very closely related to the well known fact that the Poisson distribution emerges as the limit of sums of many independent Boolean variables, each of which is non-zero with small probability. See this paper of Bentkus and this paper of Pinelis for some further useful (and less obvious) comparison inequalities of this type.
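
This comparison can be checked numerically in the binomial special case (a sketch under my reading of the inequality, namely $\mathbf{P}(X \geq k) \leq \mathbf{P}(\mathbf{Poisson}(\mu/(1-\varepsilon)) \geq k)$, which follows from the stochastic domination of $\mathrm{Bernoulli}(p)$ by $\mathrm{Poisson}(\log\frac{1}{1-p})$; the parameters are illustrative):

```python
import math

def binom_upper_tail(n: int, p: float, k: int) -> float:
    """P(Bin(n, p) >= k), exact summation."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def poisson_upper_tail(lam: float, kmin: int) -> float:
    """P(Poisson(lam) >= kmin) by direct summation in log-space."""
    total, k = 0.0, kmin
    while True:
        t = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        total += t
        if t < 1e-17 * total:
            return total
        k += 1

n, p = 20, 0.05       # X = sum of 20 independent Bernoulli(0.05) variables
mu, eps = n * p, p    # total mean mu = 1.0, individual means at most eps
for k in range(1, n + 1):
    # binomial tail dominated by the tail of a Poisson of slightly larger mean
    assert binom_upper_tail(n, p, k) <= poisson_upper_tail(mu / (1 - eps), k)
```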

In this note I wanted to record the observation that one can improve the Bennett bound by a small polynomial factor once one leaves the Gaussian regime $u = O(1/\sqrt{\lambda})$, in particular gaining a factor of $(\lambda u)^{-1/2}$ when $u \gg 1$. This observation is not difficult and is implicitly in the literature (one can extract it for instance from the much more general results of this paper of Talagrand, and the basic idea already appears in this paper of Glynn), but I was not able to find a clean version of this statement in the literature, so I am placing it here on my blog. (But if a reader knows of a reference that basically contains the bound below, I would be happy to know of it.)

**Proposition 3 (Improved Bennett’s inequality).** One has
$$\mathbf{P}(\mathbf{Poisson}(\lambda) \geq \lambda(1+u)) \ll \frac{e^{-\lambda h(u)}}{1 + (\lambda \min(u, u^2))^{1/2}}$$
for $u > 0$ and
$$\mathbf{P}(\mathbf{Poisson}(\lambda) \leq \lambda(1+u)) \ll \frac{e^{-\lambda h(u)}}{1 + (\lambda(1+u))^{1/2} |u|}$$
for $-1 < u < 0$.
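
As a numerical illustration of the upper tail bound (a sketch: the constant 3 standing in for the implied constant, and the parameter grid, are my own empirical choices, not part of the proposition):

```python
import math

def h(u: float) -> float:
    """Bennett rate function h(u) = (1+u)log(1+u) - u."""
    return (1 + u) * math.log(1 + u) - u

def poisson_upper_tail(lam: float, kmin: int) -> float:
    """P(Poisson(lam) >= kmin) by direct summation in log-space."""
    total, k = 0.0, kmin
    while True:
        t = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        total += t
        if t < 1e-17 * total:
            return total
        k += 1

def improved_bennett(lam: float, u: float) -> float:
    """Upper-tail bound of Proposition 3, without the implied constant."""
    return math.exp(-lam * h(u)) / (1 + math.sqrt(lam * min(u, u * u)))

for lam in (2.0, 10.0, 100.0):
    for u in (0.5, 1.0, 4.0):
        tail = poisson_upper_tail(lam, math.ceil(lam * (1 + u)))
        assert tail <= 3 * improved_bennett(lam, u)               # Proposition 3
        assert improved_bennett(lam, u) <= math.exp(-lam * h(u))  # refines Bennett
```

On this grid the ratio of the exact tail to the improved bound in fact stays below 1, so a constant of 3 is comfortable.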

*Proof:* We begin with the first inequality. We may assume that $\lambda \min(u, u^2) \geq 1$, since otherwise the claim follows from the usual Bennett inequality. We expand out the left-hand side as
$$\sum_{k \geq \lambda(1+u)} e^{-\lambda} \frac{\lambda^k}{k!}.$$
Writing $k_0 := \lceil \lambda(1+u) \rceil$, one has $\frac{\lambda^k}{k!} \leq \frac{\lambda^{k_0}}{k_0!} (1+u)^{-(k-k_0)}$ for $k \geq k_0$, so the sum is dominated by a geometric series and is at most
$$e^{-\lambda} \frac{\lambda^{k_0}}{k_0!} \frac{1+u}{u}.$$
By the Stirling approximation $k_0! \gg k_0^{1/2} (k_0/e)^{k_0}$, this is
$$\ll \frac{e^{-\lambda h(u)}}{(\lambda(1+u))^{1/2}} \frac{1+u}{u} \ll \frac{e^{-\lambda h(u)}}{(\lambda \min(u, u^2))^{1/2}},$$
giving the claim.

Now we turn to the second inequality. As before we may assume that $\lambda u^2 \geq 1$. We first dispose of a degenerate case in which $\lambda(1+u) < 1$. Here the left-hand side is just
$$\mathbf{P}(\mathbf{Poisson}(\lambda) = 0) = e^{-\lambda}$$

and the right-hand side is comparable to
$$\frac{e^{\lambda u} (1+u)^{-\lambda(1+u)}}{1 + (\lambda(1+u))^{1/2} |u|}.$$
Since $\log(1+u)$ is negative and $\lambda(1+u) < 1$, we see that the right-hand side is $\gg e^{\lambda u} \geq e^{-\lambda}$, and the estimate holds in this case.

It remains to consider the regime where $\lambda(1+u) \geq 1$ and $\lambda u^2 \geq 1$. The left-hand side expands as
$$\sum_{0 \leq k \leq \lambda(1+u)} e^{-\lambda} \frac{\lambda^k}{k!}.$$

The sum is dominated by the first term (that of maximal $k$) times a geometric series $\sum_{j=0}^\infty (1+u)^j = \frac{1}{|u|}$. The maximal $k$ is comparable to $\lambda(1+u)$, so we can bound the left-hand side by
$$\ll \frac{1}{|u|}\, e^{-\lambda} \frac{\lambda^{k_0}}{k_0!},$$
where $k_0 := \lfloor \lambda(1+u) \rfloor$. Using the Stirling approximation as before we can bound this by
$$\ll \frac{1}{|u|} \frac{e^{-\lambda h(u)}}{(\lambda(1+u))^{1/2}},$$
which simplifies to the desired bound $\ll \frac{e^{-\lambda h(u)}}{1 + (\lambda(1+u))^{1/2}|u|}$ after a routine calculation. $\Box$

The same analysis can be reversed to show that the bounds given above are basically sharp up to constants, at least when $\lambda \min(|u|, u^2)$ (and hence the gain over Bennett) is large.
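
This sharpness can be checked numerically (an illustrative sketch for the upper tail; the window $(0.05, 3)$ for the ratio is an empirical choice, not a proven constant):

```python
import math

def h(u: float) -> float:
    """Bennett rate function h(u) = (1+u)log(1+u) - u."""
    return (1 + u) * math.log(1 + u) - u

def poisson_upper_tail(lam: float, kmin: int) -> float:
    """P(Poisson(lam) >= kmin) by direct summation in log-space."""
    total, k = 0.0, kmin
    while True:
        t = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        total += t
        if t < 1e-17 * total:
            return total
        k += 1

def ratio(lam: float, u: float) -> float:
    """Exact upper tail divided by the Proposition 3 upper-tail bound."""
    bound = math.exp(-lam * h(u)) / (1 + math.sqrt(lam * min(u, u * u)))
    return poisson_upper_tail(lam, math.ceil(lam * (1 + u))) / bound

# the ratio stays bounded away from 0 and infinity: sharp up to constants
for lam in (50.0, 200.0):
    for u in (1.0, 3.0):
        assert 0.05 < ratio(lam, u) < 3.0
```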

## 12 comments


13 December, 2022 at 11:54 am

Anonymous: Is there a known characterization of the class of distributions for which Bennett’s inequality is sharp?

14 December, 2022 at 10:37 am

Terence Tao: I’m pretty sure Bennett’s inequality is almost never going to be completely sharp – the step in the proof where Markov’s inequality is applied is always going to lose something (since the Poisson random variable – or whatever other variable one is applying the inequality to – is not going to be a step function random variable). The main advantage of the Bennett inequality is that it has a clean statement and is reasonably close to optimal, but the point is that there is still a little bit of room for improvement. (There are several other improved versions of Bennett’s inequality in the literature, though so far I have not been able to use those to obtain a proof of Proposition 3 that was shorter than the direct calculation provided here.)

15 December, 2022 at 12:56 pm

Anonymous: Is the exponent of the gain factor the best possible?

16 December, 2022 at 9:08 am

Terence Tao: Yes (staying in the regime $u \geq 1$ for simplicity). For instance, suppose that $k = \lambda(1+u)$ is an integer. Then
$$\mathbf{P}(\mathbf{Poisson}(\lambda) \geq \lambda(1+u)) \geq e^{-\lambda} \frac{\lambda^k}{k!} \gg \frac{e^{-\lambda h(u)}}{(\lambda(1+u))^{1/2}} \gg \frac{e^{-\lambda h(u)}}{(\lambda u)^{1/2}}$$
by the Stirling approximation, so the gain of $(\lambda u)^{-1/2}$ over the Bennett bound cannot be improved.

13 December, 2022 at 7:39 pm

Xiaohui Chen: Dear Prof. Tao,

I am not sure if there is an exact reference containing your sharpened bound beyond the standard Bennett’s inequality. However, your bound reminds me of the connection to the standard Gaussian tail probability bound $\mathbf{P}(N(0,1) \geq t) \leq \frac{1}{t\sqrt{2\pi}} e^{-t^2/2}$ for $t > 0$, which leverages an integration by parts argument that can be generalized to the Poisson distribution.

Let $X \sim \mathbf{Poisson}(\lambda)$ be a random variable; my key observation is to note that we can write
$$\mathbf{P}(X = k) = \frac{\lambda}{k} \mathbf{P}(X = k-1)$$
for any $k \geq 1$.

Using summation by parts (where the Poisson tail goes to zero super-exponentially fast), we can write the tail probability as
$$\mathbf{P}(X \geq k) = \frac{\lambda}{k} \mathbf{P}(X \geq k-1) - \lambda \sum_{j \geq k} \frac{\mathbf{P}(X \geq j)}{j(j+1)}.$$

Note that the second term on the RHS of the last expression is negative, so we can drop it to yield the following bound:
$$\mathbf{P}(X \geq k) \leq \frac{\lambda}{k} \mathbf{P}(X \geq k-1).$$

Using the above relation again, we have
$$\mathbf{P}(X \geq k) \leq \frac{\lambda}{k} \left( \mathbf{P}(X = k-1) + \mathbf{P}(X \geq k) \right).$$

Then, rearranging (for $k > \lambda$),
$$\mathbf{P}(X \geq k) \leq \frac{\lambda}{k - \lambda} \mathbf{P}(X = k-1) = \frac{k}{k - \lambda} \mathbf{P}(X = k).$$

Now, running the same Stirling's approximation as in your argument to simplify the RHS of the last inequality, we obtain your upper bound in Proposition 3.

A similar lower bound can be deduced with this argument.

I think we can run the summation by parts argument to get even higher-order refinements of the standard Bennett’s inequality, with improvement by a factor of $O(\mathrm{poly}(\lambda))$ in the non-Gaussian regime.

Thus, I believe your bounds can be interpreted as a discrete version of the Mills ratio in the standard Gaussian distribution case.
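
The final step of this chain is easy to test numerically; here is a quick check (my paraphrase: the bound in the form $\mathbf{P}(X \geq k) \leq \frac{k}{k-\lambda}\,\mathbf{P}(X = k)$ for $k > \lambda$, with an illustrative parameter grid):

```python
import math

def poisson_pmf(lam: float, k: int) -> float:
    """P(Poisson(lam) = k), computed in log-space for numerical stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def poisson_upper_tail(lam: float, kmin: int) -> float:
    """P(Poisson(lam) >= kmin) by direct summation."""
    total, k = 0.0, kmin
    while True:
        t = poisson_pmf(lam, k)
        total += t
        if t < 1e-17 * total:
            return total
        k += 1

# discrete Mills-ratio bound: P(X >= k) <= k/(k - lam) * P(X = k) for k > lam
for lam in (1.0, 10.0, 100.0):
    for k in (int(lam) + 1, 2 * int(lam) + 1, 5 * int(lam) + 1):
        assert poisson_upper_tail(lam, k) <= k / (k - lam) * poisson_pmf(lam, k)
```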

14 December, 2022 at 7:05 am

Upper bounds on left and right tails of Poisson probability (pingback): […] Terence Tao published a blog post on bounds for the Poisson probability distribution. Specifically, he wrote about Bennett’s […]

14 December, 2022 at 7:57 am

Anonymous: Tao’s ego is too big. Stop doing self-promotion and start doing relevant mathematics.

22 December, 2022 at 3:52 am

Bert Zwart: Thanks for this. I would like to point towards other bounds for the Poisson distribution, capitalizing upon the connection with the Lambert W function. We also noticed a connection with a conjecture of Ramanujan (settled by Flajolet in 1995). https://www.cambridge.org/core/journals/advances-in-applied-probability/article/gaussian-expansions-and-bounds-for-the-poisson-distribution-applied-to-the-erlang-b-formula/76DB4F08E5A5DE90D85A90E9D0788DA7

22 December, 2022 at 4:09 pm

Terence Tao: Thanks for the reference! Somehow we did not locate it during our own literature search. In principle these sharp asymptotics should recover the cruder upper bounds in Proposition 3, although actually doing the calculations for this seems to be about as complicated as the proof of Proposition 3 already provided in the blog post…

12 January, 2023 at 11:31 pm

Alexander: It seems that something similar can also be done for the binomial distribution: https://ieeexplore.ieee.org/abstract/document/9546799

5 February, 2023 at 8:47 am

Arman: Prof, you’re amazing, just wanted to let you know)

19 February, 2023 at 2:36 pm

Theo: Hi Terry, thanks for the post; it’s very useful. I believe the RHS above “Thus the sum is dominated…” should be $\frac{1}{1+u}\frac{\lambda^k}{k!}$ instead of $\frac{1}{1+u}\frac{\lambda^{k+1}}{(k+1)!}$. Thanks again!

[Corrected, thanks – T.]