If $\lambda > 0$, a Poisson random variable $\mathbf{Poisson}(\lambda)$ with mean $\lambda$ is a random variable taking values in the natural numbers with probability distribution
$$\mathbf{P}(\mathbf{Poisson}(\lambda) = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \qquad k = 0, 1, 2, \dots$$

One is often interested in bounding upper tail probabilities $\mathbf{P}(\mathbf{Poisson}(\lambda) \geq \lambda(1+u))$ for $u > 0$, or lower tail probabilities $\mathbf{P}(\mathbf{Poisson}(\lambda) \leq \lambda(1+u))$ for $-1 < u < 0$. A standard tool for this is Bennett’s inequality:

**Proposition 1 (Bennett’s inequality).** One has
$$\mathbf{P}(\mathbf{Poisson}(\lambda) \geq \lambda(1+u)) \leq e^{-\lambda h(u)}$$
for $u > 0$ and
$$\mathbf{P}(\mathbf{Poisson}(\lambda) \leq \lambda(1+u)) \leq e^{-\lambda h(u)}$$
for $-1 < u < 0$, where
$$h(u) := (1+u)\log(1+u) - u.$$

From the Taylor expansion $h(u) = \frac{u^2}{2} + O(|u|^3)$ for $|u| \leq 1$ we conclude Gaussian type tail bounds in the regime $u = o(1)$ (and in particular when $u = O(1/\sqrt{\lambda})$), in the spirit of the Chernoff, Bernstein, and Hoeffding inequalities. But in the regime where $u$ is large and positive one obtains a slight gain over these other classical bounds (of $\exp(-\lambda u \log u)$ type, rather than $\exp(-c\lambda u)$), since $h(u)$ grows like $u \log u$ rather than linearly in $u$.
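
As a quick numerical sanity check of Proposition 1 (an illustrative sketch; the helper routines and the parameter grid below are my own choices, not from the text), one can compare the exact Poisson upper tail against the Bennett bound $e^{-\lambda h(u)}$:

```python
import math

def poisson_pmf(lam: float, k: int) -> float:
    """P(Poisson(lam) = k), computed in log-space for numerical stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def poisson_upper_tail(lam: float, kmin: int) -> float:
    """P(Poisson(lam) >= kmin) by direct summation; terms decay fast once k > lam."""
    total, k = 0.0, kmin
    while True:
        t = poisson_pmf(lam, k)
        total += t
        if t < 1e-17 * total:
            return total
        k += 1

def bennett_bound(lam: float, u: float) -> float:
    """Bennett: P(Poisson(lam) >= lam*(1+u)) <= exp(-lam * h(u))."""
    h = (1 + u) * math.log(1 + u) - u
    return math.exp(-lam * h)

# the exact tail never exceeds the Bennett bound on this grid
for lam in (1.0, 5.0, 50.0):
    for u in (0.1, 1.0, 4.0):
        tail = poisson_upper_tail(lam, math.ceil(lam * (1 + u)))
        assert tail <= bennett_bound(lam, u)
```

Here `poisson_upper_tail` sums the series directly, which is adequate for these parameter ranges; for serious use one would reach for a library routine such as `scipy.stats.poisson.sf`.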

*Proof:* We use the exponential moment method. For any $t > 0$, we have from Markov’s inequality that
$$\mathbf{P}(\mathbf{Poisson}(\lambda) \geq \lambda(1+u)) \leq e^{-t\lambda(1+u)}\, \mathbf{E} e^{t \mathbf{Poisson}(\lambda)} = e^{-t\lambda(1+u)} e^{\lambda(e^t - 1)},$$
since the moment generating function of $\mathbf{Poisson}(\lambda)$ is $\mathbf{E} e^{t \mathbf{Poisson}(\lambda)} = e^{\lambda(e^t-1)}$. Setting $t = \log(1+u)$ (the optimal choice) gives the upper tail bound; the lower tail bound is proven similarly, now taking $t = \log(1+u) < 0$ for $-1 < u < 0$. $\Box$

**Remark 2.** Bennett’s inequality also applies for (suitably normalized) sums of bounded independent random variables. In some cases there are direct comparison inequalities available to relate those variables to the Poisson case. For instance, suppose $X = X_1 + \dots + X_n$ is the sum of independent Boolean variables $X_1,\dots,X_n \in \{0,1\}$ of total mean $\sum_{i=1}^n \mathbf{E} X_i = \mu$ and with $\mathbf{P}(X_i = 1) \leq \varepsilon$ for all $i$ and some $0 < \varepsilon < 1$. Then for any natural number $k$, we have
$$\mathbf{P}(X \geq k) \leq \mathbf{P}\left(\mathbf{Poisson}\left(\frac{\mu}{1-\varepsilon}\right) \geq k\right).$$
As such, for $\varepsilon$ small, one can efficiently control the tail probabilities of $X$ in terms of the tail probability of a Poisson random variable of mean close to $\mu$; this is of course very closely related to the well known fact that the Poisson distribution emerges as the limit of sums of many independent Boolean variables, each of which is non-zero with small probability. See this paper of Bentkus and this paper of Pinelis for some further useful (and less obvious) comparison inequalities of this type.
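
This comparison can be checked numerically in the binomial special case (a sketch under my reading of the inequality, namely $\mathbf{P}(X \geq k) \leq \mathbf{P}(\mathbf{Poisson}(\mu/(1-\varepsilon)) \geq k)$, which follows from the stochastic domination of $\mathrm{Bernoulli}(p)$ by $\mathrm{Poisson}(\log\frac{1}{1-p})$; the parameters are illustrative):

```python
import math

def binom_upper_tail(n: int, p: float, k: int) -> float:
    """P(Bin(n, p) >= k), exact summation."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def poisson_upper_tail(lam: float, kmin: int) -> float:
    """P(Poisson(lam) >= kmin) by direct summation in log-space."""
    total, k = 0.0, kmin
    while True:
        t = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        total += t
        if t < 1e-17 * total:
            return total
        k += 1

n, p = 20, 0.05       # X = sum of 20 independent Bernoulli(0.05) variables
mu, eps = n * p, p    # total mean mu = 1.0, individual means at most eps
for k in range(1, n + 1):
    # binomial tail dominated by the tail of a Poisson of slightly larger mean
    assert binom_upper_tail(n, p, k) <= poisson_upper_tail(mu / (1 - eps), k)
```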

In this note I wanted to record the observation that one can improve the Bennett bound by a small polynomial factor once one leaves the Gaussian regime $u = O(1/\sqrt{\lambda})$, in particular gaining a factor of $(\lambda u)^{-1/2}$ when $u \gg 1$. This observation is not difficult and is implicitly in the literature (one can extract it for instance from the much more general results of this paper of Talagrand, and the basic idea already appears in this paper of Glynn), but I was not able to find a clean version of this statement in the literature, so I am placing it here on my blog. (But if a reader knows of a reference that basically contains the bound below, I would be happy to know of it.)

**Proposition 3 (Improved Bennett’s inequality).** One has
$$\mathbf{P}(\mathbf{Poisson}(\lambda) \geq \lambda(1+u)) \ll \frac{e^{-\lambda h(u)}}{1 + (\lambda \min(u, u^2))^{1/2}}$$
for $u > 0$ and
$$\mathbf{P}(\mathbf{Poisson}(\lambda) \leq \lambda(1+u)) \ll \frac{e^{-\lambda h(u)}}{1 + (\lambda(1+u))^{1/2} |u|}$$
for $-1 < u < 0$.
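
As a numerical illustration of the upper tail bound (a sketch: the constant 3 standing in for the implied constant, and the parameter grid, are my own empirical choices, not part of the proposition):

```python
import math

def h(u: float) -> float:
    """Bennett rate function h(u) = (1+u)log(1+u) - u."""
    return (1 + u) * math.log(1 + u) - u

def poisson_upper_tail(lam: float, kmin: int) -> float:
    """P(Poisson(lam) >= kmin) by direct summation in log-space."""
    total, k = 0.0, kmin
    while True:
        t = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        total += t
        if t < 1e-17 * total:
            return total
        k += 1

def improved_bennett(lam: float, u: float) -> float:
    """Upper-tail bound of Proposition 3, without the implied constant."""
    return math.exp(-lam * h(u)) / (1 + math.sqrt(lam * min(u, u * u)))

for lam in (2.0, 10.0, 100.0):
    for u in (0.5, 1.0, 4.0):
        tail = poisson_upper_tail(lam, math.ceil(lam * (1 + u)))
        assert tail <= 3 * improved_bennett(lam, u)               # Proposition 3
        assert improved_bennett(lam, u) <= math.exp(-lam * h(u))  # refines Bennett
```

On this grid the ratio of the exact tail to the improved bound in fact stays below 1, so a constant of 3 is comfortable.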

*Proof:* We begin with the first inequality. We may assume that $\lambda \min(u, u^2) \geq 1$, since otherwise the claim follows from the usual Bennett inequality. We expand out the left-hand side as
$$\sum_{k \geq \lambda(1+u)} e^{-\lambda} \frac{\lambda^k}{k!}.$$
Writing $k_0 := \lceil \lambda(1+u) \rceil$, one has $\frac{\lambda^k}{k!} \leq \frac{\lambda^{k_0}}{k_0!} (1+u)^{-(k-k_0)}$ for $k \geq k_0$, so the sum is dominated by a geometric series and is at most
$$e^{-\lambda} \frac{\lambda^{k_0}}{k_0!} \frac{1+u}{u}.$$
By the Stirling approximation $k_0! \gg k_0^{1/2} (k_0/e)^{k_0}$, this is
$$\ll \frac{e^{-\lambda h(u)}}{(\lambda(1+u))^{1/2}} \frac{1+u}{u} \ll \frac{e^{-\lambda h(u)}}{(\lambda \min(u, u^2))^{1/2}},$$
giving the claim.

Now we turn to the second inequality. As before we may assume that $\lambda u^2 \geq 1$. We first dispose of a degenerate case in which $\lambda(1+u) < 1$. Here the left-hand side is just
$$\mathbf{P}(\mathbf{Poisson}(\lambda) = 0) = e^{-\lambda}$$

and the right-hand side is comparable to
$$\frac{e^{\lambda u} (1+u)^{-\lambda(1+u)}}{1 + (\lambda(1+u))^{1/2} |u|}.$$
Since $\log(1+u)$ is negative and $\lambda(1+u) < 1$, we see that the right-hand side is $\gg e^{\lambda u} \geq e^{-\lambda}$, and the estimate holds in this case.

It remains to consider the regime where $\lambda(1+u) \geq 1$ and $\lambda u^2 \geq 1$. The left-hand side expands as
$$\sum_{0 \leq k \leq \lambda(1+u)} e^{-\lambda} \frac{\lambda^k}{k!}.$$

The sum is dominated by the first term (that of maximal $k$) times a geometric series $\sum_{j=0}^\infty (1+u)^j = \frac{1}{|u|}$. The maximal $k$ is comparable to $\lambda(1+u)$, so we can bound the left-hand side by
$$\ll \frac{1}{|u|}\, e^{-\lambda} \frac{\lambda^{k_0}}{k_0!},$$
where $k_0 := \lfloor \lambda(1+u) \rfloor$. Using the Stirling approximation as before we can bound this by
$$\ll \frac{1}{|u|} \frac{e^{-\lambda h(u)}}{(\lambda(1+u))^{1/2}},$$
which simplifies to the desired bound $\ll \frac{e^{-\lambda h(u)}}{1 + (\lambda(1+u))^{1/2}|u|}$ after a routine calculation. $\Box$

The same analysis can be reversed to show that the bounds given above are basically sharp up to constants, at least when $\lambda \min(|u|, u^2)$ (and hence the gain over Bennett) is large.
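
This sharpness can be checked numerically (an illustrative sketch for the upper tail; the window $(0.05, 3)$ for the ratio is an empirical choice, not a proven constant):

```python
import math

def h(u: float) -> float:
    """Bennett rate function h(u) = (1+u)log(1+u) - u."""
    return (1 + u) * math.log(1 + u) - u

def poisson_upper_tail(lam: float, kmin: int) -> float:
    """P(Poisson(lam) >= kmin) by direct summation in log-space."""
    total, k = 0.0, kmin
    while True:
        t = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        total += t
        if t < 1e-17 * total:
            return total
        k += 1

def ratio(lam: float, u: float) -> float:
    """Exact upper tail divided by the Proposition 3 upper-tail bound."""
    bound = math.exp(-lam * h(u)) / (1 + math.sqrt(lam * min(u, u * u)))
    return poisson_upper_tail(lam, math.ceil(lam * (1 + u))) / bound

# the ratio stays bounded away from 0 and infinity: sharp up to constants
for lam in (50.0, 200.0):
    for u in (1.0, 3.0):
        assert 0.05 < ratio(lam, u) < 3.0
```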

## 12 comments


13 December, 2022 at 11:54 am

Anonymous: Is there a known characterization of the class of distributions for which Bennett’s inequality is sharp?

14 December, 2022 at 10:37 am

Terence Tao: I’m pretty sure Bennett’s inequality is almost never going to be completely sharp – the step in the proof where Markov’s inequality is applied is always going to lose something (since the Poisson random variable – or whatever other variable one is applying the inequality to – is not going to be a step function random variable). The main advantage of the Bennett inequality is that it has a clean statement and is reasonably close to optimal, but the point is that there is still a little bit of room for improvement. (There are several other improved versions of Bennett’s inequality in the literature, though so far I have not been able to use those to obtain a proof of Proposition 3 that was shorter than the direct calculation provided here.)

15 December, 2022 at 12:56 pm

Anonymous: Is the exponent of the gain factor the best possible?

16 December, 2022 at 9:08 am

Terence Tao: Yes (staying in the regime $u \geq 1$ for simplicity). For instance, suppose that $k = \lambda(1+u)$ is an integer. Then
$$\mathbf{P}(\mathbf{Poisson}(\lambda) \geq \lambda(1+u)) \geq e^{-\lambda} \frac{\lambda^k}{k!} \gg \frac{e^{-\lambda h(u)}}{(\lambda(1+u))^{1/2}} \gg \frac{e^{-\lambda h(u)}}{(\lambda u)^{1/2}}$$
by the Stirling approximation, so the gain of $(\lambda u)^{-1/2}$ over the Bennett bound cannot be improved.

13 December, 2022 at 7:39 pm

Xiaohui Chen: Dear Prof. Tao,

I am not sure if there is an exact reference containing your sharpened bound beyond the standard Bennett’s inequality. However, your bound reminds me of the connection to the standard Gaussian tail probability bound $\mathbf{P}(N(0,1) \geq t) \leq \frac{1}{t\sqrt{2\pi}} e^{-t^2/2}$ for $t > 0$, which leverages an integration by parts argument that can be generalized to the Poisson distribution.

Let $X \sim \mathbf{Poisson}(\lambda)$ be a random variable; my key observation is to note that we can write
$$\mathbf{P}(X = k) = \frac{\lambda}{k} \mathbf{P}(X = k-1)$$
for any $k \geq 1$.

Using summation by parts (where the Poisson tail goes to zero super-exponentially fast), we can write the tail probability as
$$\mathbf{P}(X \geq k) = \frac{\lambda}{k} \mathbf{P}(X \geq k-1) - \lambda \sum_{j \geq k} \frac{\mathbf{P}(X \geq j)}{j(j+1)}.$$

Note that the second term on the RHS of the last expression is negative, so we can drop it to yield the following bound:
$$\mathbf{P}(X \geq k) \leq \frac{\lambda}{k} \mathbf{P}(X \geq k-1).$$

Using the above relation again, we have
$$\mathbf{P}(X \geq k) \leq \frac{\lambda}{k} \left( \mathbf{P}(X = k-1) + \mathbf{P}(X \geq k) \right).$$

Then, rearranging (for $k > \lambda$),
$$\mathbf{P}(X \geq k) \leq \frac{\lambda}{k - \lambda} \mathbf{P}(X = k-1) = \frac{k}{k - \lambda} \mathbf{P}(X = k).$$

Now, running the same Stirling's approximation as in your argument to simplify the RHS of the last inequality, we obtain your upper bound in Proposition 3.

A similar lower bound can be deduced with this argument.

I think we can run the summation by parts argument to get even higher-order refinements of the standard Bennett’s inequality, with improvement by a factor of $O(\mathrm{poly}(\lambda))$ in the non-Gaussian regime.

Thus, I believe your bounds can be interpreted as a discrete version of the Mills ratio in the standard Gaussian distribution case.
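
The final step of this chain is easy to test numerically; here is a quick check (my paraphrase: the bound in the form $\mathbf{P}(X \geq k) \leq \frac{k}{k-\lambda}\,\mathbf{P}(X = k)$ for $k > \lambda$, with an illustrative parameter grid):

```python
import math

def poisson_pmf(lam: float, k: int) -> float:
    """P(Poisson(lam) = k), computed in log-space for numerical stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def poisson_upper_tail(lam: float, kmin: int) -> float:
    """P(Poisson(lam) >= kmin) by direct summation."""
    total, k = 0.0, kmin
    while True:
        t = poisson_pmf(lam, k)
        total += t
        if t < 1e-17 * total:
            return total
        k += 1

# discrete Mills-ratio bound: P(X >= k) <= k/(k - lam) * P(X = k) for k > lam
for lam in (1.0, 10.0, 100.0):
    for k in (int(lam) + 1, 2 * int(lam) + 1, 5 * int(lam) + 1):
        assert poisson_upper_tail(lam, k) <= k / (k - lam) * poisson_pmf(lam, k)
```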

14 December, 2022 at 7:05 am

Upper bounds on left and right tails of Poisson probability (pingback): […] Terence Tao published a blog post on bounds for the Poisson probability distribution. Specifically, he wrote about Bennett’s […]

14 December, 2022 at 7:57 am

Anonymous: Tao’s ego is too big. Stop doing self-promotion and start doing relevant mathematics.

22 December, 2022 at 3:52 am

Bert Zwart: Thanks for this. I would like to point towards other bounds for the Poisson distribution, capitalizing upon the connection with the Lambert W function. We also noticed a connection with a conjecture of Ramanujan (settled by Flajolet in 1995). https://www.cambridge.org/core/journals/advances-in-applied-probability/article/gaussian-expansions-and-bounds-for-the-poisson-distribution-applied-to-the-erlang-b-formula/76DB4F08E5A5DE90D85A90E9D0788DA7

22 December, 2022 at 4:09 pm

Terence Tao: Thanks for the reference! Somehow we did not locate it during our own literature search. In principle these sharp asymptotics should recover the cruder upper bounds in Proposition 3, although actually doing the calculations for this seems to be about as complicated as the proof of Proposition 3 already provided in the blog post…

12 January, 2023 at 11:31 pm

Alexander: It seems that something similar can also be done for the binomial distribution: https://ieeexplore.ieee.org/abstract/document/9546799

5 February, 2023 at 8:47 am

Arman: Prof, you’re amazing, just wanted to let you know)

19 February, 2023 at 2:36 pm

Theo: Hi Terry, thanks for the post; it’s very useful. I believe the RHS above “Thus the sum is dominated…” should be $\frac{1}{1+u}\frac{\lambda^k}{k!}$ instead of $\frac{1}{1+u}\frac{\lambda^{k+1}}{(k+1)!}$. Thanks again!

[Corrected, thanks – T.]