
The U.S. presidential election is now only a few weeks away.  The politics of this election are of course interesting and important, but I do not want to discuss these topics here (there is not exactly a shortage of other venues for such a discussion), and would request that readers refrain from doing so in the comments to this post.  However, I thought it would be apropos to talk about some of the basic mathematics underlying electoral polling, and specifically to explain the fact, which can be highly unintuitive to those not well versed in statistics, that polls can be accurate even when sampling only a tiny fraction of the entire population.

Take for instance a nationwide poll of U.S. voters on which presidential candidate they intend to vote for.  A typical poll will ask a number $n$ of randomly selected voters for their opinion; a typical value here is $n = 1000$.  In contrast, the total voting-eligible population of the U.S. – let’s call this set $X$ – is about 200 million.  (The actual turnout in the election is likely to be closer to 100 million, but let’s ignore this fact for the sake of discussion.)  Thus, such a poll would sample about 0.0005% of the total population $X$ – an incredibly tiny fraction.  Nevertheless, the margin of error (at the 95% confidence level) for such a poll, if conducted under idealised conditions (see below), is about 3%.  In other words, if we let $p$ denote the proportion of the entire population $X$ that will vote for a given candidate $A$, and let $\overline{p}$ denote the proportion of the polled voters that will vote for $A$, then the event $\overline{p}-0.03 \leq p \leq \overline{p}+0.03$ will occur with probability at least 0.95.  Thus, for instance (and oversimplifying a little – see below), if the poll reports that 55% of respondents would vote for A, then the true percentage of the electorate that would vote for A has at least a 95% chance of lying between 52% and 58%.  Larger polls will of course give a smaller margin of error; for instance the margin of error for an (idealised) poll of 2,000 voters is about 2%.
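The figures quoted above can be checked against the usual normal-approximation formula for the margin of error of a proportion, $1.96\sqrt{p(1-p)/n}$, evaluated at the worst case $p = 1/2$. The sketch below assumes simple random sampling from an effectively infinite population. The function name `margin_of_error` is illustrative, and this is not necessarily how the quoted figures were derived.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion.

    Assumes simple random sampling from a very large population. The default
    p = 0.5 is the worst case, because p * (1 - p) is largest there.
    """
    return z * math.sqrt(p * (1 - p) / n)

population = 200_000_000  # voting-eligible population, as in the text
for n in (1000, 2000):
    fraction = n / population
    print(f"n={n}: sampled {fraction:.4%} of X, margin ~ {margin_of_error(n):.1%}")
# n=1000: sampled 0.0005% of X, margin ~ 3.1%
# n=2000: sampled 0.0010% of X, margin ~ 2.2%
```

Note that `population` enters only through the sampling fraction; the margin itself depends on $n$ alone.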

I’ll give a rigorous proof of a weaker version of the above statement (giving a margin of error of about 7%, rather than 3%) in an appendix at the end of this post.  But the main point of my post here is a little different, namely to address the common misconception that the accuracy of a poll is a function of the relative sample size rather than the absolute sample size, which would suggest that a poll involving only 0.0005% of the population could not possibly have a margin of error as low as 3%.  I also want to point out some limitations of the mathematical analysis; depending on the methodology and the context, some polls involving 1000 respondents may have a much higher margin of error than the idealised rate of 3%.
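Both claims in this paragraph can be illustrated numerically. The sketch below first applies Chebyshev's inequality with the variance bound $\mathrm{Var}(\overline{p}) \leq 1/(4n)$. This is one standard way to get a margin of roughly 7% for $n = 1000$, though it is not necessarily the argument used in the appendix. The sketch then simulates repeated polls of 1000 voters drawn without replacement from populations of very different sizes, assuming 55% support for $A$. The helper names `chebyshev_margin` and `poll_errors`, the seed, and the trial count are all illustrative choices.

```python
import math
import random

def chebyshev_margin(n: int, confidence: float = 0.95) -> float:
    """Margin t with P(|p_bar - p| >= t) <= 1 - confidence, via Chebyshev.

    Uses Var(p_bar) <= 1 / (4n), the worst case p = 1/2 (sampling without
    replacement only lowers the variance). Chebyshev then gives
    P(|p_bar - p| >= t) <= 1 / (4 n t^2).
    """
    return math.sqrt(1 / (4 * n * (1 - confidence)))

def poll_errors(population: int, support: float, n: int, trials: int,
                rng: random.Random) -> list[float]:
    """Absolute errors |p_bar - p| over repeated simulated polls.

    Voters 0 .. k-1 support candidate A. Each poll draws n distinct voters
    uniformly at random (sampling without replacement).
    """
    k = round(support * population)
    p = k / population
    errors = []
    for _ in range(trials):
        sample = rng.sample(range(population), n)  # range avoids building a list
        p_bar = sum(1 for v in sample if v < k) / n
        errors.append(abs(p_bar - p))
    return errors

rng = random.Random(2012)
n = 1000
print(f"Chebyshev margin for n={n}: {chebyshev_margin(n):.1%}")  # 7.1%
for population in (10_000, 1_000_000, 200_000_000):
    errors = poll_errors(population, 0.55, n, trials=400, rng=rng)
    # The small tolerance guards against floating-point round-off at exactly 3 points.
    within = sum(e <= 0.03 + 1e-12 for e in errors) / len(errors)
    print(f"N={population:>11,}: fraction of polls within 3 points = {within:.2f}")
```

The fraction of polls landing within 3 points should come out close to 0.95 for every population size. The smallest population may do slightly better, because sampling 10% of it without replacement reduces the variance a little. Increasing $N$ by a factor of 20,000 leaves the accuracy essentially unchanged.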