Estimating

Statistical inference draws conclusions about a population on the basis of data about a sample. One kind of conclusion answers questions like, “What percentage of employed women have a college degree?” or “What is the mean survival time for patients with this type of cancer?” These questions ask about a number (a percentage, a mean) that describes a population. Numbers that describe a population are parameters. To estimate a population parameter, choose a sample from the population and use a statistic, a number calculated from the sample, as your estimate. Here’s an example.

EXAMPLE 1 Soda consumption

In July 2015, Gallup conducted telephone interviews with a random sample of 1009 adults. The adults were at least 18 years of age and resided either in one of the 50 U.S. states or in the District of Columbia (DC). Survey respondents were asked to consider several different foods and beverages and to indicate whether these were things they actively tried to include in their diet, actively tried to avoid in their diet, or didn’t think about at all. Of the 1009 adults surveyed, 616 indicated that they actively tried to avoid drinking regular soda or pop. Based on this information, what can we say about the percentage of all Americans 18 years of age or older who actively try to avoid drinking regular soda or pop?

Our population is adults 18 years of age or older who reside in the 50 U.S. states or the District of Columbia. The parameter is the proportion who actively tried to avoid drinking regular soda or pop in 2015. Call this unknown parameter p, for “proportion.” The statistic that estimates the parameter p is the sample proportion

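\[ \hat{p} = \frac{\text{count in the sample}}{\text{size of the sample}} = \frac{616}{1009} \approx 0.611 \]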

A basic move in statistical inference is to use a sample statistic to estimate a population parameter. Once we have the sample in hand, we estimate that the proportion of all adult Americans who actively tried to avoid drinking regular soda or pop in 2015 is “about 61.1%” because the proportion in the sample was 616/1009, or 61.1% after rounding. We can only estimate that the truth about the population is “about” 61.1% because we know that the sample result is unlikely to be exactly the same as the true population proportion. A confidence interval makes that “about” precise.
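The arithmetic behind the estimate, and the reason for the word “about,” can be sketched in a few lines of Python. The counts 616 and 1009 come from the example. Everything else is an assumption made for illustration: the true population proportion is unknown, so the value 0.60 below is hypothetical, as are the names `ASSUMED_TRUE_P` and `simulate_sample_proportion`.

```python
import random

# Data from Example 1: 616 of the 1009 sampled adults said they
# actively tried to avoid regular soda or pop.
count_avoid = 616
sample_size = 1009

# The statistic: the sample proportion.
p_hat = count_avoid / sample_size
print(f"sample proportion = {p_hat:.3f}")

# Hypothetical population proportion, chosen only for illustration.
# The real parameter p is unknown.
ASSUMED_TRUE_P = 0.60


def simulate_sample_proportion(true_p, n, rng):
    """Draw one random sample of size n and return its sample proportion."""
    hits = sum(rng.random() < true_p for _ in range(n))
    return hits / n


# Repeat the "survey" ten times against the same assumed population.
rng = random.Random(2015)
simulated = [
    simulate_sample_proportion(ASSUMED_TRUE_P, sample_size, rng)
    for _ in range(10)
]
print([round(p, 3) for p in simulated])
```

Each simulated survey gives a slightly different sample proportion even though the assumed population never changes. That sample-to-sample variation is why a single sample result can only say that the truth is “about” 61.1%.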

95% confidence interval

A 95% confidence interval is an interval calculated from sample data by a process that is guaranteed to capture the true population parameter in 95% of all samples.
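The phrase “in 95% of all samples” can be checked by simulation. The sketch below is not taken from this section. It assumes the usual large-sample interval for a proportion, the sample proportion plus or minus 1.96 standard errors, and it assumes a hypothetical true proportion of 0.60. The function names `wald_interval` and `coverage` are likewise made up for illustration.

```python
import math
import random


def wald_interval(count, n, z=1.96):
    """Large-sample 95% interval for a proportion (assumed formula)."""
    p_hat = count / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage(true_p, n, trials, rng):
    """Fraction of simulated samples whose interval captures true_p."""
    captured = 0
    for _ in range(trials):
        # One simulated sample: count how many of n people say "avoid".
        count = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_interval(count, n)
        if low <= true_p <= high:
            captured += 1
    return captured / trials


rng = random.Random(0)
# true_p = 0.60 is a hypothetical value; n matches the Gallup sample size.
rate = coverage(true_p=0.60, n=1009, trials=2000, rng=rng)
print(f"fraction of intervals capturing the true proportion: {rate:.3f}")
```

Under these assumptions the printed fraction should land near 0.95. The 95% describes how often the process succeeds across many samples, not a property of any single interval.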

We will first march straight through to the interval for a population proportion, and then we will reflect on what we have done and generalize a bit.