Density curves

Figures 13.1 and 13.2 show curves used in place of histograms to picture the overall shape of a distribution of data. You can think of drawing a curve through the tops of the bars in a histogram, smoothing out the irregular ups and downs of the bars. There are two important distinctions between histograms and these curves. First, most histograms show the counts of observations in each class by the heights of their bars and, therefore, by the areas of the bars. We set up curves to show the proportion of observations in any region by the area under the curve over that region. To do that, we choose the scale so that the total area under the curve is exactly 1. We then have a density curve. Second, a histogram is a plot of data obtained from a sample. We use this histogram to understand the actual distribution of the population from which the sample was selected. The density curve is intended to reflect the idealized shape of the population distribution.
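The choice of scale can be illustrated with a short Python sketch. It rescales the bars of a histogram so that their total area is 1, which is the same scale a density curve uses. The class counts, the class width, and the name `density_heights` are hypothetical, chosen only for illustration; they are not data from this chapter.

```python
def density_heights(counts, class_width):
    """Rescale histogram counts so that the bars have total area 1."""
    n = sum(counts)
    # Each bar's area (height * width) becomes that class's proportion count / n.
    return [c / (n * class_width) for c in counts]


# Hypothetical class counts and class width (not taken from the text).
counts = [2, 5, 9, 6, 3]
class_width = 0.5

heights = density_heights(counts, class_width)

# Total area of the rescaled bars: 1, up to floating-point rounding.
total_area = sum(h * class_width for h in heights)

# Area over the last two classes = proportion of observations in them (9 of 25).
area_last_two = sum(h * class_width for h in heights[-2:])

print(round(total_area, 6))
print(round(area_last_two, 6))
```

On this scale, the area over any group of classes is the proportion of observations falling in those classes, which is how areas under a density curve are read.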

EXAMPLE 1 Using a density curve

Figure 13.4 copies Figure 13.3, showing the histogram and the Normal density curve that describe this data set of 130 body temperatures. What proportion of the temperatures are greater than or equal to 99 degrees Fahrenheit? From the actual 130 observations, we can count that exactly 19 are greater than or equal to 99°F. So the proportion is 19/130, or 0.146. Because 99 is one of the break points between the classes in the histogram, the area of the shaded bars in Figure 13.4(a) makes up 0.146 of the total area of all the bars.
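The arithmetic in this step can be checked with a few lines of Python. The counts 19 and 130 come from the text. The function `shaded_area_fraction` and the class width of 0.5 are assumptions made for illustration, since the text does not state the class width; the sketch also assumes all classes have equal width.

```python
n_total = 130        # observations in the data set (from the text)
n_at_least_99 = 19   # observations greater than or equal to 99°F (from the text)

proportion = n_at_least_99 / n_total
print(round(proportion, 3))  # 0.146


def shaded_area_fraction(shaded_count, total_count, class_width):
    """Fraction of total bar area that is shaded, for equal-width classes."""
    # Bar area is width * count, so the width cancels in the ratio.
    return (class_width * shaded_count) / (class_width * total_count)


# The class width here is an assumed value; any positive width gives the same ratio.
print(round(shaded_area_fraction(n_at_least_99, n_total, 0.5), 3))  # 0.146
```

Because the class width cancels, the shaded fraction of the bar area equals the fraction of observations, provided the cutoff falls on a class boundary.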

Now concentrate on the density curve drawn through the histogram. The total area under this curve is 1, and the shaded area in Figure 13.4(b) represents the proportion of observations that are greater than or equal to 99°F. This area is 0.1587. You can see that the density curve is quite a good approximation: 0.1587 is close to the observed proportion, 0.146.
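The area 0.1587 can be reproduced with Python's standard library. This passage does not give the mean and standard deviation of the Normal curve, so the sketch rests on an assumption: 0.1587 is the area under any Normal curve more than one standard deviation above its mean, so 99°F is treated here as lying one standard deviation above the mean.

```python
from statistics import NormalDist

# Assumption: the cutoff (99°F) lies one standard deviation above the mean.
z = 1.0

# Area under the standard Normal curve to the right of z.
area_above = 1 - NormalDist().cdf(z)
print(round(area_above, 4))  # 0.1587

# Compare with the proportion counted from the data (19 of 130, from the text).
observed = 19 / 130
gap = abs(area_above - observed)
print(round(gap, 4))  # 0.0125
```

The gap of about 0.0125 between the two values is what the text means by a good approximation.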

Figure 13.4 A histogram and a Normal density curve, Example 1. (a) The area of the shaded bars in the histogram represents observations greater than or equal to 99°F. These make up 19 of the 130 observations. (b) The shaded area under the Normal curve represents the proportion of observations greater than or equal to 99°F. This area is 0.1587.

The area under the density curve in Example 1 is not exactly equal to the true proportion because the curve is an idealized picture of the distribution. For example, the curve is exactly symmetric, but the actual data are only approximately symmetric. Because density curves are smoothed-out idealized pictures of the overall shapes of distributions, they are most useful for describing large numbers of observations.
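A small simulation gives a rough sense of why larger numbers of observations are described better by a density curve. It is only a sketch under stated assumptions: the samples are simulated draws from a standard Normal distribution, not the body temperature data, and the cutoff of one standard deviation above the mean is chosen to match the area 0.1587. The function name `proportion_at_least`, the seed, and the sample sizes are hypothetical.

```python
import random
from statistics import NormalDist


def proportion_at_least(sample, cutoff):
    """Proportion of observations greater than or equal to the cutoff."""
    return sum(x >= cutoff for x in sample) / len(sample)


# Assumed cutoff: one standard deviation above the mean of a standard Normal curve.
cutoff = 1.0
curve_area = 1 - NormalDist().cdf(cutoff)  # about 0.1587

rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}
for n in (130, 10_000, 200_000):
    # Simulated observations, not data from the text.
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    results[n] = proportion_at_least(sample, cutoff)

for n, p in results.items():
    print(n, round(p, 4), round(abs(p - curve_area), 4))
```

With only 130 observations the sample proportion can differ from the curve area by a few hundredths, while the largest sample typically lands much closer. The simulation shows only chance variation. It does not capture the other point made above, that real data may depart from the exact symmetry of the curve.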