EXAMPLE 12 Constructing a frequency distribution for continuous data

The U.S. Department of Energy reported total 2011 carbon emissions for a sample of 20 states, in millions of metric tons of carbon dioxide (Table 20).

TABLE 20 Total carbon emissions (millions of metric tons of carbon dioxide) for a sample of states, 2011
State            Carbon emissions    State             Carbon emissions
Arizona                     93.28    New Jersey                  117.56
Arkansas                    67.56    New Mexico                   56.60
Colorado                    91.98    Oklahoma                    107.92
Iowa                        87.42    South Carolina               80.21
Kansas                      72.36    Tennessee                   105.73
Maryland                    65.80    Virginia                     99.86
Massachusetts               68.89    Washington                   70.81
Minnesota                   92.69    West Virginia                95.97
Mississippi                 61.21    Wisconsin                    98.05
Nebraska                    52.26    Wyoming                      63.89

Construct a frequency distribution of the carbon emissions data.

Solution

  • Step 1 Choose the number of classes.

    It is generally recommended that between 5 and 20 classes be used, with the number of classes increasing with the sample size; a small data set such as this will do just fine with 7 classes. In general, choose the number of classes to be large enough to show the variability in the data set, but not so large that many classes are nearly empty.

  • Step 2 Determine the class widths.

    First, find the range of the data, that is, the difference between the largest and smallest data points. Then, divide this range by the number of classes you chose in Step 1. This gives an estimate of the class width. Here, our largest data value is 117.56 and our smallest is 52.26, giving us a range of 117.56 − 52.26 = 65.30. In Step 1, we chose 7 classes, so that our estimated class width is 65.30/7 ≈ 9.33. For convenience, we will round this to a class width of 10. It is recommended that each class have the same width.

  • Step 3 Find the upper and lower class limits.

    Choose limits so that each data point belongs to only one class. For example, suppose we chose one class to be 50–60 and the next class to be 60–70. Then, to which class would an emissions value of exactly 60 belong? The classes should not overlap. Therefore, we define the following classes: 50 to < 60, 60 to < 70, 70 to < 80, 80 to < 90, 90 to < 100, 100 to < 110, and 110 to < 120.

    Note that the lower class limit of the first class, 50, is slightly below the smallest value in the data set, 52.26. Also note that the class width equals 60 − 50 = 10, as desired.

  • Step 4 Calculate the class boundaries.

    The class boundary for the first two classes is 60. Similarly, the other class boundaries are 70, 80, 90, 100, and 110. The lower class boundary of the leftmost class is 50. The upper class boundary of the rightmost class is 120.

  • Step 5 Find the frequencies for each class.

    Using these seven classes, we now proceed to construct the frequency and relative frequency distributions (see Table 21) for the carbon emissions data. We count the number of data values that fall into each class, and we divide each frequency by the sample size (20) to obtain the relative frequency.
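The arithmetic in Steps 1 through 4 can be checked with a short script. This is only a sketch in Python, which the text itself does not use; the variable names (`emissions`, `num_classes`, `class_width`, `first_lower_limit`) are illustrative assumptions, and the rounding of the width to 10 and the starting limit of 50 are entered by hand, as in the text.

```python
# Carbon emissions from Table 20 (millions of metric tons of carbon dioxide).
emissions = [
    93.28, 67.56, 91.98, 87.42, 72.36, 65.80, 68.89, 92.69, 61.21, 52.26,
    117.56, 56.60, 107.92, 80.21, 105.73, 99.86, 70.81, 95.97, 98.05, 63.89,
]

# Step 1: number of classes chosen in the text.
num_classes = 7

# Step 2: range of the data and the estimated class width.
data_range = max(emissions) - min(emissions)
estimated_width = data_range / num_classes

# The text rounds the estimate up to a convenient width of 10 (a manual choice).
class_width = 10

# Step 3: lower limit of the first class, chosen slightly below the minimum.
first_lower_limit = 50

# Each class runs from its lower limit (inclusive) to lower + width (exclusive).
classes = [
    (first_lower_limit + i * class_width,
     first_lower_limit + (i + 1) * class_width)
    for i in range(num_classes)
]

# Step 4: the boundaries are the shared endpoints of the classes.
boundaries = [classes[0][0]] + [upper for _, upper in classes]

print(f"range = {data_range:.2f}")
print(f"estimated class width = {estimated_width:.2f}")
print("classes:", classes)
print("boundaries:", boundaries)
```

The last class ends at 120, above the largest value of 117.56, so every data point falls into exactly one class.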

TABLE 21 Distributions for the carbon emissions data
Class           Tally      Frequency    Relative frequency
50 to < 60      ||                 2                  0.10
60 to < 70      |||||              5                  0.25
70 to < 80      ||                 2                  0.10
80 to < 90      ||                 2                  0.10
90 to < 100     ||||| |            6                  0.30
100 to < 110    ||                 2                  0.10
110 to < 120    |                  1                  0.05
Total                             20                  1.00

The notation “50 to < 60” indicates that this class contains values from 50 (inclusive) up to but not including 60.
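The counting in Step 5 can also be reproduced programmatically. The following is a minimal sketch in Python, not the text's own method; the names `lower_limits`, `frequencies`, and `relative_frequencies` are illustrative assumptions. The comparison `lower <= x < lower + class_width` encodes the "50 to < 60" convention directly.

```python
# Carbon emissions from Table 20 (millions of metric tons of carbon dioxide).
emissions = [
    93.28, 67.56, 91.98, 87.42, 72.36, 65.80, 68.89, 92.69, 61.21, 52.26,
    117.56, 56.60, 107.92, 80.21, 105.73, 99.86, 70.81, 95.97, 98.05, 63.89,
]

class_width = 10
lower_limits = list(range(50, 120, class_width))  # 50, 60, ..., 110

# Frequency: how many values satisfy lower <= x < lower + class_width.
frequencies = [
    sum(1 for x in emissions if lower <= x < lower + class_width)
    for lower in lower_limits
]

# Relative frequency: each frequency divided by the sample size.
n = len(emissions)
relative_frequencies = [f / n for f in frequencies]

for lower, f, rf in zip(lower_limits, frequencies, relative_frequencies):
    label = f"{lower} to < {lower + class_width}"
    print(f"{label:<14}{f:>3}{rf:>8.2f}")
print(f"{'Total':<14}{sum(frequencies):>3}{sum(relative_frequencies):>8.2f}")
```

Because the classes do not overlap and together cover every data value, the frequencies sum to the sample size and the relative frequencies sum to 1.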

NOW YOU CAN DO

Exercises 17–40.