EXAMPLE 12 Constructing a frequency distribution for continuous data
The U.S. Department of Energy reported the total 2011 carbon emissions for a sample of 20 states, in millions of metric tons of carbon dioxide (Table 20).
State | Carbon emissions | State | Carbon emissions |
---|---|---|---|
Arizona | 93.28 | New Jersey | 117.56 |
Arkansas | 67.56 | New Mexico | 56.60 |
Colorado | 91.98 | Oklahoma | 107.92 |
Iowa | 87.42 | South Carolina | 80.21 |
Kansas | 72.36 | Tennessee | 105.73 |
Maryland | 65.80 | Virginia | 99.86 |
Massachusetts | 68.89 | Washington | 70.81 |
Minnesota | 92.69 | West Virginia | 95.97 |
Mississippi | 61.21 | Wisconsin | 98.05 |
Nebraska | 52.26 | Wyoming | 63.89 |
Construct a frequency distribution of the carbon emissions data.
Solution
Step 1 Choose the number of classes.
It is generally recommended that between 5 and 20 classes be used, with the number of classes increasing with the sample size; a small data set such as this will do just fine with 7 classes. In general, choose the number of classes to be large enough to show the variability in the data set, but not so large that many classes are nearly empty.
Step 2 Determine the class widths.
First, find the range of the data, that is, the difference between the largest and smallest data points. Then, divide this range by the number of classes you chose in Step 1. This gives an estimate of the class width. Here, our largest data value is 117.56 and our smallest is 52.26, giving us a range of 117.56 − 52.26 = 65.30. In Step 1, we chose 7 classes, so our estimated class width is 65.30/7 ≈ 9.33. For convenience, we will round this to a class width of 10. It is recommended that each class have the same width.
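The arithmetic in Step 2 can be checked with a short Python sketch. The variable names (`emissions`, `num_classes`, and so on) are illustrative and do not come from the text, and rounding up to the next multiple of 10 is only one reasonable way to pick a convenient width.

```python
import math

# Carbon emissions (millions of metric tons of CO2) for the 20 states in Table 20.
emissions = [
    93.28, 67.56, 91.98, 87.42, 72.36, 65.80, 68.89, 92.69, 61.21, 52.26,
    117.56, 56.60, 107.92, 80.21, 105.73, 99.86, 70.81, 95.97, 98.05, 63.89,
]

num_classes = 7  # chosen in Step 1

data_range = max(emissions) - min(emissions)  # largest minus smallest value
estimated_width = data_range / num_classes    # rough class width

# Assumption: "convenient" is taken to mean the next multiple of 10.
class_width = math.ceil(estimated_width / 10) * 10

print(f"range = {data_range:.2f}")
print(f"estimated class width = {estimated_width:.2f}")
print(f"class width used = {class_width}")
```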
Step 3 Find the upper and lower class limits.
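Choose limits so that each data point belongs to only one class. For example, suppose we chose one class to be 50–60 and the next class to be 60–70. Then, to which class would an emissions value of exactly 60 belong? The classes must not overlap. Therefore, we define the following seven classes, each including its lower limit but excluding its upper limit: 50 to < 60, 60 to < 70, 70 to < 80, 80 to < 90, 90 to < 100, 100 to < 110, and 110 to < 120.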
Note that the lower class limit of the first class, 50, is slightly below the smallest value in the data set, 52.26. Also note that the class width equals 60 − 50 = 10, as desired.
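One way to see why non-overlapping classes settle the "exactly 60" question is to build the classes as half-open intervals (lower limit included, upper limit excluded) and look up a value. This is a minimal sketch; the helper `find_class` is a hypothetical name introduced only for illustration.

```python
# Build 7 classes of width 10 starting at 50, stored as (lower, upper) pairs.
first_lower = 50
class_width = 10
num_classes = 7

classes = [
    (first_lower + i * class_width, first_lower + (i + 1) * class_width)
    for i in range(num_classes)
]

def find_class(value, classes):
    """Return the single class (lower, upper) with lower <= value < upper."""
    for lower, upper in classes:
        if lower <= value < upper:
            return (lower, upper)
    raise ValueError(f"{value} falls outside all classes")

print(classes)
print(find_class(60, classes))     # exactly 60 lands in the class 60 to < 70
print(find_class(52.26, classes))  # the smallest data value lands in 50 to < 60
```

Because each interval excludes its upper limit, no value can match two classes.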
Step 4 Calculate the class boundaries.
The class boundary between the first two classes is 60. Similarly, the other class boundaries are 70, 80, 90, 100, and 110. The lower class boundary of the leftmost class is 50. The upper class boundary of the rightmost class is 120.
Step 5 Find the frequencies for each class.
Using these seven classes, we now proceed to construct the frequency and relative frequency distributions (see Table 21) for the carbon emissions data. We count the number of data values that fall into each class, and we divide each frequency by the sample size (20) to obtain the relative frequency.
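Class | Tally | Frequency | Relative frequency |
---|---|---|---|
50 to < 60 | \|\| | 2 | 2/20 = 0.10 |
60 to < 70 | \|\|\|\|\| | 5 | 5/20 = 0.25 |
70 to < 80 | \|\| | 2 | 2/20 = 0.10 |
80 to < 90 | \|\| | 2 | 2/20 = 0.10 |
90 to < 100 | \|\|\|\|\| \| | 6 | 6/20 = 0.30 |
100 to < 110 | \|\| | 2 | 2/20 = 0.10 |
110 to < 120 | \| | 1 | 1/20 = 0.05 |
Total | | 20 | 1.00 |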
The notation “50 to < 60” indicates that this class contains values from 50 (inclusive) up to but not including 60.
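The counting in Step 5 can be reproduced with a short Python sketch, assuming the seven classes of width 10 starting at 50 and the half-open convention just described. The variable names are illustrative and not from the text.

```python
# Carbon emissions (millions of metric tons of CO2) for the 20 states in Table 20.
emissions = [
    93.28, 67.56, 91.98, 87.42, 72.36, 65.80, 68.89, 92.69, 61.21, 52.26,
    117.56, 56.60, 107.92, 80.21, 105.73, 99.86, 70.81, 95.97, 98.05, 63.89,
]

class_width = 10
lower_limits = list(range(50, 120, class_width))  # 50, 60, ..., 110
sample_size = len(emissions)

# Count the values with lower <= x < lower + class_width for each class.
frequencies = [
    sum(1 for x in emissions if lower <= x < lower + class_width)
    for lower in lower_limits
]

# Relative frequency = class frequency divided by the sample size.
relative_frequencies = [count / sample_size for count in frequencies]

for lower, count, rel in zip(lower_limits, frequencies, relative_frequencies):
    print(f"{lower} to < {lower + class_width}: frequency {count}, relative frequency {rel:.2f}")

print(f"Total: {sum(frequencies)}")
```

The frequencies should sum to the sample size and the relative frequencies to 1, which is a quick check that every value fell into exactly one class.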
NOW YOU CAN DO
Exercises 17–40.