
EXAMPLE 12 Constructing a frequency distribution for continuous data


The U.S. Department of Energy reported the total 2011 carbon emissions for a sample of 20 states, in millions of metric tons of carbon dioxide (Table 20).

TABLE 20 Total carbon emissions (millions of metric tons of carbon dioxide) for a sample of states, 2011

State	Carbon emissions	State	Carbon emissions
Arizona	93.28	New Jersey	117.56
Arkansas	67.56	New Mexico	56.60
Colorado	91.98	Oklahoma	107.92
Iowa	87.42	South Carolina	80.21
Kansas	72.36	Tennessee	105.73
Maryland	65.80	Virginia	99.86
Massachusetts	68.89	Washington	70.81
Minnesota	92.69	West Virginia	95.97
Mississippi	61.21	Wisconsin	98.05
Nebraska	52.26	Wyoming	63.89

Construct a frequency distribution of the carbon emissions data.

Solution

Step 1 Choose the number of classes.

It is generally recommended that between 5 and 20 classes be used, with the number of classes increasing with the sample size; a small data set such as this will do just fine with 7 classes. In general, choose the number of classes to be large enough to show the variability in the data set, but not so large that many classes are nearly empty.

Step 2 Determine the class widths.

First, find the range of the data, that is, the difference between the largest and smallest data values. Then divide this range by the number of classes you chose in Step 1. This gives an estimate of the class width. Here, our largest data value is 117.56 and our smallest is 52.26, giving us a range of 117.56 − 52.26 = 65.30. In Step 1, we chose 7 classes, so our estimated class width is 65.30 / 7 ≈ 9.33. For convenience, we will round this up to a class width of 10. It is recommended that each class have the same width.
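The arithmetic of Step 2 can be checked with a short Python sketch. The variable names are illustrative, and the data are the 20 values from Table 20. Rounding up with math.ceil is an assumption about how "round for convenience" is applied; it reproduces the class width of 10 used here (rounding 9.33 to the nearest integer would give 9 instead).

```python
import math

# Carbon emissions (millions of metric tons of CO2) for the 20 states in Table 20
emissions = [
    93.28, 67.56, 91.98, 87.42, 72.36, 65.80, 68.89, 92.69, 61.21, 52.26,
    117.56, 56.60, 107.92, 80.21, 105.73, 99.86, 70.81, 95.97, 98.05, 63.89,
]

num_classes = 7  # chosen in Step 1

# Range = largest value minus smallest value (117.56 - 52.26)
data_range = max(emissions) - min(emissions)

# Estimated class width = range / number of classes (about 9.33)
estimated_width = data_range / num_classes

# Assumption: round the estimate UP to a whole number so 7 classes cover the range
class_width = math.ceil(estimated_width)

print(f"range = {data_range:.2f}")
print(f"estimated width = {estimated_width:.2f}")
print(f"class width = {class_width}")
```

Running it prints a range of 65.30, an estimated width of 9.33, and a class width of 10, matching the hand calculation.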
Step 3 Find the upper and lower class limits.

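Choose limits so that each data point belongs to exactly one class. For example, suppose we chose one class to be 50–60 and the next class to be 60–70. Then to which class would an emissions value of exactly 60 belong? The classes should not overlap. Therefore, we define the seven classes 50 to < 60, 60 to < 70, 70 to < 80, 80 to < 90, 90 to < 100, 100 to < 110, and 110 to < 120, where each class includes its lower class limit but not its upper class limit.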

Note that the lower class limit of the first class, 50, is slightly below the smallest value in the data set, 52.26. Also note that the class width, the difference between successive lower class limits, equals 60 − 50 = 10, as desired.
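The class limits of Step 3 can be generated and checked programmatically. This is a minimal sketch, assuming the half-open classes "lower to < upper" described above; the names `first_lower`, `classes`, and `matches` are illustrative, not part of any library. The loop verifies that no data value falls into two classes or into none.

```python
# Carbon emissions (millions of metric tons of CO2) for the 20 states in Table 20
emissions = [
    93.28, 67.56, 91.98, 87.42, 72.36, 65.80, 68.89, 92.69, 61.21, 52.26,
    117.56, 56.60, 107.92, 80.21, 105.73, 99.86, 70.81, 95.97, 98.05, 63.89,
]

first_lower = 50   # lower class limit of the first class
class_width = 10   # from Step 2
num_classes = 7    # from Step 1

# Each class is the half-open interval [lower, lower + width): it includes its
# lower limit but not its upper limit, so adjacent classes never overlap.
classes = [
    (first_lower + i * class_width, first_lower + (i + 1) * class_width)
    for i in range(num_classes)
]

for lower, upper in classes:
    print(f"{lower} to < {upper}")

# A value of exactly 60 belongs only to the class 60 to < 70.
print([c for c in classes if c[0] <= 60 < c[1]])  # [(60, 70)]

# Check that every data value belongs to exactly one class.
for x in emissions:
    matches = [c for c in classes if c[0] <= x < c[1]]
    assert len(matches) == 1, f"{x} falls in {len(matches)} classes"
```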
Step 4 Calculate the class boundaries.

The class boundary between the first two classes is 60. Similarly, the other class boundaries are 70, 80, 90, 100, and 110. The lower class boundary of the leftmost class is 50, and the upper class boundary of the rightmost class is 120.
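The class boundaries of Step 4 follow directly from the first lower limit and the class width. This sketch assumes the half-open classes "lower to < upper" of this example, so the boundary between two adjacent classes is simply their shared limit; with gapped limits (for example, an upper limit written as 59.99), boundaries would instead be midpoints between adjacent limits, which this sketch does not handle.

```python
# Assumes half-open classes "lower to < upper", so adjacent classes share a limit.
first_lower = 50   # lower class limit of the first class (Step 3)
class_width = 10   # class width (Step 2)
num_classes = 7    # number of classes (Step 1)

# Seven classes need eight boundaries: one on the left of each class plus
# the upper boundary of the rightmost class.
boundaries = [first_lower + i * class_width for i in range(num_classes + 1)]
print(boundaries)  # [50, 60, 70, 80, 90, 100, 110, 120]

# The outer boundaries should enclose the smallest and largest values in Table 20.
smallest, largest = 52.26, 117.56
assert boundaries[0] <= smallest and largest < boundaries[-1]
```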
Step 5 Find the frequencies for each class.

Using these seven classes, we now construct the frequency and relative frequency distributions for the carbon emissions data (see Table 21). We count the number of data values that fall into each class, and then divide each frequency by the sample size, 20, to obtain the relative frequency.

TABLE 21 Distributions for the carbon emissions data

Class	Tally	Frequency	Relative frequency
50 to < 60	||	2	0.10
60 to < 70	|||||	5	0.25
70 to < 80	||	2	0.10
80 to < 90	||	2	0.10
90 to < 100	||||| |	6	0.30
100 to < 110	||	2	0.10
110 to < 120	|	1	0.05
Total		20	1.00

The notation “50 to < 60” indicates that this class contains values from 50 (inclusive) up to but not including 60.
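Step 5 amounts to counting values per class and dividing by the sample size. The following Python sketch, using only the standard library, reproduces the frequencies and relative frequencies in Table 21. The helper `class_index` is a hypothetical name; it assumes equal-width half-open classes starting at 50, so a value's class is found by floor division rather than by scanning every class.

```python
from collections import Counter

# Carbon emissions (millions of metric tons of CO2) for the 20 states in Table 20
emissions = [
    93.28, 67.56, 91.98, 87.42, 72.36, 65.80, 68.89, 92.69, 61.21, 52.26,
    117.56, 56.60, 107.92, 80.21, 105.73, 99.86, 70.81, 95.97, 98.05, 63.89,
]

first_lower, class_width, num_classes = 50, 10, 7
n = len(emissions)  # sample size, 20


def class_index(x):
    """Index of the class containing x: 0 for 50 to < 60, 1 for 60 to < 70, ..."""
    return int((x - first_lower) // class_width)


# Frequency = number of data values in each class (missing keys count as 0)
counts = Counter(class_index(x) for x in emissions)

# Relative frequency = frequency / sample size
relative = [counts[i] / n for i in range(num_classes)]

print("Class\t\tFrequency\tRelative frequency")
for i in range(num_classes):
    lower = first_lower + i * class_width
    print(f"{lower} to < {lower + class_width}\t{counts[i]}\t\t{relative[i]:.2f}")
print(f"Total\t\t{n}\t\t{sum(relative):.2f}")
```

The floor-division shortcut only works because every class has the same width, which is one practical reason for the recommendation in Step 2.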

NOW YOU CAN DO

Exercises 17–40.