In Section 5.2, we determined theoretical probabilities from a sample space and verified the values using basic rules of probability. In this section, we will calculate empirical probabilities from a contingency table, once again verifying those values using any rules that apply.
A contingency table is a classification of the individuals in a sample or a population according to two categorical variables. A contingency table is also called a two-way table, because it represents a two-way classification of the data.
A study published in the journal Sleep investigated the relationship between nurses’ working long hours and problems related to being drowsy while driving. The number of hours in each shift was recorded, along with whether a motor vehicle accident (MVA) or near-miss occurred. Table 5.2 is adapted from data collected for the study.
Hours worked in shift | MVA/Near-Miss Occurred | MVA/Near-Miss Did Not Occur |
---|---|---|
≤ 8.5 hours | 20 | 107 |
Between 8.5 and 12.5 hours | 96 | 334 |
≥ 12.5 hours | 166 | 364 |
We can consider each row and each column a single event; for example, the third row of the table gives the number of outcomes for nurses working 12.5 hours or more. The six boxes containing the numbers of occurrences or non-occurrences are called cells of the table; together, these six cells are called the body of the table. Each of these cells shows the outcomes associated with the two events indicated by the cell’s row and column. Thus, 166 represents the number of outcomes associated with both a nurse working at least 12.5 hours and an MVA or near-miss occurring.
Recall our definition of the probability of an event E as
\(P(E) = \frac{\text{number of successful outcomes}}{\text{total number of outcomes}}\)
whenever the outcomes are equally likely.
To calculate event probabilities associated with this table, we need to find the total number of outcomes. We add another row and another column to Table 5.2 to record the totals. This row and this column are called the margins of the table and appear in Table 5.3.
Hours worked in shift | MVA/Near-Miss Occurred | MVA/Near-Miss Did Not Occur | Total |
---|---|---|---|
≤ 8.5 hours | 20 | 107 | 127 |
Between 8.5 and 12.5 hours | 96 | 334 | 430 |
≥ 12.5 hours | 166 | 364 | 530 |
Total | 282 | 805 | 1087 |
Let A = “a nurse worked at least 12.5 hours” and B = “an MVA or near-miss occurred.” Then P(A) = 530/1087, and P(B) = 282/1087. Notice that P(not B) can be calculated two ways: directly as 805/1087, or as 1 – P(B) = 1 – 282/1087 = 805/1087.
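As a minimal sketch, the margins and these first probabilities can be reproduced in Python. The dictionary layout and variable names below are our own choices for illustration, not part of the study; exact fractions are used so the results can be compared directly with the values in the text.

```python
from fractions import Fraction

# Cell counts from Table 5.2: (incident occurred, incident did not occur).
counts = {
    "<= 8.5 hours": (20, 107),
    "8.5 to 12.5 hours": (96, 334),
    ">= 12.5 hours": (166, 364),
}

# Margins: row totals, column totals, and the grand total (Table 5.3).
row_totals = {shift: sum(cells) for shift, cells in counts.items()}
col_totals = [sum(col) for col in zip(*counts.values())]
grand_total = sum(row_totals.values())

p_a = Fraction(row_totals[">= 12.5 hours"], grand_total)  # P(A)
p_b = Fraction(col_totals[0], grand_total)                # P(B)
p_not_b = Fraction(col_totals[1], grand_total)            # P(not B), counted directly

print(row_totals, col_totals, grand_total)
print(p_a, p_b, p_not_b)
print(p_not_b == 1 - p_b)  # the complement rule gives the same value
```

Counting P(not B) directly from the second column and computing it as 1 – P(B) give the same fraction, which is the check described above.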
Suppose that we wish to calculate the probabilities associated with two different events. The simplest of these situations involves two events happening at the same time. Using the drowsy-driving example above, we might be interested in finding P(A and B), the probability that a nurse worked at least 12.5 hours and that an MVA or near-miss occurred. To find the number of successful outcomes, we just need the single number that appears in the “≥ 12.5 hours” row and the “MVA/Near-Miss Occurred” column. This one cell represents the outcomes that the two events have in common. So P(A and B) = 166/1087.
Now consider P(A or B). Recall that we use an inclusive “or” in these settings, so we are interested in the outcomes when either nurses worked at least 12.5 hours or an MVA or near-miss occurred or both. The “≥ 12.5 hours” row gives the total outcomes for that event alone, and the “MVA or near-miss occurred” column gives the total outcomes for that event alone. If we add the numbers of these outcomes, we are counting the outcomes in common twice, which overestimates the probability. Using the rule P(A or B) = P(A) + P(B) – P(A and B), we have P(A or B) = 530/1087 + 282/1087 – 166/1087 = 646/1087.
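The double-counting argument can be checked with a short Python sketch. The variable names are ours; the counts come from Table 5.3. The cross-check adds the four cells that lie in the “≥ 12.5 hours” row or the “MVA/Near-Miss Occurred” column, each counted once.

```python
from fractions import Fraction

total = 1087      # grand total from Table 5.3
n_a = 530         # row total: worked at least 12.5 hours
n_b = 282         # column total: MVA or near-miss occurred
n_a_and_b = 166   # the cell shared by that row and that column

p_a_and_b = Fraction(n_a_and_b, total)

# Addition rule: subtract the shared cell so it is not counted twice.
p_a_or_b = Fraction(n_a, total) + Fraction(n_b, total) - p_a_and_b

# Cross-check: cells in the row (166, 364) plus the remaining
# cells in the column (20, 96), each counted exactly once.
direct = Fraction(166 + 364 + 20 + 96, total)

print(p_a_and_b, p_a_or_b, direct)
```

Both routes give 646/1087, while simply adding 530 and 282 would give 812, which is too large by exactly the 166 shared outcomes.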
The contingency table below gives the distribution of college foreign language degrees by level and language. Complete the table, and use it to find P(A), P(B), P(not B), P(A and B), and P(A or B) for A = the person earned a degree in German and B = the person earned a master’s degree.
Language | Bachelor's Degree | Master's Degree | Doctor's Degree | Total |
---|---|---|---|---|
French | 2,291 | 348 | 75 | 2,714 |
German | 1,097 | 188 | 77 | 1,362 |
Spanish | 7,613 | 791 | 190 | 8,594 |
Total | 11,001 | 1,327 | 342 | 12,670 |
P(A) = 1362/12670
P(B) = 1327/12670
P(not B) = 1 - 1327/12670 = 11343/12670
P(A and B) = 188/12670
P(A or B) = 1362/12670 + 1327/12670 - 188/12670 = 2501/12670
Sometimes contingency tables give relative frequencies (as either decimals or percents) rather than the actual frequencies for each cell of the table. Let’s return to our original table, Table 5.2, and convert the outcomes in the contingency table to decimals by dividing the number of successful outcomes in each cell by the total number of outcomes (1087).
Hours worked in shift | MVA/Near-Miss Occurred | MVA/Near-Miss Did Not Occur | Total |
---|---|---|---|
≤ 8.5 hours | 0.02 | 0.10 | 0.12 |
Between 8.5 and 12.5 hours | 0.09 | 0.31 | 0.40 |
≥ 12.5 hours | 0.15 | 0.33 | 0.48 |
Total | 0.26 | 0.74 | 1.00 |
Notice that by doing this we have converted Table 5.2 to one giving certain probabilities directly—no calculation required. Recalling that event A is “a nurse worked at least 12.5 hours” and event B is “an MVA or near-miss occurred,” then P(A) = 0.48, P(B) = 0.26 and P(A and B) = 0.15. These values are the two-decimal-place approximations of the fraction values given above (with a slight variation in P(A) due to rounding).
We can use our probability rules to determine P(not A) = 1 – 0.48 = 0.52, and P(A or B) = P(A) + P(B) – P(A and B) = 0.48 + 0.26 – 0.15 = 0.59, once again finding values that agree (except for rounding) with those calculated above.
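The conversion to relative frequencies, and the small rounding variation in P(A), can be illustrated with a short Python sketch. The names are our own; the only inputs are the cell counts from Table 5.2. Rounding each cell first and then adding gives a row total of 0.48, whereas rounding the exact fraction 530/1087 gives 0.49.

```python
# Cell counts from Table 5.2: (incident occurred, incident did not occur).
counts = {
    "<= 8.5 hours": (20, 107),
    "8.5 to 12.5 hours": (96, 334),
    ">= 12.5 hours": (166, 364),
}
total = sum(sum(cells) for cells in counts.values())  # 1087

# Relative frequency of each cell, rounded to two decimal places.
rel_freq = {
    shift: tuple(round(n / total, 2) for n in cells)
    for shift, cells in counts.items()
}

# P(A) two ways: summing the rounded cells versus rounding the exact fraction.
p_a_from_cells = round(sum(rel_freq[">= 12.5 hours"]), 2)
p_a_exact = round(530 / total, 2)

for shift, cells in rel_freq.items():
    print(shift, cells)
print(p_a_from_cells, p_a_exact)
```

This is why probabilities read from a rounded table should be treated as approximations; when the original counts are available, working from them avoids accumulated rounding error.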
We have been using the data on nurses’ shift length and motor vehicle incidents to practice finding various probabilities, but the authors of the study have a research question in mind. They are interested in seeing if working longer shifts is related to motor vehicle incidents.
How could we use probability to investigate this question? We can translate this question into two related ones. First, if a person has a motor vehicle accident, is there a higher probability that the person worked a longer shift? Second, if a person works longer shifts, is there a higher probability of having a motor vehicle accident? Answering yes to these questions would suggest a relationship between longer shifts and motor vehicle incidents.
Questions such as these involve conditional probability, the probability that one event occurs given that a second one has occurred. To investigate the relationship between shifts of 12.5 hours or more and motor vehicle incidents, we start by asking “What is the probability that a person worked 12.5 hours or more, given that the person has a motor vehicle incident?” We use a vertical bar to indicate “given,” so we write the desired probability as P(A|B).
The “given” here means that we know that the person had an MVA or near miss. We are only interested in those outcomes. The total number of outcomes associated with having an MVA or near miss is 282. Of these outcomes, the successful outcomes are those in which a person worked 12.5 hours or more, and there are 166 of them. Thus, P(A|B) = 166/282 = 0.59. If a person had a motor vehicle incident, the probability is 0.59 that he or she worked 12.5 hours or more. This suggests to us that there is a relationship between shifts of 12.5 hours or more and motor vehicle incidents.
We draw this conclusion because, if there were no relationship between these events, we would expect the probabilities to be roughly equal for each of the different shift lengths, about ⅓ for each one. Later on in this course, we will perform a statistical test to verify our conclusion. In the meantime, it is important to remember that an association between two variables or two events does not mean that one causes the other. It requires a controlled experiment to establish a cause-and-effect relationship.
What about the probability of having a motor vehicle incident if the nurse worked at least 12.5 hours? While order does not matter when we are calculating P(A and B) and P(A or B), it does when we are determining conditional probability. P(A|B) is the probability that A occurs if we know that B has occurred. On the other hand, P(B|A) is the probability that B occurs if A has occurred and, in general, P(A|B) does not have the same value as P(B|A).
For the example above, there are 530 outcomes in which a person worked 12.5 hours or more. Of these outcomes, there are 166 in which an MVA or near miss occurred. So P(B|A) = 166/530 = 0.31, a value quite different from P(A|B). How does this value compare to the probability of having a motor vehicle incident if a shorter shift is worked? P(incident | ≤ 8.5 hours) = 20/127 = 0.16, and P(incident | Between 8.5 and 12.5 hours) = 96/430 = 0.22. So we see that the longer the shift, the higher the probability of an MVA or near miss. These probabilities again suggest that there is a relationship between the length of the shift and motor vehicle incidents.
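The two directions of conditioning can be compared side by side in a brief Python sketch, again using our own variable names and the counts from Table 5.2. Conditioning on B restricts attention to the first column (282 outcomes); conditioning on a shift length restricts attention to one row.

```python
# Cell counts from Table 5.2: (incident occurred, incident did not occur).
counts = {
    "<= 8.5 hours": (20, 107),
    "8.5 to 12.5 hours": (96, 334),
    ">= 12.5 hours": (166, 364),
}

# Column total for "incident occurred": the reduced sample space for "given B".
incident_total = sum(occurred for occurred, _ in counts.values())  # 282

# P(A|B): among nurses with an incident, the share who worked >= 12.5 hours.
p_a_given_b = counts[">= 12.5 hours"][0] / incident_total

# P(incident | shift length): condition on each row instead.
p_incident_given_shift = {
    shift: occurred / (occurred + not_occurred)
    for shift, (occurred, not_occurred) in counts.items()
}

print(round(p_a_given_b, 2))
for shift, p in p_incident_given_shift.items():
    print(shift, round(p, 2))
```

The numerator 166 is the same for P(A|B) and P(B|A); only the denominator changes, from the column total 282 to the row total 530, which is why the two values differ.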
Use the contingency table below to find P(A|B) and P(B|A) for A = the person earned a degree in German and B = the person earned a master's degree.
Language | Bachelor's Degree | Master's Degree | Doctor's Degree | Total |
---|---|---|---|---|
French | 2,291 | 348 | 75 | 2,714 |
German | 1,097 | 188 | 77 | 1,362 |
Spanish | 7,613 | 791 | 190 | 8,594 |
Total | 11,001 | 1,327 | 342 | 12,670 |
(a) P(A|B) = 188/1327
(b) P(B|A) = 188/1362
In Chapter 1, we presented a table giving a snapshot of data for those aboard the ill-fated Titanic. The phrase “women and children first” is commonly used to indicate that in emergency situations, women and children should receive preference in rescue efforts. Did this happen when the Titanic sank? Table 5.5 classifies the passengers and crew according to survival and whether they were men, women, or children.
 | Survived | Died | Total |
---|---|---|---|
Men | 338 | 1,352 | 1,690 |
Women | 316 | 109 | 425 |
Children | 57 | 52 | 109 |
Total | 711 | 1,513 | 2,224 |
It is clear that we cannot just examine the numbers of men, women, and children who survived to determine whether women and children were first into the lifeboats. More men than either women or children survived, but there were many more men on board.
Instead we will consider the conditional probability of surviving according to whether the person was a man, woman, or child:
P(Survived | Man) = 338/1690 = 0.20;
P(Survived | Woman) = 316/425 = 0.74;
P(Survived | Child) = 57/109 = 0.52.
So we see that if a person were a man, the probability that he survived was only 0.20, as compared with 0.74 for women and 0.52 for children. It appears that women and children indeed “went first.”
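A short Python sketch makes the contrast between raw counts and conditional probabilities concrete. The two helper functions are hypothetical names introduced only for this illustration; the counts are those of Table 5.5.

```python
# Counts from Table 5.5: (survived, died).
titanic = {
    "Men": (338, 1352),
    "Women": (316, 109),
    "Children": (57, 52),
}

def survival_given_group(table):
    """Return P(Survived | group) for each row of a (survived, died) table."""
    return {group: s / (s + d) for group, (s, d) in table.items()}

def group_given_survival(table):
    """Return P(group | Survived): each row's share of all survivors."""
    survivors = sum(s for s, _ in table.values())
    return {group: s / survivors for group, (s, _) in table.items()}

by_group = survival_given_group(titanic)
among_survivors = group_given_survival(titanic)

for group in titanic:
    print(group, round(by_group[group], 2), round(among_survivors[group], 2))
```

Men make up the largest share of survivors (338 of 711, about 0.48) even though a man's probability of surviving was only 0.20. The first figure reflects how many men were on board; the second is the one that answers the question about who “went first.”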
The movie Titanic portrayed third-class passengers being trapped in the ship, unable to make their way to the lifeboats. Was there also a relationship between class and survival? Did first-class passengers have a higher probability of survival than third-class passengers? What about second-class passengers? How did crew members fare, considering that they should have been the last in the lifeboats? You can explore the relationship between class and survival in the Try This! below.
The accompanying contingency table classifies those on the Titanic according to class and survival.
 | Survived | Died | Total |
---|---|---|---|
First Class | 203 | 122 | 325 |
Second Class | 118 | 167 | 285 |
Third Class | 178 | 528 | 706 |
Crew | 212 | 696 | 908 |
Total | 711 | 1,513 | 2,224 |
In this chapter, we have seen that probability can be used to investigate games of chance, research questions, and even historical events. We will look further at probability in the next two chapters, extending the basic ideas we have developed here. As we continue through the course, we will use probability as a tool in our analysis of sample data.