5.3 Probability from a Contingency Table

In Section 5.2, we determined theoretical probabilities from a sample space and verified the values using basic rules of probability. In this section, we will calculate empirical probabilities from a contingency table, once again verifying those values using any rules that apply.

5.3.1 Contingency Tables and Probability

A contingency table is a classification of the individuals in a sample or a population according to two categorical variables. A contingency table is also called a two-way table, because it represents a two-way classification of the data.

A study published in the journal Sleep investigated the relationship between nurses’ working long hours and problems related to being drowsy while driving. The number of hours in each shift was recorded, along with whether a motor vehicle accident (MVA) or near-miss occurred. Table 5.2 is adapted from data collected for the study.

| Hours worked in shift | MVA/Near-Miss Occurred | MVA/Near-Miss Did Not Occur |
|---|---|---|
| ≤ 8.5 hours | 20 | 107 |
| Between 8.5 and 12.5 hours | 96 | 334 |
| ≥ 12.5 hours | 166 | 364 |

Table 5.2: Work Hours and Driving Accidents

We can consider each row and each column a single event; for example, the third row of the table gives the number of outcomes for nurses working 12.5 hours or more. The six boxes containing the numbers of occurrences or non-occurrences are called the cells of the table, and together these six cells form the body of the table. Each cell shows the outcomes associated with both events indicated by its row and column. Thus, 166 represents the number of outcomes in which a nurse worked at least 12.5 hours and an MVA or near-miss occurred.

Recall our definition of the probability of an event E as

\(P(E) = \frac{\text{number of successful outcomes}}{\text{total number of outcomes}}\)

whenever the outcomes are equally likely.

To calculate event probabilities associated with this table, we need to find the total number of outcomes. We add another row and another column to Table 5.2 to record the totals. This row and this column are called the margins of the table and appear in Table 5.3.

| Hours worked in shift | MVA/Near-Miss Occurred | MVA/Near-Miss Did Not Occur | Total |
|---|---|---|---|
| ≤ 8.5 hours | 20 | 107 | 127 |
| Between 8.5 and 12.5 hours | 96 | 334 | 430 |
| ≥ 12.5 hours | 166 | 364 | 530 |
| Total | 282 | 805 | 1087 |

Table 5.3: Work Hours and Driving Accidents with Totals
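The margins of Table 5.3 are simply row and column sums of the body. The Python sketch below reproduces them from the six cells of Table 5.2; the dictionary keys are shorthand labels chosen for this illustration, not names used in the study.

```python
# Body of Table 5.2. Keys are shorthand shift labels chosen for this sketch;
# each value is (MVA/near-miss occurred, MVA/near-miss did not occur).
body = {
    "<= 8.5": (20, 107),
    "8.5-12.5": (96, 334),
    ">= 12.5": (166, 364),
}

# Row margins: total number of outcomes for each shift length.
row_totals = {shift: sum(cells) for shift, cells in body.items()}

# Column margins: zip(*...) regroups the cells column by column.
col_totals = [sum(column) for column in zip(*body.values())]

# Grand total: the row margins and the column margins must add to the same number.
grand_total = sum(row_totals.values())
assert grand_total == sum(col_totals)

print(row_totals)   # {'<= 8.5': 127, '8.5-12.5': 430, '>= 12.5': 530}
print(col_totals)   # [282, 805]
print(grand_total)  # 1087
```

The check that both sets of margins sum to the same grand total is a quick way to catch a typo when copying a table by hand.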

Let A = “a nurse worked at least 12.5 hours” and B = “an MVA or near-miss occurred.” Then P(A) = 530/1087 and P(B) = 282/1087. Notice that P(not B) can be calculated in two ways: directly as 805/1087, or as 1 – P(B) = 1 – 282/1087 = 805/1087.
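Exact fractions make it easy to confirm that the two routes to P(not B) agree. This is a minimal sketch using Python's standard `fractions.Fraction` with the margins of Table 5.3; the variable names are illustrative.

```python
from fractions import Fraction

# Margins of Table 5.3.
total = 1087        # grand total
a_count = 530       # row total for ">= 12.5 hours" (event A)
b_count = 282       # column total for "MVA/Near-Miss Occurred" (event B)
not_b_count = 805   # column total for "MVA/Near-Miss Did Not Occur"

p_a = Fraction(a_count, total)
p_b = Fraction(b_count, total)

# Route 1: count the "not B" outcomes directly.
p_not_b_direct = Fraction(not_b_count, total)
# Route 2: complement rule, P(not B) = 1 - P(B).
p_not_b_rule = 1 - p_b

assert p_not_b_direct == p_not_b_rule
print(p_a, p_b, p_not_b_direct)  # 530/1087 282/1087 805/1087
```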

5.3.2 Probability for Multiple Events

Suppose that we wish to calculate the probabilities associated with two different events. The simplest of these situations involves two events happening at the same time. Using the drowsy-driving example above, we might be interested in finding P(A and B), the probability that a nurse worked at least 12.5 hours and that an MVA or near-miss occurred. To find the number of successful outcomes, we need only the single number in the cell where the “≥ 12.5 hours” row meets the “MVA/Near-Miss Occurred” column. This one cell represents the outcomes that the two events have in common. So P(A and B) = 166/1087.
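If the body of the table is stored as a grid, “A and B” is a single lookup: the row index for A and the column index for B. The sketch below assumes a plain list-of-lists layout; the index constants are hypothetical names chosen for the example.

```python
from fractions import Fraction

# Body of Table 5.3 as a grid.
# Rows: <= 8.5 hours, 8.5 to 12.5 hours, >= 12.5 hours.
# Columns: MVA/near-miss occurred, MVA/near-miss did not occur.
body = [
    [20, 107],
    [96, 334],
    [166, 364],
]
total = sum(sum(row) for row in body)  # grand total, 1087

ROW_A = 2  # index of the ">= 12.5 hours" row (event A)
COL_B = 0  # index of the "MVA/Near-Miss Occurred" column (event B)

# "A and B" is the one cell where row A meets column B.
p_a_and_b = Fraction(body[ROW_A][COL_B], total)
print(p_a_and_b)  # 166/1087
```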

Now consider P(A or B). Recall that we use an inclusive “or” in these settings, so we are interested in the outcomes when either nurses worked at least 12.5 hours or an MVA or near-miss occurred or both. The “≥ 12.5 hours” row gives the total outcomes for that event alone, and the “MVA or near-miss occurred” column gives the total outcomes for that event alone. If we add the numbers of these outcomes, we are counting the outcomes in common twice, which overestimates the probability. Using the rule P(A or B) = P(A) + P(B) – P(A and B), we have P(A or B) = 530/1087 + 282/1087 – 166/1087 = 646/1087.
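The addition rule can be checked against a direct count: add every cell that lies in row A or column B exactly once. This sketch uses the same grid layout as Table 5.3 (rows by shift length, columns by outcome); the names are illustrative.

```python
from fractions import Fraction

# Body of Table 5.3 (rows: shift length; columns: occurred, did not occur).
body = [
    [20, 107],
    [96, 334],
    [166, 364],
]
total = 1087
ROW_A, COL_B = 2, 0  # ">= 12.5 hours" row, "MVA/Near-Miss Occurred" column

p_a = Fraction(sum(body[ROW_A]), total)                 # 530/1087
p_b = Fraction(sum(row[COL_B] for row in body), total)  # 282/1087
p_a_and_b = Fraction(body[ROW_A][COL_B], total)         # 166/1087

# Addition rule: subtract the shared cell so it is not counted twice.
p_a_or_b = p_a + p_b - p_a_and_b

# Direct count: include each cell once if it lies in row A or column B.
in_a_or_b = sum(
    count
    for i, row in enumerate(body)
    for j, count in enumerate(row)
    if i == ROW_A or j == COL_B
)
assert p_a_or_b == Fraction(in_a_or_b, total)
print(p_a_or_b)  # 646/1087
```

The direct count adds 166 + 364 from the row and 20 + 96 from the rest of the column, which is the same 646 the rule produces.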

Question 5.5

The contingency table below gives the distribution of college foreign language degrees by level and language. Complete the table, and use it to find P(A), P(B), P(not A), P(A and B), and P(A or B) for A = the person earned a degree in German and B = the person earned a master’s degree.

| Language | Bachelor's Degree | Master's Degree | Doctor's Degree | Total |
|---|---|---|---|---|
| French | 2,291 | 348 | 75 | ____ |
| German | 1,097 | 188 | 77 | ____ |
| Spanish | 7,613 | 791 | 190 | ____ |
| Total | ____ | ____ | ____ | ____ |

P(A) = ____/12670

P(B) = ____/12670

P(not A) = 1 - ____/12670 = ____/12670

P(A and B) = ____/12670

P(A or B) = ____/12670 + ____/12670 - ____/12670 = ____/12670


Sometimes contingency tables give relative frequencies (as either decimals or percents) rather than the actual frequencies for each cell of the table. Let’s return to our original table, Table 5.2, and convert the outcomes in the contingency table to decimals by dividing the number of successful outcomes in each cell by the total number of outcomes (1087).

| Hours worked in shift | MVA/Near-Miss Occurred | MVA/Near-Miss Did Not Occur | Total |
|---|---|---|---|
| ≤ 8.5 hours | 0.02 | 0.10 | 0.12 |
| Between 8.5 and 12.5 hours | 0.09 | 0.31 | 0.40 |
| ≥ 12.5 hours | 0.15 | 0.33 | 0.48 |
| Total | 0.26 | 0.74 | 1.00 |

Table 5.4: Work Hours and Driving Accidents Relative Frequencies
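Table 5.4 can be produced by dividing each cell by 1087 and rounding to two decimals. This sketch also rounds each row total from its exact value, so it shows where a rounded table can drift: the exact “≥ 12.5 hours” total rounds to 0.49, while its rounded cells add to 0.48. Row labels are shorthand chosen for the example.

```python
# Counts from the body of Table 5.3 (shorthand row labels chosen for this sketch).
counts = {
    "<= 8.5": [20, 107],
    "8.5-12.5": [96, 334],
    ">= 12.5": [166, 364],
}
total = 1087

# Relative frequency of each cell: cell count divided by the grand total.
rel = {shift: [c / total for c in row] for shift, row in counts.items()}

# Round cells to two decimals; round each row total from its exact value.
rounded_cells = {shift: [round(p, 2) for p in row] for shift, row in rel.items()}
rounded_totals = {shift: round(sum(row), 2) for shift, row in rel.items()}

for shift in counts:
    print(shift, rounded_cells[shift], rounded_totals[shift])
# <= 8.5 [0.02, 0.1] 0.12
# 8.5-12.5 [0.09, 0.31] 0.4
# >= 12.5 [0.15, 0.33] 0.49
```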

Notice that by doing this we have converted Table 5.2 into a table that gives certain probabilities directly, with no calculation required. Recalling that event A is “a nurse worked at least 12.5 hours” and event B is “an MVA or near-miss occurred,” we have P(A) = 0.48, P(B) = 0.26, and P(A and B) = 0.15. These values are the two-decimal-place approximations of the fraction values given above, with a slight variation in P(A) due to rounding: 530/1087 ≈ 0.488 rounds to 0.49, while the table lists 0.48, the sum of the rounded cells 0.15 and 0.33.

We can use our probability rules to determine P(not A) = 1 – 0.48 = 0.52 and P(A or B) = P(A) + P(B) – P(A and B) = 0.48 + 0.26 – 0.15 = 0.59, once again finding values that agree (except for rounding) with those calculated above.
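Running the complement and addition rules on the rounded values of Table 5.4 and on the exact fractions from Table 5.3 side by side shows how small the rounding differences are: P(A or B) matches at two decimals, and P(not A) differs only in the second decimal place. This is a sketch with illustrative variable names.

```python
from fractions import Fraction

# Two-decimal values read from Table 5.4.
p_a, p_b, p_a_and_b = 0.48, 0.26, 0.15

# Complement rule and addition rule applied to the rounded values.
p_not_a = round(1 - p_a, 2)
p_a_or_b = round(p_a + p_b - p_a_and_b, 2)

# The same rules applied to the exact fractions from Table 5.3.
exact_not_a = 1 - Fraction(530, 1087)
exact_a_or_b = Fraction(530, 1087) + Fraction(282, 1087) - Fraction(166, 1087)

print(p_not_a, p_a_or_b)                                            # 0.52 0.59
print(round(float(exact_not_a), 3), round(float(exact_a_or_b), 3))  # 0.512 0.594
```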

5.3.3 Conditional Probability

We have been using the data on nurses’ shift length and motor vehicle incidents to practice finding various probabilities, but the authors of the study have a research question in mind. They are interested in seeing if working longer shifts is related to motor vehicle incidents.

How could we use probability to investigate this question? We can translate this question into two related ones. First, if a person has a motor vehicle accident, is there a higher probability that the person worked a longer shift? Second, if a person works longer shifts, is there a higher probability of having a motor vehicle accident? Answering yes to these questions would suggest a relationship between longer shifts and motor vehicle incidents.

Questions such as these involve conditional probability, the probability that one event occurs given that a second one has occurred. To investigate the relationship between shifts of 12.5 hours or more and motor vehicle incidents, we start by asking “What is the probability that a person worked 12.5 hours or more, given that the person had a motor vehicle incident?” We use a vertical bar to indicate “given,” so we write the desired probability as P(A|B).

The “given” here means that we know that the person had an MVA or near miss. We are only interested in those outcomes. The total number of outcomes associated with having an MVA or near miss is 282. Of these outcomes, the successful outcomes are those in which a person worked 12.5 hours or more, and there are 166 of them. Thus, P(A|B) = 166/282 = .59. If a person had a motor vehicle incident, the probability is 0.59 that he or she worked 12.5 or more hours. This suggests to us that there is a relationship between shifts of 12.5 hours or more and motor vehicle incidents.
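Conditioning on B amounts to restricting the count to the “MVA/Near-Miss Occurred” column. The sketch below does exactly that, and also checks the result against the standard identity P(A|B) = P(A and B) / P(B), which is not stated in this section but gives the same value. Note that `Fraction` reduces 166/282 to lowest terms, 83/141.

```python
from fractions import Fraction

# "MVA/Near-Miss Occurred" column of Table 5.3, by shift length
# (shorthand labels chosen for this sketch).
mva_column = {"<= 8.5": 20, "8.5-12.5": 96, ">= 12.5": 166}

# Conditioning on B restricts attention to this column: 282 outcomes in all.
b_total = sum(mva_column.values())
p_a_given_b = Fraction(mva_column[">= 12.5"], b_total)

# Equivalent formula: P(A|B) = P(A and B) / P(B), using the grand total 1087.
assert p_a_given_b == Fraction(166, 1087) / Fraction(282, 1087)

print(p_a_given_b, round(float(p_a_given_b), 2))  # 83/141 0.59
```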

We draw this conclusion because, if there were no relationship between these events, knowing that a nurse had an incident would not change the probability that the nurse worked a long shift. We would then expect P(A|B) to be close to P(A) = 530/1087 ≈ 0.49, the proportion of all nurses who worked 12.5 hours or more; the observed value of 0.59 is noticeably higher. Later on in this course, we will perform a statistical test to verify our conclusion. In the meantime, it is important to remember that an association between two variables or two events does not mean that one causes the other. Establishing a cause-and-effect relationship requires a controlled experiment.

What about the probability of having a motor vehicle incident if the nurse worked at least 12.5 hours? While order does not matter when we are calculating P(A and B) and P(A or B), it does when we are determining conditional probability. P(A|B) is the probability that A occurs if we know that B has occurred. On the other hand, P(B|A) is the probability that B occurs if A has occurred and, in general, P(A|B) does not have the same value as P(B|A).

For the example above, there are 530 outcomes in which a person worked 12.5 hours or more. Of these outcomes, there are 166 in which an MVA or near miss occurred. So P(B|A) = 166/530 = .31, a value quite different from P(A|B). How does this value compare to the probability of having a motor vehicle incident if a shorter shift is worked? P(incident | ≤ 8.5 hours) = 20/127 = .16, and P(incident | Between 8.5 and 12.5 hours) = 96/430 = .22. So we see that the longer the shift, the higher the probability of an MVA or near miss. These probabilities again suggest that there is a relationship between the length of the shift and motor vehicle incidents.
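Conditioning on a shift length means dividing by that row's total instead of a column total. This sketch computes P(incident | shift) for every row at once and confirms that P(B|A) and P(A|B) differ; labels are shorthand chosen for the example.

```python
from fractions import Fraction

# Rows of Table 5.3: (MVA/near-miss occurred, did not occur) for each shift length.
rows = {
    "<= 8.5": (20, 107),
    "8.5-12.5": (96, 334),
    ">= 12.5": (166, 364),
}

# P(incident | shift): condition on a row, so divide by that row's total.
p_incident_given = {
    shift: Fraction(occurred, occurred + not_occurred)
    for shift, (occurred, not_occurred) in rows.items()
}
for shift, p in p_incident_given.items():
    print(shift, round(float(p), 2))
# <= 8.5 0.16
# 8.5-12.5 0.22
# >= 12.5 0.31

# Order matters: P(B|A) divides by the row total 530, P(A|B) by the column total 282.
p_b_given_a = p_incident_given[">= 12.5"]
p_a_given_b = Fraction(166, 282)
assert p_b_given_a != p_a_given_b
```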

Question 5.6

Use the contingency table below to find P(A|B) and P(B|A) for A = the person earned a degree in German and B = the person earned a master's degree.

| Language | Bachelor's Degree | Master's Degree | Doctor's Degree | Total |
|---|---|---|---|---|
| French | 2,291 | 348 | 75 | 2,714 |
| German | 1,097 | 188 | 77 | 1,362 |
| Spanish | 7,613 | 791 | 190 | 8,594 |
| Total | 11,001 | 1,327 | 342 | 12,670 |

(a) P(A|B) = 188/____

(b) P(B|A) = 188/____


5.3.4 Conditional Probability and History

In Chapter 1, we presented a table giving a snapshot of data for those aboard the ill-fated Titanic. The phrase “women and children first” is commonly used to indicate that in emergency situations, women and children should receive preference in rescue efforts. Did this happen when the Titanic sank? Table 5.5 classifies the passengers and crew according to survival and whether they were men, women, or children.

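| | Survived | Died | Total |
|---|---|---|---|
| Men | 338 | 1,352 | 1,690 |
| Women | 316 | 109 | 425 |
| Children | 57 | 52 | 109 |
| Total | 711 | 1,513 | 2,224 |

Table 5.5: Titanic Survival for Men, Women, and Children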

It is clear that we cannot just examine the numbers of men, women, and children who survived to determine whether women and children were first into the lifeboats. More men than either women or children survived, but there were many more men on board.

Instead we will consider the conditional probability of surviving according to whether the person was a man, woman, or child:

P(Survived | Man) = 338/1690 = 0.20;

P(Survived | Woman) = 316/425 = 0.74;

P(Survived | Child) = 57/109 = 0.52.

So we see that if a person were a man, the probability that he survived was only 0.20, as compared with 0.74 for women and 0.52 for children. It appears that women and children indeed “went first.”
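The contrast between raw counts and conditional probabilities can be made explicit. This sketch, using the counts in Table 5.5 and a small hypothetical helper `p_survived_given`, finds the group with the most survivors and the group with the highest probability of survival; they are different groups.

```python
from fractions import Fraction

# Table 5.5: (survived, died) for each group aboard the Titanic.
titanic = {
    "Man": (338, 1352),
    "Woman": (316, 109),
    "Child": (57, 52),
}

def p_survived_given(group):
    """P(Survived | group): survivors in the group's row over that row's total."""
    survived, died = titanic[group]
    return Fraction(survived, survived + died)

for group in titanic:
    print(group, round(float(p_survived_given(group)), 2))
# Man 0.2
# Woman 0.74
# Child 0.52

# Raw counts and conditional probabilities point to different groups.
most_survivors = max(titanic, key=lambda g: titanic[g][0])  # 'Man'
highest_rate = max(titanic, key=p_survived_given)           # 'Woman'
```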

The movie Titanic portrayed third-class passengers being trapped in the ship, unable to make their way to the lifeboats. Was there also a relationship between class and survival? Did first-class passengers have a higher probability of survival than third-class passengers? What about second-class passengers? How did crew members fare, considering that they should have been the last in the lifeboats? You can explore the relationship between class and survival in Question 5.7 below.


Question 5.7

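The accompanying contingency table classifies those on the Titanic according to class and survival. Use it to find (a) P(Survived | First Class), (b) P(Survived | Second Class), (c) P(Survived | Third Class), and (d) P(Survived | Crew). (e) Does survival appear to be related to class?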

| | Survived | Died | Total |
|---|---|---|---|
| First Class | 203 | 122 | 325 |
| Second Class | 118 | 167 | 285 |
| Third Class | 178 | 528 | 706 |
| Crew | 212 | 696 | 908 |
| Total | 711 | 1,513 | 2,224 |
Answers:
(a) P(Survived | First Class) = 203/325 = .62
(b) P(Survived | Second Class) = 118/285 = .41
(c) P(Survived | Third Class) = 178/706 = .25
(d) P(Survived | Crew) = 212/908 = .23
(e) It appears that survival is related to class, since first- and second-class passengers were more likely to survive than third-class passengers or crew.

In this chapter, we have seen that probability can be used to investigate games of chance, research questions, and even historical events. We will look further at probability in the next two chapters, extending the basic ideas we have developed here. As we continue through the course, we will use probability as a tool in our analysis of sample data.