5.3 Probability from a Contingency Table

In Section 5.2, we determined theoretical probabilities from a sample space and verified the values using basic rules of probability. In this section, we will calculate empirical probabilities from a contingency table, once again verifying those values using any rules that apply.

5.3.1 Contingency Tables and Probability

A contingency table is a classification of the individuals in a sample or a population according to two categorical variables. A contingency table is also called a two-way table, because it represents a two-way classification of the data.

A study published in the journal Sleep investigated the relationship between nurses’ working long hours and problems related to being drowsy while driving. The number of hours in each shift was recorded, along with whether a motor vehicle accident (MVA) or near-miss occurred. Table 5.2 is adapted from data collected for the study.

Table 5.2 Work Hours and Driving Accidents

Hours worked in shift	MVA/Near-Miss Occurred	MVA/Near-Miss Did Not Occur
≤ 8.5 hours	20	107
Between 8.5 and 12.5 hours	96	334
≥ 12.5 hours	166	364

We can consider each row and each column a single event; for example, the third row of the table gives the number of outcomes for nurses working 12.5 hours or more. The six boxes containing the numbers of occurrences or non-occurrences are called cells of the table; together, these six cells are called the body of the table. Each cell shows the outcomes associated with the two events indicated by the cell’s row and column. Thus, 166 represents the number of outcomes associated with both a nurse working at least 12.5 hours and an MVA or near-miss occurring.

Recall our definition of the probability of an event E as

$P(E) = \frac{\text{number of successful outcomes}}{\text{total number of outcomes}}$

whenever the outcomes are equally likely.

To calculate event probabilities associated with this table, we need to find the total number of outcomes. We add another row and another column to Table 5.2 to record the totals. This row and this column are called the margins of the table and appear in Table 5.3.

Table 5.3 Work Hours and Driving Accidents with Totals

Hours worked in shift	MVA/Near-Miss Occurred	MVA/Near-Miss Did Not Occur	Total
≤ 8.5 hours	20	107	127
Between 8.5 and 12.5 hours	96	334	430
≥ 12.5 hours	166	364	530
Total	282	805	1087

Let A = “a nurse worked at least 12.5 hours” and B = “an MVA or near-miss occurred.” Then P(A) = 530/1087, and P(B) = 282/1087. Notice that P(not B) can be calculated two ways: directly as 805/1087, or as 1 – P(B) = 1 – 282/1087 = 805/1087.
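The margins and the probabilities read from them can be checked with a short script. What follows is only a minimal sketch in Python, not part of the study; the names `table`, `row_totals`, and `col_totals` are illustrative assumptions.

```python
from fractions import Fraction

# Body of Table 5.2: each row is (MVA/near-miss occurred, did not occur).
table = {
    "<= 8.5 hours": (20, 107),
    "8.5 to 12.5 hours": (96, 334),
    ">= 12.5 hours": (166, 364),
}

# Margins: one total per row, one total per column, and the grand total.
row_totals = {shift: sum(cells) for shift, cells in table.items()}
col_totals = [sum(cells[j] for cells in table.values()) for j in range(2)]
grand_total = sum(row_totals.values())

p_a = Fraction(row_totals[">= 12.5 hours"], grand_total)  # P(A)
p_b = Fraction(col_totals[0], grand_total)                # P(B)
p_not_b = 1 - p_b                                         # complement rule

print(row_totals, col_totals, grand_total)
print(p_a, p_b, p_not_b)
```

Using `Fraction` keeps the results as exact fractions over 1087, so the complement rule can be compared directly with the count in the “Did Not Occur” column.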

5.3.2 Probability for Multiple Events

Suppose that we wish to calculate the probabilities associated with two different events. The simplest of these situations involves two events happening at the same time. Using the drowsy-driving example above, we might be interested in finding P(A and B), the probability that a nurse worked at least 12.5 hours and that an MVA or near-miss occurred. To find the number of successful outcomes, we just need the single number that appears in the “≥ 12.5 hours” row and the “MVA/Near-Miss Occurred” column. This one cell represents the outcomes that the two events have in common. So P(A and B) = 166/1087.

Now consider P(A or B). Recall that we use an inclusive “or” in these settings, so we are interested in the outcomes in which a nurse worked at least 12.5 hours, an MVA or near-miss occurred, or both. The “≥ 12.5 hours” row gives the total outcomes for that event alone, and the “MVA/Near-Miss Occurred” column gives the total outcomes for that event alone. If we simply add these two totals, we count the 166 outcomes the events have in common twice, which overestimates the probability. Using the rule P(A or B) = P(A) + P(B) – P(A and B), we have P(A or B) = 530/1087 + 282/1087 – 166/1087 = 646/1087.
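As a check on the addition rule, the same probability can be found by counting each qualifying cell of the table exactly once. The sketch below is illustrative only; the variable names are assumptions, and the counts are copied from Table 5.3.

```python
from fractions import Fraction

total = 1087      # grand total from Table 5.3
n_a = 530         # row total: nurse worked at least 12.5 hours
n_b = 282         # column total: MVA or near-miss occurred
n_a_and_b = 166   # the one cell shared by that row and that column

p_a_and_b = Fraction(n_a_and_b, total)
p_a_or_b = Fraction(n_a, total) + Fraction(n_b, total) - p_a_and_b

# Direct count: the two cells in the A row, plus the two remaining
# cells in the B column, so that no cell is counted twice.
direct_count = 166 + 364 + 20 + 96
assert p_a_or_b == Fraction(direct_count, total)

print(p_a_and_b, p_a_or_b)
```

The assertion passes because subtracting P(A and B) removes exactly the one cell that the row total and the column total both include.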

Now Try This 5.5

The contingency table below gives the distribution of college foreign language degrees by level and language. Complete the table, and use it to find P(A), P(B), P(not B), P(A and B), and P(A or B) for A = the person earned a degree in German and B = the person earned a master’s degree.

Language	Bachelor's Degree	Master's Degree	Doctor's Degree
French	2,291	348	75
German	1,097	188	77
Spanish	7,613	791	190
Total

P(A) = ____/12670

P(B) = ____/12670

P(not B) = 1 – ____/12670 = ____/12670

P(A and B) = ____/12670

P(A or B) = ____/12670 + ____/12670 – ____/12670 = ____/12670

Sometimes contingency tables give relative frequencies (as either decimals or percents) rather than the actual frequencies for each cell of the table. Let’s return to our original table, Table 5.2, and convert the outcomes in the contingency table to decimals by dividing the number of successful outcomes in each cell by the total number of outcomes (1087).

Table 5.4 Work Hours and Driving Accidents Relative Frequencies

Hours worked in shift	MVA/Near-Miss Occurred	MVA/Near-Miss Did Not Occur	Total
≤ 8.5 hours	0.02	0.10	0.12
Between 8.5 and 12.5 hours	0.09	0.31	0.40
≥ 12.5 hours	0.15	0.33	0.48
Total	0.26	0.74	1.00

Notice that by doing this we have converted Table 5.2 to one giving certain probabilities directly—no calculation required. Recalling that event A is “a nurse worked at least 12.5 hours” and event B is “an MVA or near-miss occurred,” then P(A) = 0.48, P(B) = 0.26 and P(A and B) = 0.15. These values are the two-decimal-place approximations of the fraction values given above (with a slight variation in P(A) due to rounding).

We can use our probability rules to determine P(not A) = 1 – 0.48 = 0.52, and P(A or B) = P(A) + P(B) – P(A and B) = 0.48 + 0.26 – 0.15 = 0.59, once again finding values that agree (except for rounding) with those calculated above.
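The conversion to relative frequencies, and the small rounding discrepancy in P(A), can be reproduced as follows. This is a sketch under the assumption that each cell is rounded to two decimal places before the margins are added, which matches the entries in Table 5.4; the variable names are illustrative.

```python
# Cell counts from Table 5.2: (MVA/near-miss occurred, did not occur).
counts = {
    "<= 8.5 hours": (20, 107),
    "8.5 to 12.5 hours": (96, 334),
    ">= 12.5 hours": (166, 364),
}
total = 1087

# Relative frequency of each cell, rounded to two decimal places.
rel = {
    shift: tuple(round(n / total, 2) for n in cells)
    for shift, cells in counts.items()
}

# P(A) from the rounded table versus P(A) rounded from the exact fraction.
p_a_from_table = round(sum(rel[">= 12.5 hours"]), 2)
p_a_direct = round(530 / total, 2)

# P(B) is the margin of the "occurred" column; then apply the addition rule.
p_b = round(sum(cells[0] for cells in rel.values()), 2)
p_a_and_b = rel[">= 12.5 hours"][0]
p_a_or_b = round(p_a_from_table + p_b - p_a_and_b, 2)

print(rel)
print(p_a_from_table, p_a_direct, p_b, p_a_or_b)
```

Adding the rounded cells gives one value for P(A), while rounding 530/1087 directly gives a value that differs in the second decimal place; this is the rounding variation mentioned in the text.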

5.3.3 Conditional Probability

We have been using the data on nurses’ shift length and motor vehicle incidents to practice finding various probabilities, but the authors of the study have a research question in mind. They are interested in seeing if working longer shifts is related to motor vehicle incidents.

How could we use probability to investigate this question? We can translate this question into two related ones. First, if a person has a motor vehicle accident, is there a higher probability that the person worked a longer shift? Second, if a person works longer shifts, is there a higher probability of having a motor vehicle accident? Answering yes to these questions would suggest a relationship between longer shifts and motor vehicle incidents.

Questions such as these involve conditional probability, the probability that one event occurs given that a second one has occurred. To investigate the relationship between shifts of 12.5 hours or more and motor vehicle incidents, we start by asking “What is the probability that a person worked 12.5 hours or more, given that the person has a motor vehicle incident?” We use a vertical bar to indicate “given,” so we write the desired probability as P(A|B).

The “given” here means that we know that the person had an MVA or near-miss, so we are interested only in those outcomes. The total number of outcomes associated with having an MVA or near-miss is 282. Of these outcomes, the successful outcomes are those in which a person worked 12.5 hours or more, and there are 166 of them. Thus, P(A|B) = 166/282 = 0.59. If a person had a motor vehicle incident, the probability is 0.59 that he or she worked 12.5 hours or more. This suggests that there is a relationship between shifts of 12.5 hours or more and motor vehicle incidents.
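In symbols, the counting argument for P(A|B) can be written as follows. The second form, obtained by dividing the numerator and the denominator by the grand total of 1087, is the standard formula for conditional probability; it is added here as a supplement and is not stated in the original discussion.

$P(A \mid B) = \frac{\text{number of outcomes in both } A \text{ and } B}{\text{number of outcomes in } B} = \frac{166}{282} = \frac{166/1087}{282/1087} = \frac{P(A \text{ and } B)}{P(B)} \approx 0.59$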

We draw this conclusion because, if there were no relationship between these events, we would expect P(A|B) to be roughly equal to P(A), the overall probability that a nurse worked 12.5 hours or more, which is 530/1087, or about 0.49. Later on in this course, we will perform a statistical test to verify our conclusion. In the meantime, it is important to remember that an association between two variables or two events does not mean that one causes the other. It requires a controlled experiment to establish a cause-and-effect relationship.

What about the probability of having a motor vehicle incident if the nurse worked at least 12.5 hours? While order does not matter when we are calculating P(A and B) and P(A or B), it does when we are determining conditional probability. P(A|B) is the probability that A occurs if we know that B has occurred. On the other hand, P(B|A) is the probability that B occurs if A has occurred and, in general, P(A|B) does not have the same value as P(B|A).

For the example above, there are 530 outcomes in which a person worked 12.5 hours or more. Of these outcomes, there are 166 in which an MVA or near-miss occurred. So P(B|A) = 166/530 = 0.31, a value quite different from P(A|B). How does this value compare to the probability of having a motor vehicle incident if a shorter shift is worked? P(incident | ≤ 8.5 hours) = 20/127 = 0.16, and P(incident | between 8.5 and 12.5 hours) = 96/430 = 0.22. So we see that the longer the shift, the higher the probability of an MVA or near-miss. These probabilities again suggest that there is a relationship between the length of the shift and motor vehicle incidents.
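Both directions of conditioning amount to restricting attention to one column or one row of the table before dividing. The sketch below illustrates this; the dictionary layout and names are assumptions made for the example, and the counts come from Table 5.3.

```python
from fractions import Fraction

# Rows of Table 5.3: (MVA/near-miss occurred, did not occur).
table = {
    "<= 8.5 hours": (20, 107),
    "8.5 to 12.5 hours": (96, 334),
    ">= 12.5 hours": (166, 364),
}

# P(A|B): keep only the "occurred" column, then ask what share of it
# falls in the ">= 12.5 hours" row.
incident_total = sum(occurred for occurred, _ in table.values())
p_a_given_b = Fraction(table[">= 12.5 hours"][0], incident_total)

# P(incident | shift): keep only one row at a time.
p_incident_given_shift = {
    shift: Fraction(occurred, occurred + not_occurred)
    for shift, (occurred, not_occurred) in table.items()
}

print("P(A|B) =", round(float(p_a_given_b), 2))
for shift, p in p_incident_given_shift.items():
    print(f"P(incident | {shift}) = {float(p):.2f}")
```

The denominator changes with the condition: it is the column total for P(A|B) and the row total for P(B|A), which is why the two values generally differ.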

Now Try This 5.6

Use the contingency table below to find P(A|B) and P(B|A) for A = the person earned a degree in German and B = the person earned a master's degree.

Language	Bachelor's Degree	Master's Degree	Doctor's Degree	Total
French	2,291	348	75	2,714
German	1,097	188	77	1,362
Spanish	7,613	791	190	8,594
Total	11,001	1,327	342	12,670

(a) P(A|B) = 188/____

(b) P(B|A) = 188/____

5.3.4 Conditional Probability and History

In Chapter 1, we presented a table giving a snapshot of data for those aboard the ill-fated Titanic. The phrase “women and children first” is commonly used to indicate that in emergency situations, women and children should receive preference in rescue efforts. Did this happen when the Titanic sank? Table 5.5 classifies the passengers and crew according to survival and whether they were men, women, or children.

Table 5.5 Survival on the Titanic for Men, Women, and Children

	Survived	Died	Total
Men	338	1,352	1,690
Women	316	109	425
Children	57	52	109
Total	711	1,513	2,224

It is clear that we cannot just examine the numbers of men, women, and children who survived to determine whether women and children were first into the lifeboats. More men than either women or children survived, but there were many more men on board.

Instead we will consider the conditional probability of surviving according to whether the person was a man, woman, or child:

P(Survived | Man) = 338/1690 = 0.20;

P(Survived | Woman) = 316/425 = 0.74;

P(Survived | Child) = 57/109 = 0.52.

So we see that if a person were a man, the probability that he survived was only 0.20, as compared with 0.74 for women and 0.52 for children. It appears that women and children indeed “went first.”
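The contrast between raw survivor counts and conditional survival probabilities can be seen in a few lines of code. This is an illustrative sketch using the counts from the table above; the names `titanic` and `survival_rate` are assumptions.

```python
# Counts from the Titanic table above: (survived, died) for each group.
titanic = {
    "Men": (338, 1352),
    "Women": (316, 109),
    "Children": (57, 52),
}

# Group with the largest raw number of survivors.
most_survivors = max(titanic, key=lambda group: titanic[group][0])

# P(Survived | group): survivors divided by that group's row total.
survival_rate = {
    group: round(survived / (survived + died), 2)
    for group, (survived, died) in titanic.items()
}

# Group with the highest conditional probability of surviving.
highest_rate = max(survival_rate, key=survival_rate.get)

print(most_survivors, highest_rate)
for group, rate in survival_rate.items():
    print(f"P(Survived | {group}) = {rate:.2f}")
```

The group with the most survivors is not the group with the highest survival probability, which is why the comparison has to be made with conditional probabilities rather than counts.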

The movie Titanic portrayed third-class passengers being trapped in the ship, unable to make their way to the lifeboats. Was there also a relationship between class and survival? Did first-class passengers have a higher probability of survival than third-class passengers? What about second-class passengers? How did crew members fare, considering that they should have been the last in the lifeboats? You can explore the relationship between class and survival in Now Try This 5.7 below.

Now Try This 5.7

The accompanying contingency table classifies those on the Titanic according to class and survival.

	Survived	Died	Total
First Class	203	122	325
Second Class	118	167	285
Third Class	178	528	706
Crew	212	696	908
Total	711	1,513	2,224

Use the table to find (a) P(Survived|First Class), (b) P(Survived|Second Class), (c) P(Survived|Third Class), and (d) P(Survived|Crew). (e) Does survival appear to be related to class?

(a) P(Survived|First Class) = 203/325 = 0.62
(b) P(Survived|Second Class) = 118/285 = 0.41
(c) P(Survived|Third Class) = 178/706 = 0.25
(d) P(Survived|Crew) = 212/908 = 0.23
(e) It appears that survival is related to class, since first-class and second-class passengers were more likely to survive than third-class passengers or crew.

In this chapter, we have seen that probability can be used to investigate games of chance, research questions, and even historical events. We will look further at probability in the next two chapters, extending the basic ideas we have developed here. As we continue through the course, we will use probability as a tool in our analysis of sample data.
