CLARIFYING THE CONCEPTS
1. Explain what a contingency table is. (p. 646)
11.2.1
Tabular summary of the relationship between two categorical variables.
2. Explain in your own words what is meant by a test for independence. (p. 646)
3. What is the difference between the test for homogeneity of proportions and the two-sample test for the difference in proportions from Chapter 10? (p. 651)
11.2.3
The two-sample test for the difference in proportions from Chapter 10 compares the proportions of two independent populations, whereas the test for homogeneity of proportions compares the proportions of $k$ independent populations.
4. Explain how the expected frequencies are calculated without using the shortcut method. (p. 647)
PRACTICING THE TECHNIQUES
CHECK IT OUT!
To do | Check out | Topic |
---|---|---|
Exercises 5–10 | Example 6 | Calculating expected frequencies |
Exercises 11–14 | Example 7 | $\chi^2$ test for independence: critical-value method |
Exercises 15–18 | Example 8 | $\chi^2$ test for independence: p-value method |
Exercises 19–26 | Example 9 | $\chi^2$ test for homogeneity of proportions |
For Exercises 5–10, the observed frequencies are provided in a contingency table of two categorical variables. Find the expected frequencies, on the assumption that the variables are independent.
5.
| | A1 | A2 |
---|---|---|
B1 | 10 | 20 |
B2 | 12 | 18 |
11.2.5
| | A1 | A2 | Total |
---|---|---|---|
B1 | 11 | 19 | 30 |
B2 | 11 | 19 | 30 |
Total | 22 | 38 | 60 |
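The expected frequencies in 11.2.5 can be reproduced with a few lines of code. The sketch below assumes the usual independence formula, expected frequency = (row total × column total) / grand total; the function name `expected_frequencies` is illustrative and does not come from the text.

```python
def expected_frequencies(observed):
    """Return expected counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Each cell: (row total * column total) / grand total
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Observed frequencies from Exercise 5 (rows B1, B2; columns A1, A2).
exercise_5 = [[10, 20], [12, 18]]
expected = expected_frequencies(exercise_5)
print(expected)  # [[11.0, 19.0], [11.0, 19.0]]
```

The same function accepts tables of any size, so it can be used to check the remaining exercises in this group.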
6.
| | C1 | C2 |
---|---|---|
D1 | 50 | 100 |
D2 | 60 | 90 |
7.
| | E1 | E2 | E3 |
---|---|---|---|
F1 | 30 | 20 | 10 |
F2 | 35 | 24 | 8 |
11.2.7
| | E1 | E2 | E3 | Total |
---|---|---|---|---|
F1 | 30.71 | 20.79 | 8.50 | 60 |
F2 | 34.29 | 23.21 | 9.50 | 67 |
Total | 65 | 44 | 18 | 127 |
8.
| | G1 | G2 |
---|---|---|
H1 | 10 | 8 |
H2 | 8 | 10 |
H3 | 9 | 9 |
9.
| | I1 | I2 | I3 |
---|---|---|---|
J1 | 100 | 90 | 105 |
J2 | 50 | 60 | 55 |
J3 | 25 | 15 | 20 |
11.2.9
| | I1 | I2 | I3 | Total |
---|---|---|---|---|
J1 | 99.2788 | 93.6058 | 102.1154 | 295 |
J2 | 55.5288 | 52.3558 | 57.1154 | 165 |
J3 | 20.1923 | 19.0385 | 20.7692 | 60 |
Total | 175 | 165 | 180 | 520 |
10.
| | K1 | K2 | K3 | K4 |
---|---|---|---|---|
L1 | 40 | 70 | 90 | 100 |
L2 | 20 | 40 | 60 | 70 |
L3 | 30 | 65 | 65 | 70 |
For Exercises 11–14, test whether or not the variables are independent.
11. Exercise 5, level of significance
11.2.11
(a) $H_0$: Variable A and Variable B are independent. $H_a$: Variable A and Variable B are dependent.
(b)
| | A1 | A2 | Total |
---|---|---|---|
B1 | 11 | 19 | 30 |
B2 | 11 | 19 | 30 |
Total | 22 | 38 | 60 |
Since none of the expected frequencies is less than 1 and none of the expected frequencies is less than 5, the conditions for performing the test for independence are met. (c) $\chi^2_{\text{crit}} = 3.841$. Reject $H_0$ if $\chi^2_{\text{data}} \geq 3.841$. (d) $\chi^2_{\text{data}} = 0.2871$ (e) Since $\chi^2_{\text{data}} = 0.2871$ is not $\geq 3.841$, we do not reject $H_0$. There is insufficient evidence that Variable A and Variable B are dependent.
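The critical-value comparison in 11.2.11 can be checked numerically. The following is a minimal sketch, assuming the standard statistic $\chi^2 = \sum (O - E)^2 / E$ summed over all cells; the function name `chi_square_statistic` is illustrative, and the critical value 3.841 is simply copied from the answer above rather than computed.

```python
def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all cells of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp
    return statistic


observed = [[10, 20], [12, 18]]   # Exercise 5
chi2_data = chi_square_statistic(observed)
chi2_crit = 3.841                 # quoted critical value (1 degree of freedom)
reject = chi2_data >= chi2_crit
print(round(chi2_data, 4), reject)  # 0.2871 False
```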
12. Exercise 7, level of significance
13. Exercise 9, level of significance
11.2.13
(a) $H_0$: Variable I and Variable J are independent. $H_a$: Variable I and Variable J are dependent.
(b)
| | I1 | I2 | I3 | Total |
---|---|---|---|---|
J1 | 99.2788 | 93.6058 | 102.1154 | 295 |
J2 | 55.5288 | 52.3558 | 57.1154 | 165 |
J3 | 20.1923 | 19.0385 | 20.7692 | 60 |
Total | 175 | 165 | 180 | 520 |
Since none of the expected frequencies is less than 1 and none of the expected frequencies is less than 5, the conditions for performing the test for independence are met. (c) $\chi^2_{\text{crit}} = 13.277$. Reject $H_0$ if $\chi^2_{\text{data}} \geq 13.277$. (d) $\chi^2_{\text{data}} = 4.000$ (e) Since $\chi^2_{\text{data}} = 4.000$ is not $\geq 13.277$, we do not reject $H_0$. There is insufficient evidence that Variable I and Variable J are dependent.
14. Exercise 9, level of significance
For Exercises 15–18, test whether or not the variables are independent.
15. Exercise 6, level of significance
11.2.15
(a) $H_0$: Variable C and Variable D are independent. $H_a$: Variable C and Variable D are dependent. Reject $H_0$ if the $p$-value $\leq \alpha$.
| | C1 | C2 | Total |
---|---|---|---|
D1 | 55 | 95 | 150 |
D2 | 55 | 95 | 150 |
Total | 110 | 190 | 300 |
Since none of the expected frequencies is less than 1 and none of the expected frequencies is less than 5, the conditions for performing the test for independence are met. (b) $\chi^2_{\text{data}} = 1.4354$ (c) $p$-value (d) Since the $p$-value is not $\leq \alpha$, we do not reject $H_0$. There is insufficient evidence that Variable C and Variable D are dependent.
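For a 2 × 2 table the test has one degree of freedom, and the p-value can be obtained from the complementary error function in Python's standard library. The sketch below is an illustration under that assumption (it is valid only for 1 degree of freedom); the helper names are hypothetical and not part of the text.

```python
import math


def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all cells of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return sum(
        (obs - row_totals[i] * col_totals[j] / grand_total) ** 2
        / (row_totals[i] * col_totals[j] / grand_total)
        for i, row in enumerate(observed)
        for j, obs in enumerate(row)
    )


def p_value_df1(chi2):
    """Upper-tail area of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


observed = [[50, 100], [60, 90]]   # Exercise 6
chi2_data = chi_square_statistic(observed)
p_value = p_value_df1(chi2_data)
print(round(chi2_data, 4))  # 1.4354
print(round(p_value, 2))    # 0.23
```

The decision rule is then a single comparison of `p_value` against whatever level of significance the exercise specifies.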
16. Exercise 8, level of significance
17. Exercise 10, level of significance
11.2.17
(a) $H_0$: Variable K and Variable L are independent. $H_a$: Variable K and Variable L are dependent. Reject $H_0$ if $p$-value $\leq \alpha$.
| | K1 | K2 | K3 | K4 | Total |
---|---|---|---|---|---|
L1 | 37.5 | 72.92 | 89.58 | 100 | 300 |
L2 | 23.75 | 46.18 | 56.74 | 63.33 | 190 |
L3 | 28.75 | 55.90 | 68.68 | 76.67 | 230 |
Total | 90 | 175 | 215 | 240 | 720 |
Since none of the expected frequencies is less than 1 and none of the expected frequencies is less than 5, the conditions for performing the test for independence are met. (b) (c) $p$-value (d) Since the $p$-value is not $\leq \alpha$, we do not reject $H_0$. There is insufficient evidence that Variable K and Variable L are dependent.
18. Exercise 10, level of significance
For Exercises 19–22, test whether or not the proportions of successes are the same for all populations.
19.
| | Sample 1 | Sample 2 | Sample 3 |
---|---|---|---|
Successes | 10 | 20 | 30 |
Failures | 20 | 45 | 62 |
11.2.19
(a) $H_0: p_1 = p_2 = p_3$. $H_a$: Not all the proportions in $H_0$ are equal.
(b)
| | Sample 1 | Sample 2 | Sample 3 | Total |
---|---|---|---|---|
Successes | 9.63 | 20.86 | 29.52 | 60 |
Failures | 20.37 | 44.14 | 62.48 | 127 |
Total | 30 | 65 | 92 | 187 |
Since none of the expected frequencies is less than 1 and none of the expected frequencies is less than 5, the conditions for performing the test for homogeneity of proportions are met. (c) $\chi^2_{\text{crit}} = 5.991$. Reject $H_0$ if $\chi^2_{\text{data}} \geq 5.991$. (d) $\chi^2_{\text{data}} = 0.0847$ (e) Since $\chi^2_{\text{data}} = 0.0847$ is not $\geq 5.991$, we do not reject $H_0$. There is insufficient evidence that not all the proportions in $H_0$ are equal.
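The test for homogeneity of proportions uses the same statistic as the test for independence, applied to a 2 × $k$ table of successes and failures. The sketch below reworks Exercise 19 under that assumption; with three samples there are 2 degrees of freedom, for which the upper-tail area has the closed form $e^{-\chi^2/2}$. The function name is illustrative only.

```python
import math


def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all cells of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp
    return statistic


# Exercise 19: first row successes, second row failures, one column per sample.
observed = [[10, 20, 30], [20, 45, 62]]
chi2_data = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = math.exp(-chi2_data / 2)   # closed form, valid only when df == 2
print(round(chi2_data, 4), df)  # 0.0847 2
```

A statistic this close to zero reflects sample proportions (10/30, 20/65, 30/92) that are nearly identical.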
20.
| | Sample 1 | Sample 2 | Sample 3 |
---|---|---|---|
Successes | 50 | 50 | 100 |
Failures | 200 | 210 | 425 |
21.
| | Sample 1 | Sample 2 | Sample 3 | Sample 4 |
---|---|---|---|---|
Successes | 10 | 15 | 20 | 25 |
Failures | 15 | 24 | 32 | 40 |
11.2.21
(a) $H_0: p_1 = p_2 = p_3 = p_4$. $H_a$: Not all the proportions in $H_0$ are equal.
(b)
| | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Total |
---|---|---|---|---|---|
Successes | 9.67 | 15.08 | 20.11 | 25.14 | 70 |
Failures | 15.33 | 23.92 | 31.89 | 39.86 | 111 |
Total | 25 | 39 | 52 | 65 | 181 |
Since none of the expected frequencies is less than 1 and none of the expected frequencies is less than 5, the conditions for performing the test for homogeneity of proportions are met. (c) $\chi^2_{\text{crit}} = 7.815$. Reject $H_0$ if $\chi^2_{\text{data}} \geq 7.815$. (d) $\chi^2_{\text{data}} = 0.0215$ (e) Since $\chi^2_{\text{data}} = 0.0215$ is not $\geq 7.815$, we do not reject $H_0$. There is insufficient evidence that not all the proportions in $H_0$ are equal.
22.
| | Sample 1 | Sample 2 | Sample 3 | Sample 4 |
---|---|---|---|---|
Successes | 100 | 150 | 200 | 250 |
Failures | 150 | 240 | 320 | 400 |
For Exercises 23–26, test whether or not the proportions of successes are the same for all populations.
23.
| | Sample 1 | Sample 2 | Sample 3 |
---|---|---|---|
Successes | 30 | 60 | 90 |
Failures | 10 | 25 | 50 |
11.2.23
(a) $H_0: p_1 = p_2 = p_3$. $H_a$: Not all the proportions in $H_0$ are equal. Reject $H_0$ if the $p$-value $\leq \alpha$.
| | Sample 1 | Sample 2 | Sample 3 | Total |
---|---|---|---|---|
Successes | 27.17 | 57.74 | 95.09 | 180 |
Failures | 12.83 | 27.26 | 44.91 | 85 |
Total | 40 | 85 | 140 | 265 |
Since none of the expected frequencies is less than 1 and none of the expected frequencies is less than 5, the conditions for performing the test for homogeneity of proportions are met. (b) $\chi^2_{\text{data}} = 2.0468$ (c) $p$-value (d) Since the $p$-value is not $\leq \alpha$, we do not reject $H_0$. There is insufficient evidence that not all the proportions in $H_0$ are equal.
24.
| | Sample 1 | Sample 2 | Sample 3 |
---|---|---|---|
Successes | 100 | 120 | 140 |
Failures | 20 | 25 | 30 |
25.
| | Sample 1 | Sample 2 | Sample 3 | Sample 4 |
---|---|---|---|---|
Successes | 10 | 12 | 24 | 32 |
Failures | 6 | 10 | 15 | 30 |
11.2.25
(a) $H_0: p_1 = p_2 = p_3 = p_4$. $H_a$: Not all the proportions in $H_0$ are equal. Reject $H_0$ if the $p$-value $\leq \alpha$.
| | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Total |
---|---|---|---|---|---|
Successes | 8.98 | 12.35 | 21.88 | 34.79 | 78 |
Failures | 7.02 | 9.65 | 17.12 | 27.21 | 61 |
Total | 16 | 22 | 39 | 62 | 139 |
Since none of the expected frequencies is less than 1 and none of the expected frequencies is less than 5, the conditions for performing the test for homogeneity of proportions are met. (b) $\chi^2_{\text{data}} = 1.263$ (c) $p$-value (d) Since the $p$-value is not $\leq \alpha$, we do not reject $H_0$. There is insufficient evidence that not all the proportions in $H_0$ are equal.
26.
| | Sample 1 | Sample 2 | Sample 3 | Sample 4 |
---|---|---|---|---|
Successes | 100 | 200 | 300 | 400 |
Failures | 30 | 70 | 150 | 300 |
APPLYING THE CONCEPTS
worktask
27. Email, Phone, or in Person? What is the most effective way to handle a task at work: by email, by phone, or in person? Well, you probably say, it depends on the task. The Pew Internet and American Life Project Email at Work Survey surveyed 1000 randomly selected work email users, who chose the following methods as the best for handling certain work tasks. Test whether the proportions who favor email differ between the two tasks, using level of significance and the p-value method.
Task | By email | By phone or in person |
---|---|---|
Edit or review documents | 670 | 330 |
Arrange meetings or appointments | 630 | 370 |
11.2.27
$H_0$: The population proportions are all equal. $H_a$: Not all the proportions in $H_0$ are equal. Reject $H_0$ if $p$-value $\leq \alpha$. Since none of the expected frequencies is less than 1 and none of the expected frequencies is less than 5, the conditions for performing the test for homogeneity of proportions are met. Since the $p$-value is not $\leq \alpha$, we do not reject $H_0$. There is insufficient evidence that the proportions who favor email differ between the two tasks.
computerweight
28. Computer Usage and Weight in Children. The National Center for Health Statistics conducted a survey of children 12–15 years old. Three random samples were taken, one sample of normal or underweight children, one sample of overweight children, and one sample of obese children. The surveys noted whether the children used a computer for more than two hours per day. The results are presented in the following table. Test whether the population proportions of children who use the computer for more than two hours per day are the same for the three weight statuses, using level of significance .
| | Normal or underweight | Overweight | Obese | Total |
---|---|---|---|---|
Using computer more than two hours per day | 114 | 28 | 52 | 194 |
Using computer two hours or less per day | 355 | 96 | 121 | 572 |
Total | 469 | 124 | 173 | 766 |
weatherdeaths
29. Weather-Related Deaths. The Centers for Disease Control track the numbers of deaths due to weather-related causes. Is there a difference in the pattern of deaths for young people and older people? The following table shows the number of deaths for three weather-related categories, for young people ages 15–24 and older people ages 75–84. Test, using level of significance , whether cause of death and age group are independent.
Age group | Heat-related | Cold-related | Floods/storms/lightning | Total |
---|---|---|---|---|
15–24 | 106 | 286 | 97 | 489 |
75–84 | 490 | 1010 | 53 | 1553 |
Total | 596 | 1296 | 150 | 2042 |
11.2.29
$H_0$: Cause of death and age group are independent. $H_a$: Cause of death and age group are dependent.
From the Minitab output above, none of the expected frequencies is less than 1, and none of the expected frequencies is less than 5. Therefore, the conditions for performing the test for independence are met. Reject $H_0$ if the $p$-value $\leq \alpha$. The $p$-value is less than or equal to $\alpha$. Therefore, we reject $H_0$. Evidence exists, at level of significance $\alpha$, that the variables cause of death and age group are dependent.
30. Using Graphical Evidence. Sick of spam (unsolicited broadcast email)? Do you get more spam at your work, school, or home email address? The Pew Internet and American Life Project Email at Work Survey examined the proportion of spam in email users' work and home email accounts. Two random samples were used, one of work email and one of personal email. Using only the information in the clustered bar graph below, would you conclude that the proportion of those who report “a lot of spam” is the same for work email and personal email? Why?
31. Spam, Spam, Spam. Continue your work from the previous exercise. The following contingency table shows the actual percentages in the graph above based on samples of size 100 for each of work email and personal email. Test whether the proportions who report “a lot of spam” are the same for work email and personal email, using level of significance . Does your conclusion agree with your conjecture in the previous exercise?
| | None | Some | A lot |
---|---|---|---|
Work email | 53% | 36% | 11% |
Personal email | 22% | 48% | 30% |
11.2.31
$H_0$: The population proportions are all equal. $H_a$: Not all the proportions in $H_0$ are equal. Reject $H_0$ if $p$-value $\leq \alpha$. Since none of the expected frequencies is less than 1 and none of the expected frequencies is less than 5, the conditions for performing the test for homogeneity of proportions are met. $p$-value ≈ 0. Since $p$-value $\leq \alpha$, we reject $H_0$. There is evidence that the population proportions who report “a lot of spam” are not the same for work email and personal email. Yes.
games
32. Gender Differences in Computer/Video/Online Gaming. The Pew Internet and American Life Project collected data on the College Students Gaming Survey. Among the questions they asked 1720 randomly selected college students was “Which one of the following do you play the most: video games, computer games, or online games?” The results are summarized by gender in the following contingency table.
| | Video games | Computer games | Internet games |
---|---|---|---|
Male | 616 | 221 | 139 |
Female | 198 | 372 | 174 |
33. Online Dating. A Pew Internet and American Life Project study reported that the proportion of urban residents who use online dating is 13%, whereas the proportion for suburban residents is 10% and the proportion for rural residents is 9%.⁷ Test, using level of significance , whether differences exist among the population proportions of residents from the three categories who use online dating. Assume that each sample size was 1000. (Hint: The null hypothesis assumes that all proportions are equal.)
11.2.33
$H_0$: The population proportions are all equal. $H_a$: Not all the proportions in $H_0$ are equal. Reject $H_0$ if $p$-value $\leq \alpha$. Since none of the expected frequencies is less than 1 and none of the expected frequencies is less than 5, the conditions for performing the test for homogeneity of proportions are met. Since $p$-value $\leq \alpha$, we reject $H_0$. There is evidence that the population proportions of residents from the three categories who use online dating are not all the same.
WORKING WITH LARGE DATA SETS
Use Minitab or Excel for each of Exercises 34–38.
Goals of Middle School Students. Open the Goals data set. The subjects are students in grades 4, 5, and 6, from three school districts in Michigan. The students were asked which of the following was most important to them: good grades, athletic ability, or popularity. Information about the students' age, gender, race, and grade was also gathered, as well as whether their school was in an urban, suburban, or rural setting.⁸
goals
34. How many observations are in the data set? How many variables?
goals
35. Comparing gender and goals.
Looking at the data, do you think that boys and girls at this age differ in what is most important to them: grades, popularity, or sports? In other words, do you think that the variables gender and goals are dependent or independent?
11.2.35
(a) Dependent (b) Since the $p$-value ≈ 0, $p$-value $\leq \alpha$. Thus we reject $H_0$. There is evidence that gender and goals are dependent.
goals
36. Comparing gender and grade.
goals
37. Comparing goals and school setting.
11.2.37
(a) Dependent (b) Since the $p$-value $\leq \alpha$, we reject $H_0$. There is evidence that urb_rural and goals are dependent.
goals
38. Comparing grades and goals.
1970draft
39. 1970 Military Draft. Is there evidence that the 1970 military draft, conducted at the height of the Vietnam War, was not truly random? For this exercise, birth dates were ranked from 1 (for the first date drawn) to 366 (the last date drawn). In 1970, only those young men with birth date rankings up to 195 were eventually drafted. Because 195 of the 366 dates were “drafted,” the overall proportion of “drafted dates” is $195/366 \approx 0.5328$. Assuming the draft was truly random, we do not expect the proportion of “drafted dates” to vary significantly from month to month. In other words, the proportion of “drafted dates” should be about the same for each of the 12 months. We therefore define a multinomial random variable drafted, with the months as categories. The monthly counts of dates not drafted and drafted are provided here. (For example, for April, 12 dates out of 30 were chosen to be drafted.) Test whether the proportions of “drafted dates” are equal for all months, using level of significance .
Month | Dates not drafted | Dates drafted | All |
---|---|---|---|
Jan. | 17 | 14 | 31 |
Feb. | 16 | 13 | 29 |
Mar. | 21 | 10 | 31 |
Apr. | 18 | 12 | 30 |
May | 17 | 14 | 31 |
June | 16 | 14 | 30 |
July | 13 | 18 | 31 |
Aug. | 12 | 19 | 31 |
Sept. | 11 | 19 | 30 |
Oct. | 17 | 14 | 31 |
Nov. | 8 | 22 | 30 |
Dec. | 5 | 26 | 31 |
All | 171 | 195 | 366 |
11.2.39
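$H_0$: The population proportions of “drafted dates” are equal for all 12 months. $H_a$: Not all the proportions in $H_0$ are equal. Reject $H_0$ if $p$-value $\leq \alpha$. Since $p$-value $\leq \alpha$, we reject $H_0$. There is evidence that the population proportion of “drafted dates” is not equal for all months.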
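The exercise calls for Minitab or Excel, but the 1970 draft table is small enough to check by hand in code. The sketch below is one possible approach, not the text's method: it computes the statistic for the 12 × 2 table and evaluates the upper-tail area using the standard finite-series expressions for a chi-square distribution with integer degrees of freedom. The function names `chi_square_statistic` and `chi2_upper_tail` are hypothetical.

```python
import math


def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all cells of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp
    return statistic


def chi2_upper_tail(x, df):
    """P(X >= x) for a chi-square variable with a positive integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite correction series.
    root = math.sqrt(x)
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = 0.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        term = root if r == 1 else term * x / (2 * r - 1)
        total += term
    return math.erfc(root / math.sqrt(2)) + 2 * density * total


# (dates not drafted, dates drafted) for Jan. through Dec., from the table above.
draft_1970 = [
    [17, 14], [16, 13], [21, 10], [18, 12], [17, 14], [16, 14],
    [13, 18], [12, 19], [11, 19], [17, 14], [8, 22], [5, 26],
]
chi2_data = chi_square_statistic(draft_1970)
df = (len(draft_1970) - 1) * (len(draft_1970[0]) - 1)   # 11
p_value = chi2_upper_tail(chi2_data, df)
print(round(chi2_data, 2), df, round(p_value, 4))
```

The later months contribute most of the statistic, which matches the pattern visible in the table: November and December have far more drafted dates than expected.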
1971draft
40. 1971 Military Draft. Criticism of the 1970 draft lottery led the U.S. Selective Service Bureau to focus on making sure that the 1971 draft lottery was truly random. Were their efforts successful? The results of the 1971 draft lottery are shown here (365 days). The Selective Service reports that all birth dates with a rank of 125 or less were chosen for the draft. Perform a test for homogeneity of proportions to determine whether the population proportions of “drafted dates” per month were all equal, using level of significance .
Month | Dates not drafted | Dates drafted |
---|---|---|
Jan. | 19 | 12 |
Feb. | 19 | 9 |
Mar. | 21 | 10 |
Apr. | 21 | 9 |
May | 22 | 9 |
June | 21 | 9 |
July | 19 | 12 |
Aug. | 18 | 13 |
Sept. | 23 | 7 |
Oct. | 19 | 12 |
Nov. | 16 | 14 |
Dec. | 22 | 9 |