For Exercises 9.1 and 9.2, see page 458; for 9.3 and 9.4, see page 461; for 9.5 and 9.6, see page 462; for 9.7 and 9.8, see page 463; for 9.9 and 9.10, see page 465; for 9.11, see page 466; for 9.12 and 9.13, see page 471; for 9.14, see page 473; for 9.15, see page 474; and for 9.16, see page 475.
9.17 To tip or not to tip.
A study of tipping behaviors examined the relationship between the color of the shirt worn by the server and whether or not the customer left a tip.8 Here are the data for 418 male customers who participated in the study.
tipmale
Shirt color | ||||||
---|---|---|---|---|---|---|
Tip | Black | White | Red | Yellow | Blue | Green |
Yes | 22 | 25 | 40 | 31 | 25 | 27 |
No | 49 | 43 | 29 | 41 | 42 | 43 |
9.17
(a) Answers will vary. The percents of male customers who tipped, by shirt color, are Black 30.99%, White 36.76%, Red 57.97%, Yellow 43.06%, Blue 37.31%, Green 38.57%.
(b) H0: There is no association between whether a male customer tips and shirt color of the server; Ha: There is an association between whether a male customer tips and shirt color of the server.
(c) .
(e) The data provide evidence of an association between whether or not the male customer left a tip and shirt color worn by the server. Red-shirted servers got the most tips!
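The test behind this answer can be sketched in a few lines of Python. This is a minimal illustration using the counts from the table in Exercise 9.17, not the text's own software output; the function names `chi2_sf` and `chi_square_test` are hypothetical helpers introduced here, and the P-value comes from a closed-form expression for the chi-square upper tail.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum exp(-h) * sum_{i < df/2} h^i / i!
        term = math.exp(-h)
        total = term
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return total
    # Odd df: start from the df = 1 tail and add half-integer terms
    total = math.erfc(math.sqrt(h))
    term = math.exp(-h) * math.sqrt(h) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= h / (i + 0.5)
    return total

def chi_square_test(table):
    """Return (statistic, df, P-value) for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # expected count under H0
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)

# Counts for male customers: Black, White, Red, Yellow, Blue, Green
tip_yes = [22, 25, 40, 31, 25, 27]
tip_no = [49, 43, 29, 41, 42, 43]

pct_tip = [100 * y / (y + n) for y, n in zip(tip_yes, tip_no)]
stat, df, p_value = chi_square_test([tip_yes, tip_no])
print([round(p, 2) for p in pct_tip])
print(round(stat, 2), df, round(p_value, 3))
```

The same `chi_square_test` helper could be applied to the table for female customers in Exercise 9.18 to compare the two groups.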
9.18 To tip or not to tip: women customers.
Refer to the previous exercise. Here are the data for the 304 female customers who participated in the study.
tipfem
Shirt color | ||||||
---|---|---|---|---|---|---|
Tip | Black | White | Red | Yellow | Blue | Green |
Yes | 18 | 16 | 15 | 19 | 16 | 18 |
No | 33 | 32 | 38 | 31 | 31 | 37 |
Using the questions for the previous exercise as a guide, analyze these data and compare the results with those you found for the male customers.
9.19 Evaluating the price and math anxiety.
Subjects in a study were asked to arrange for the rental of two tents, each for two weeks. They were offered two options for the price: (A) $40 per day per tent with a discount of $50 per tent per week, or (B) $40 per day per tent with a discount of 20%. The subjects were classified by their level of math anxiety as Low, Moderate, or High.9 The percents of subjects choosing the higher-priced option that is easier to compute (A) were 14%, 19%, and 45% for the low, moderate, and high math anxiety groups, respectively. Assume that there are 60 subjects in each of these groups.
9.19
(a) Counts are 8, 11, 27, 52, 49, 33.
(b) Answers will vary. The percents choosing the higher-priced option in each math anxiety group are Low 13.33%, Moderate 18.33%, High 45%.
(c) H0: There is no association between level of math anxiety and which rental option is chosen; Ha: There is an association between level of math anxiety and which rental option is chosen.
(d) .
(e) The data provide evidence of an association between level of math anxiety and which rental option is chosen. The higher the math anxiety, the more likely someone will be to choose the higher-priced rental option.
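The counts in part (a) follow from the reported percents and the assumed 60 subjects per group. The sketch below shows one way to do the conversion; the helper name `pct_to_count` is hypothetical, and rounding half up to a whole subject is an assumption about how the counts were obtained.

```python
n_per_group = 60
pct_option_a = {"Low": 14, "Moderate": 19, "High": 45}  # percent choosing option A

def pct_to_count(pct, n):
    """Convert a whole-number percent of n into a whole count, rounding half up."""
    return (pct * n + 50) // 100

count_a = {group: pct_to_count(pct, n_per_group) for group, pct in pct_option_a.items()}
count_b = {group: n_per_group - count for group, count in count_a.items()}

# Percents recomputed from the whole counts differ slightly from the reported ones
back_pct = {group: round(100 * count / n_per_group, 2) for group, count in count_a.items()}

print(count_a, count_b, back_pct)
```

Because the counts must be whole numbers, the recomputed percents for Low and Moderate (13.33% and 18.33%) differ from the reported 14% and 19%.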
9.20 Brands and sex-typed portraits: Nivea.
In a study of brand personality, subjects were shown four portraits: a highly feminine female, a less feminine female, a highly masculine male, and a less masculine male. They were then asked to classify brands to one of these four sex-typed portraits.10 We use two categorical variables to describe the data. Portrait with values Female and Male specifies the sex of the model in the portrait, and Intensity with values High and Low specifies the level of femininity or masculinity. Here are the results for Nivea, one of the brands described as a highly feminine brand.
nivea
Portrait | ||
---|---|---|
Intensity | Female | Male |
High | 125 | 11 |
Low | 121 | 12 |
Analyze these data. Write a short summary of your results that includes appropriate numerical and graphical summaries. Give reasons for your selection of the summaries you use.
9.21 Brands and sex-typed portraits: Audi.
Refer to the previous exercise. Another brand studied was Audi, one of the brands described as a highly masculine brand. Here are the data.
audi
Portrait | ||
---|---|---|
Intensity | Female | Male |
High | 15 | 217 |
Low | 9 | 28 |
Analyze these data. Write a short summary of your results that includes appropriate numerical and graphical summaries. Give reasons for your selection of the summaries you use.
9.21
H0: There is no association between intensity and portrait; Ha: There is an association between intensity and portrait. The data provide evidence of an association between intensity and portrait. A large majority, 91.08%, classified Audi as masculine, and 88.57% of those also classified it as high intensity. Of the 8.92% who classified Audi as feminine, only 62.5% saw it as high intensity; hence the association between portrait and intensity.
9.22 Brands and sex-typed portraits: H&M.
Refer to the previous two exercises. Another brand studied was H&M, one of the brands described as an androgynous brand. Here are the data.
handm
Portrait | ||
---|---|---|
Intensity | Female | Male |
High | 167 | 16 |
Low | 27 | 61 |
Analyze these data. Write a short summary of your results that includes appropriate numerical and graphical summaries. Give reasons for your selection of the summaries you use.
9.23 Compare the brands.
Refer to the previous three exercises. Compare the results that you found for the three brands. Be sure to indicate similarities and differences in the way that these brands are viewed.
9.23
Nivea is viewed as feminine but about equally likely to be high or low intensity. Audi is viewed as masculine and primarily as high intensity, especially among those who viewed it as masculine. H&M showed a different pattern: a higher percent viewed it as feminine, and those respondents primarily chose high intensity, while those who viewed it as masculine primarily chose low intensity, a reversal of what we saw in the other group.
9.24 The value of online courses.
A Pew Internet survey asked college presidents whether or not they believed that online courses offer an equal educational value when compared with courses taken in the classroom. The presidents were classified by the type of educational institution. Here are the data.11
online
Institution type | ||||
---|---|---|---|---|
Response | Four-year private | Four-year public | Two-year private | Two-year public |
Yes | 36 | 50 | 66 | 54 |
No | 62 | 48 | 34 | 45 |
9.25 Do the answers depend upon institution type?
Refer to the previous exercise. You want to examine whether or not the data provide evidence that the belief that online and classroom courses offer equal educational value varies with the type of institution of the president.
online
9.25
(a) H0: There is no association between response and institution type; Ha: There is an association between response and institution type.
(b) .
(c) The data provide evidence of an association between whether or not the president believes that online courses offer an equal educational value as classroom courses and what institution type the president is from.
9.26 Compare the college presidents with the general public.
Refer to Exercise 9.24. Another Pew Internet survey asked the general public about their opinions on the value of online courses. Of the 2142 people who participated in the survey, 621 responded Yes to the question, “Do you believe that online courses offer an equal educational value when compared with courses taken in the classroom?”
online
9.27 Remote deposit capture.
The Federal Reserve has called remote deposit capture (RDC) “the most important development the [U.S.] banking industry has seen in years.” This service allows users to scan checks and to transmit the scanned images to a bank for posting.12 In its annual survey of community banks, the American Bankers Association asked banks whether or not they offered this service.13 Here are the results classified by the asset size (in millions of dollars) of the bank.
rdc
Offer RDC | ||
---|---|---|
Asset size | Yes | No |
Under $100 | 63 | 309 |
$101 to $200 | 59 | 132 |
$201 or more | 112 | 85 |
9.27
(a) The percent of banks for each asset size that offer RDC are Under $100: 16.94%, $101 to $200: 30.89%, $201 or more: 56.85%.
(b) . The data provide evidence of an association between the size of the bank and whether or not it offers RDC. Generally speaking, the small-size banks, as measured by assets, are less likely to offer RDC.
9.28 How does RDC vary across the country?
rdcreg
The survey described in the previous exercise also classified community banks by region.14 Here is the table of counts.
Offer RDC | ||
---|---|---|
Region | Yes | No |
Northeast | 28 | 38 |
Southeast | 57 | 61 |
Central | 53 | 84 |
Midwest | 63 | 181 |
Southwest | 27 | 51 |
West | 61 | 76 |
9.29 Trust and honesty in the workplace.
One of the questions in a survey of high school students asked about trust and honesty in the workplace.15 Specifically, they were asked whether they thought trust and honesty were essential in business and the workplace. Here are the counts classified by gender.
trust
Gender | ||
---|---|---|
Trust and honesty are essential | Male | Female |
Agree | 9,097 | 10,935 |
Disagree | 685 | 423 |
Note that you answered parts (a) through (c) of this exercise if you completed Exercise 2.109 (page 114).
9.29
(a) 9782, 11358; 20032, 1108.
(b) 93% of males and 96.28% of females felt trust and honesty were essential.
(c) A higher percent of females than males feel that trust and honesty were essential in business and the workplace.
(d) . The data provide evidence of an association between gender and whether or not they thought trust and honesty were essential in business and the workplace.
9.30 Lying to a teacher.
The students surveyed in the study described in the previous exercise were also asked about lying to teachers. The following table gives the numbers of students who said that they lied to a teacher at least once during the past year, classified by gender.
lying
Gender | ||
---|---|---|
Lied at least once | Male | Female |
Yes | 6057 | 5966 |
No | 4165 | 5719 |
Note that you answered parts (a) through (c) of this exercise if you completed Exercise 2.108 (page 114). Answer the questions given in the previous exercise for this survey question.
9.31 Nonresponse in a survey.
A business school conducted a survey of companies in its state. It mailed a questionnaire to 200 small companies, 200 medium-sized companies, and 200 large companies. The rate of nonresponse is important in deciding how reliable survey results are. Here are the data on response to this survey.
nresp
Outcome | Small | Medium | Large |
---|---|---|---|
Response | 124 | 80 | 41 |
No response | 76 | 120 | 159 |
Total | 200 | 200 | 200 |
Note that you answered parts (a) through (c) of this exercise if you completed Exercise 2.112 (page 115).
9.31
(a) 59.17%.
(b) Larger businesses have higher nonresponse rates: 79.5% of large, 60% of medium, and 38% of small businesses did not respond.
(d) H0: There is no association between size of business and response rate; Ha: There is an association between size of business and response rate. The data provide evidence of an association between size of business and response rate.
9.32 Hiring practices.
A company has been accused of age discrimination in hiring for operator positions. Lawyers for both sides look at data on applicants for the past three years. They compare hiring rates for applicants younger than 40 years and those 40 years or older.
hiring
Age | Hired | Not hired |
---|---|---|
Younger than 40 | 82 | 1160 |
40 or older | 2 | 168 |
Note that you answered parts (a) through (d) of this exercise if you completed Exercise 2.111 (page 115).
9.33 Obesity and health.
Recent studies have shown that earlier reports underestimated the health risks associated with being overweight. The error was due to overlooking lurking variables. In particular, smoking tends both to reduce weight and to lead to earlier death. Note that you answered part (a) of this exercise if you completed Exercise 2.117 (page 116).
9.34 Discrimination?
Wabash Tech has two professional schools, business and law. Here are two-way tables of applicants to both schools, categorized by gender and admission decision. (Although these data are made up, similar situations occur in reality.)
disc
Business | ||
---|---|---|
Gender | Admit | Deny |
Male | 480 | 120 |
Female | 180 | 20 |

Law | ||
---|---|---|
Gender | Admit | Deny |
Male | 10 | 90 |
Female | 100 | 200 |
Note that you answered parts (a) through (d) of this exercise if you completed Exercise 2.116 (page 116).
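A quick way to explore these two tables is to compute admission rates within each school and then for both schools combined. The sketch below does only that arithmetic, with the counts copied from the tables above; the names `admissions` and `admit_rate` are illustrative, not from the text.

```python
# (admit, deny) counts by school and gender, from the two tables above
admissions = {
    "Business": {"Male": (480, 120), "Female": (180, 20)},
    "Law": {"Male": (10, 90), "Female": (100, 200)},
}

def admit_rate(admit, deny):
    """Proportion of applicants admitted."""
    return admit / (admit + deny)

# Admission rate for each gender within each school
by_school = {
    school: {gender: admit_rate(*counts) for gender, counts in groups.items()}
    for school, groups in admissions.items()
}

# Admission rate for each gender with the two schools pooled
combined = {}
for gender in ("Male", "Female"):
    admit = sum(admissions[school][gender][0] for school in admissions)
    deny = sum(admissions[school][gender][1] for school in admissions)
    combined[gender] = admit_rate(admit, deny)

print(by_school)
print(combined)
```

Comparing `by_school` with `combined` shows that the direction of the gender difference within each school is not the same as in the pooled table, which is the pattern the exercise is built around.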
9.35 What's wrong?
Explain what is wrong with each of the following:
9.35
(a) A -value cannot be negative.
(b) Expected cell counts are computed under the assumption that the null hypothesis is true, not the alternative.
(c) The alternative hypothesis should be that there is an association between two categorical variables.
9.36 Plot the test statistic and the P-values.
Here is a two-way table of counts. The two categorical variables are and , and the possible values for each of these variables are 0 and 1. Notice that the second row depends upon a quantity that we call . For this exercise, you will examine how the test statistic and its corresponding P-value depend upon this quantity. Notice that the row sums are both 100.
0 | 1 | |
---|---|---|
0 | 50 | 50 |
1 |
9.37 Plot the test statistic and the P-values.
Here is a two-way table of counts. The two categorical variables are and , and the possible values for each of these variables are 0 and 1.
counts
0 | 1 | |
---|---|---|
0 | 5 | 5 |
1 | 7 | 3 |
9.37
(a) .
(b) .
(c) For 4: .
For 6: .
For 8: .
In relation to sample size, collecting twice as much data that shows the same association doubles the value of the chi-square statistic and makes the result more significant. Similarly, collecting four times as much data that shows the same association quadruples the statistic and makes the result even more significant, and so on.
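The scaling behavior described here can be checked numerically. The sketch below multiplies every count in the table from Exercise 9.37 by a constant and recomputes the chi-square statistic; the helper names are hypothetical, and the multipliers 2, 4, 6, and 8 are an assumption read off the labels in the answer.

```python
def chi_square_stat(table):
    """Chi-square statistic for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def scaled(table, k):
    """Multiply every count in the table by k."""
    return [[k * count for count in row] for row in table]

base = [[5, 5], [7, 3]]
stats = {k: chi_square_stat(scaled(base, k)) for k in (1, 2, 4, 6, 8)}

for k, value in stats.items():
    print(k, round(value, 4), round(value / stats[1], 4))
```

The last column printed is the ratio to the statistic for the original table, which equals the multiplier. Every observed and expected count scales by k, so each term (observed − expected)²/expected scales by k as well.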
9.38 Trends in broadband market.
The Pew Internet and American Life Project collects data about the impact of the Internet on various aspects of American life.16 One set of surveys has tracked the use of broadband in homes over a period of several years.17 Here are some data on the percent of homes that access the Internet using broadband:
Date of survey | 2001 | 2005 | 2009 | 2013 |
---|---|---|---|---|
Homes with broadband | 6% | 33% | 63% | 70% |
Assume a sample size of 2250 for each survey.
9.39 Can dial-up compete?
Refer to the previous exercise. The same surveys provided data on access to the Internet using dial-up. Here are the data:
Date of survey | 2001 | 2005 | 2009 | 2013 |
---|---|---|---|---|
Homes with dial-up | 41% | 28% | 7% | 3% |
9.39
(a) Dial-up: 923, 630, 158, 68. Without: 1327, 1620, 2092, 2182.
(b) H0: There is no association between year and Internet access using dial-up; Ha: There is an association between year and Internet access using dial-up. The data provide evidence of an association between year and Internet access using dial-up.
(c) Since 2001, broadband usage has increased dramatically, from 6% in 2001 to 70% in 2013; at the same time, use of dial-up access has plummeted, going from 41% in 2001 to only 3% in 2013.
9.40 How robust are the conclusions?
Refer to Exercise 9.38 on the use of broadband to access the Internet. In that exercise, the percents were read from a graph, and we assumed that the sample size was 2250 for all the surveys. Investigate the robustness of your conclusions in Exercise 9.38 against the use of 2250 as the sample size for all surveys and to roundoff and slight errors in reading the graph. Assume that the actual sample sizes ranged from 2200 to 2600. Assume also that the percents reported are all accurate to within ±2%. In other words, if the reported percent is 33%, then we can assume that the actual survey percent is between 31% and 35%. Reanalyze the data using at least five scenarios that vary the percents and the sample sizes within the assumed ranges. Summarize your results in a report, paying particular attention to the consequences for your conclusions in Exercise 9.38.
9.41 Find the P-value.
For each of the following situations, give the degrees of freedom and an appropriate bound on the P-value (give the exact value if you have software available) for the chi-square statistic for testing the null hypothesis of no association between the row and column variables.
9.41
(a) .
(b) .
(c) .
(d) .
9.42 Health care fraud.
Most errors in billing insurance providers for health care services involve honest mistakes by patients, physicians, or others involved in the health care system. However, fraud is a serious problem. The National Health Care Anti-fraud Association estimates that tens of billions of dollars are lost to health care fraud each year.18 When fraud is suspected, an audit of randomly selected billings is often conducted. The selected claims are then reviewed by experts, and each claim is classified as allowed or not allowed. The distributions of the amounts of claims are frequently highly skewed, with a large number of small claims and a small number of large claims. Simple random sampling would likely be overwhelmed by small claims and would tend to miss the large claims, so stratification is often used. See the section on stratified sampling in Chapter 3 (page 134). Here are data from an audit that used three strata based on the sizes of the claims (small, medium, and large).19
berrors
Stratum | Sampled claims | Number not allowed |
---|---|---|
Small | 57 | 6 |
Medium | 17 | 5 |
Large | 5 | 1 |
9.43 Population estimates.
Refer to the previous exercise. One reason to do an audit such as this is to estimate the number of claims that would not be allowed if all claims in a population were examined by experts. We have estimates of the proportions of such claims from each stratum based on our sample. With our simple random sampling of claims from each stratum, we have unbiased estimates of the corresponding population proportion for each stratum. Therefore, if we take the sample proportions and multiply by the population sizes, we would have the estimates that we need. Here are the population sizes for the three strata:
Stratum | Claims in strata |
---|---|
Small | 3342 |
Medium | 246 |
Large | 58 |
9.43
(a) The estimates are 352 for Small, 73 for Medium, and 12 for Large.
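The estimates in part (a) come from multiplying each stratum's sample proportion of disallowed claims by that stratum's population size. Here is a minimal sketch of that arithmetic using the tables from Exercises 9.42 and 9.43. Rounding up to a whole claim is an assumption made here because it reproduces the printed values; the variable names are illustrative.

```python
import math

# stratum: (sampled claims, number not allowed, claims in stratum)
strata = {
    "Small": (57, 6, 3342),
    "Medium": (17, 5, 246),
    "Large": (5, 1, 58),
}

# Sample proportion not allowed, scaled up to the stratum population
estimates = {
    name: population * not_allowed / sampled
    for name, (sampled, not_allowed, population) in strata.items()
}

# Assumption: report whole claims by rounding up
rounded_up = {name: math.ceil(value) for name, value in estimates.items()}

print({name: round(value, 2) for name, value in estimates.items()})
print(rounded_up)
```

The unrounded values are printed as well so that the effect of the rounding choice is visible, most noticeably in the Medium stratum.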
9.44 Construct a table.
Construct a table of counts where there is no apparent association between the row and column variables.
9.45 Jury selection.
Exercise 8.93 (page 453) concerns Castaneda v. Partida, the case in which the Supreme Court decision used the phrase “two or three standard deviations” as a criterion for statistical significance. There were 181,535 persons eligible for jury duty, of whom 143,611 were Mexican Americans. Of the 870 people selected for jury duty, 339 were Mexican Americans. We are interested in finding out if there is an association between being a Mexican American and being selected as a juror. Formulate this problem using a two-way table of counts. Construct the table using the variables “Mexican American or not” and “juror or not.” Find the chi-square statistic and its P-value. Square the z statistic that you obtained in Exercise 8.93 and verify that the result is equal to the chi-square statistic.
9.45
with rounding error.
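The relationship between the two statistics can be checked directly. The sketch below builds the two-way table from the counts in the exercise and compares the chi-square statistic with the square of a z statistic. It assumes the z statistic is the pooled two-sample z comparing jurors with non-jurors; if Exercise 8.93 used a different z, its square would agree only approximately. All variable names are illustrative.

```python
import math

eligible, eligible_ma = 181535, 143611   # all eligible persons, Mexican Americans among them
jurors, jurors_ma = 870, 339             # selected jurors, Mexican Americans among them

# Rows: Mexican American, not Mexican American; columns: juror, not juror
table = [
    [jurors_ma, eligible_ma - jurors_ma],
    [jurors - jurors_ma, (eligible - eligible_ma) - (jurors - jurors_ma)],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)
x2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(2)
    for j in range(2)
)

# Pooled two-sample z (assumed form): proportion Mexican American among jurors vs. non-jurors
n1, n2 = jurors, eligible - jurors
p1 = jurors_ma / n1
p2 = (eligible_ma - jurors_ma) / n2
pooled = eligible_ma / eligible
z = (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

print(round(x2, 2), round(z, 2), round(z ** 2, 2))
```

For a 2 × 2 table the chi-square statistic equals the square of the pooled two-sample z, so the two printed values agree up to floating-point error.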
9.46 Students explain statistical data.
The National Survey of Student Engagement conducts surveys to study various aspects of undergraduate education.20 In a recent survey, students were asked if they needed to explain the meaning of numerical or statistical data in a written assignment. Among the first-year students, 9,697 responded positively while 13,514 seniors responded positively. A total of 13,171 first-year students and 16,997 seniors from 622 U.S. four-year colleges and universities responded to the survey.
9.47 A reduction in force.
In economic downturns or to improve their competitiveness, corporations may undertake a reduction in force (RIF), in which substantial numbers of employees are laid off. Federal and state laws require that employees be treated equally regardless of their age. In particular, employees over the age of 40 years are a “protected class.” Many allegations of discrimination focus on comparing employees over 40 with their younger coworkers. Here are the data for a recent RIF.
rif1
Over 40 | ||
---|---|---|
Released | No | Yes |
Yes | 8 | 42 |
No | 503 | 764 |
9.47
(a) 5.21% of those over 40 and 1.57% of those under 40 were laid off. Yes, it looks like a higher percentage of over 40 were laid off than under 40.
(b) . The data provide evidence of an association between age group and being laid off.
9.48 Employee performance appraisal.
A major issue that arises in RIFs like that in the previous exercise is the extent to which employees in various groups are similar. If, for example, employees over 40 receive generally lower performance ratings than younger workers, that might explain why more older employees were laid off. We have data on the last performance appraisal. The possible values are “partially meets expectations,” “fully meets expectations,” “usually exceeds expectations,” and “continually exceeds expectations.” Because there were very few employees who partially met expectations, we combine the first two categories. Here are the data.
rif2
Over 40 | ||
---|---|---|
Performance appraisal | No | Yes |
Partially or fully meets expectations | 86 | 233 |
Usually exceeds expectations | 352 | 493 |
Continually exceeds expectations | 64 | 35 |
Note that the total number of employees in this table is less than the number in the previous exercise because some employees do not have a performance appraisal. Analyze the data. Do the older employees appear to have lower performance evaluations?
9.49 Which model?
This exercise concerns the material in Section 9.1 on models for two-way tables. Look at Exercise 9.27, Exercise 9.31, Exercise 9.42, and Exercise 9.47. For each exercise, state whether you are comparing several populations based on separate samples from each population (the first model for two-way tables) or testing independence between two categorical variables based on a single sample (the second model).
9.49
9.27 and 9.47 are based on a single sample (the second model); 9.31 and 9.42 are based on separate samples (the first model).
9.50 Computations for RDC and bank size.
Refer to the table of data for bank asset size and remote deposit capture offering in Exercise 9.27 (page 477).
9.51 Titanic!
In 1912, the luxury liner Titanic, on its first voyage, struck an iceberg and sank. Some passengers got off the ship in lifeboats, but many died. Think of the Titanic disaster as an experiment in how the people of that time behaved when faced with death in a situation where only some can escape. The passengers are a sample from the population of their peers. Here is information about who lived and who died, by gender and economic status.21 (The data leave out a few passengers whose economic status is unknown.)
titanic
Men | ||
---|---|---|
Status | Died | Survived |
Highest | 111 | 61 |
Middle | 150 | 22 |
Lowest | 419 | 85 |
Total | 680 | 168 |

Women | ||
---|---|---|
Status | Died | Survived |
Highest | 6 | 126 |
Middle | 13 | 90 |
Lowest | 107 | 101 |
Total | 126 | 317 |
9.51
(a) 80.19% of men and 28.44% of women died. . The data provide evidence that a higher proportion of men died than women. Answers will vary for reasons.
(b) Among the women, 4.55% of Highest, 12.62% of Middle, and 51.44% of Lowest died. . The data provide evidence of an association between death and economic status for the women.
(c) Among the men, 64.53% of Highest, 87.21% of Middle, and 83.13% of Lowest died. . The data provide evidence of an association between death and economic status for the men.
9.52 Goodness of fit to a standard Normal distribution.
Computer software generated 500 random numbers that should look as if they are from the standard Normal distribution. They are categorized into five groups: (1) less than or equal to −0.6, (2) greater than −0.6 and less than or equal to −0.1, (3) greater than −0.1 and less than or equal to 0.1, (4) greater than 0.1 and less than or equal to 0.6, and (5) greater than 0.6. The counts in the five groups are 139, 102, 41, 78, and 140, respectively. Find the probabilities for these five intervals using Table A. Then compute the expected number for each interval for a sample of 500. Finally, perform the goodness-of-fit test and summarize your results.
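The steps in this exercise can be sketched as follows. The code uses `math.erf` to evaluate the standard Normal cumulative probability instead of reading Table A, so the probabilities may differ from table values in the fourth decimal place; the names `phi`, `cuts`, and `observed` are illustrative.

```python
import math

def phi(z):
    """Standard Normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

cuts = [-0.6, -0.1, 0.1, 0.6]           # interval boundaries from the exercise
observed = [139, 102, 41, 78, 140]      # counts in the five groups

# Cumulative probabilities at the boundaries, padded with 0 and 1
edges = [0.0] + [phi(c) for c in cuts] + [1.0]
probs = [upper - lower for lower, upper in zip(edges, edges[1:])]

n = sum(observed)
expected = [n * p for p in probs]

# Goodness-of-fit statistic and its degrees of freedom
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

print([round(p, 4) for p in probs])
print([round(e, 2) for e in expected])
print(round(stat, 3), df)
```

The statistic is then compared with the chi-square distribution with `df` degrees of freedom to obtain the P-value.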
9.53 More on the goodness of fit to a standard Normal distribution.
Refer to the previous exercise.
9.54 Goodness of fit to the uniform distribution.
Computer software generated 500 random numbers that should look as if they are from the uniform distribution on the interval 0 to 1 (see page 213). They are categorized into five groups: (1) less than or equal to 0.2, (2) greater than 0.2 and less than or equal to 0.4, (3) greater than 0.4 and less than or equal to 0.6, (4) greater than 0.6 and less than or equal to 0.8, and (5) greater than 0.8. The counts in the five groups are 114, 92, 108, 101, and 85, respectively. The probabilities for these five intervals are all the same. What is this probability? Compute the expected number for each interval for a sample of 500. Finally, perform the goodness-of-fit test and summarize your results.
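A minimal sketch of the computation for the uniform case follows. With five equally likely intervals there are 4 degrees of freedom, and for that particular value the chi-square upper-tail probability has a simple closed form, which the code uses in place of a statistics library. Variable names are illustrative.

```python
import math

observed = [114, 92, 108, 101, 85]   # counts in the five groups
n = sum(observed)
k = len(observed)

expected = n / k                      # equal probability 1/k for each interval
stat = sum((o - expected) ** 2 / expected for o in observed)
df = k - 1

# For df = 4 only: P(X >= x) = exp(-x/2) * (1 + x/2)
p_value = math.exp(-stat / 2) * (1 + stat / 2)

print(expected, stat, df, round(p_value, 3))
```

The closed-form tail used here is specific to 4 degrees of freedom; a different number of groups would need a general chi-square tail function.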
9.55 More on goodness of fit to the uniform distribution.
Refer to the previous exercise.
9.56 Suspicious results?
An instructor who assigned an exercise similar to the one described in the previous exercise received homework from a student who reported a P-value of 0.999. The instructor suspected that the student did not use the computer for the assignment but just made up some numbers for the homework. Why was the instructor suspicious? How would this scenario change if there were 2000 students in the class?