CHAPTER 9 Review Exercises

For Exercises 9.1 and 9.2, see page 458; for 9.3 and 9.4, see page 461; for 9.5 and 9.6, see page 462; for 9.7 and 9.8, see page 463; for 9.9 and 9.10, see page 465; for 9.11, see page 466; for 9.12 and 9.13, see page 471; for 9.14, see page 473; for 9.15, see page 474; and for 9.16, see page 475.

Question 9.17

9.17 To tip or not to tip.

A study of tipping behaviors examined the relationship between the color of the shirt worn by the server and whether or not the customer left a tip.8 Here are the data for 418 male customers who participated in the study.

tipmale

Shirt color
Tip Black White Red Yellow Blue Green
Yes 22 25 40 31 25 27
No 49 43 29 41 42 43
  1. Use numerical summaries to describe the data. Give a justification for the summaries that you choose.
  2. State appropriate null and alternative hypotheses for this setting.
  3. Give the results of the significance test for these data. Be sure to include the test statistic, the degrees of freedom, and the P-value.
  4. Make a mosaic plot if you have the needed software.
  5. Write a short summary of what you have found including your conclusion.

9.17

(a) Answers will vary. The percents of customers who left a tip, by shirt color, are Black 30.99%, White 36.76%, Red 57.97%, Yellow 43.06%, Blue 37.31%, Green 38.57%.
(b) $H_0$: There is no association between whether a male customer tips and the shirt color of the server. $H_a$: There is an association between whether a male customer tips and the shirt color of the server.
(c) .
(e) The data provide evidence of an association between whether or not the male customer left a tip and shirt color worn by the server. Red-shirted servers got the most tips!
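
The test requested in part (c) can be sketched in a few lines of standard-library Python. The function names `chi2_sf` and `chi_square_test` are illustrative choices, not part of any course software; the P-value uses the closed-form upper tail of the chi-square distribution for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k-1/2) / Gamma(k+1/2)
    term = math.sqrt(half) / math.gamma(1.5)
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (k + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

def chi_square_test(table):
    """Return (X^2, df, P-value) for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df, chi2_sf(x2, df)

# Rows: Tip = Yes, No. Columns: Black, White, Red, Yellow, Blue, Green.
tipmale = [[22, 25, 40, 31, 25, 27],
           [49, 43, 29, 41, 42, 43]]

# Percent of customers who tipped, for each shirt color (part a).
pct_tip = [100 * yes / (yes + no) for yes, no in zip(*tipmale)]
x2, df, p = chi_square_test(tipmale)
print([round(v, 2) for v in pct_tip])
print(f"X^2 = {x2:.3f}, df = {df}, P = {p:.4f}")
```

The same two functions apply unchanged to the table for the female customers in Exercise 9.18.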

Question 9.18

9.18 To tip or not to tip: women customers.

Refer to the previous exercise. Here are the data for the 304 female customers who participated in the study.

tipfem

Shirt color
Tip Black White Red Yellow Blue Green
Yes 18 16 15 19 16 18
No 33 32 38 31 31 37

Using the questions for the previous exercise as a guide, analyze these data and compare the results with those you found for the male customers.

Question 9.19

9.19 Evaluating the price and math anxiety.

Subjects in a study were asked to arrange for the rental of two tents, each for two weeks. They were offered two options for the price: (A) $40 per day per tent with a discount of $50 per tent per week, or (B) $40 per day per tent with a discount of 20%. The subjects were classified by their level of math anxiety as Low, Moderate, or High.9 The percents of subjects choosing the higher priced option that is easier to compute (A) were 14%, 19%, and 45% for the low, moderate, and high math anxiety groups, respectively. Assume that there are 60 subjects in each of these groups.

  1. Give the two-way table of counts for this study.
  2. Use numerical summaries to describe the data. Give a justification for the summaries that you choose.
  3. State appropriate null and alternative hypotheses for this setting.
  4. Give the results of the significance test for these data. Be sure to include the test statistic, the degrees of freedom, and the P-value.
  5. Write a short summary of what you have found, including your conclusion.

9.19

(a) Counts are 8, 11, 27, 52, 49, 33.
(b) Answers will vary. The percents who chose the higher-priced option in each math anxiety group are Low 13.33%, Moderate 18.33%, High 45%.
(c) $H_0$: There is no association between level of math anxiety and which rental option is chosen. $H_a$: There is an association between level of math anxiety and which rental option is chosen.
(d) .
(e) The data provide evidence of an association between level of math anxiety and which rental option is chosen. The higher the math anxiety, the more likely someone will be to choose the higher-priced rental option.
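
A minimal sketch of parts (a) and (d), assuming the counts are obtained by rounding each reported percent of 60 to the nearest whole subject. With a 2 × 3 table the statistic has 2 degrees of freedom, and for that case the chi-square upper tail has the simple form exp(−X²/2). Variable names are illustrative.

```python
import math

n_per_group = 60
pct_option_a = {"Low": 14, "Moderate": 19, "High": 45}

# Part (a): counts choosing option A and option B in each anxiety group.
count_a = {g: round(pct * n_per_group / 100) for g, pct in pct_option_a.items()}
count_b = {g: n_per_group - c for g, c in count_a.items()}

# Part (d): chi-square statistic from observed and expected counts.
n = n_per_group * len(count_a)
total_a = sum(count_a.values())
total_b = sum(count_b.values())
x2 = 0.0
for g in count_a:
    exp_a = total_a * n_per_group / n
    exp_b = total_b * n_per_group / n
    x2 += (count_a[g] - exp_a) ** 2 / exp_a + (count_b[g] - exp_b) ** 2 / exp_b

df = (2 - 1) * (len(count_a) - 1)
p = math.exp(-x2 / 2)  # exact upper tail when df = 2
print(count_a, count_b)
print(f"X^2 = {x2:.3f}, df = {df}, P = {p:.5f}")
```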

Question 9.20

9.20 Brands and sex-typed portraits: Nivea.

In a study of brand personality, subjects were shown four portraits: a highly feminine female, a less feminine female, a highly masculine male, and a less masculine male. They were then asked to classify brands to one of these four sex-typed portraits.10 We use two categorical variables to describe the data. Portrait with values Female and Male specifies the sex of the model in the portrait, and Intensity with values High and Low specifies the level of femininity or masculinity. Here are the results for Nivea, one of the brands described as a highly feminine brand.

nivea

Portrait
Intensity Female Male
High 125 11
Low 121 12

Analyze these data. Write a short summary of your results that includes appropriate numerical and graphical summaries. Give reasons for your selection of the summaries you use.

Question 9.21

9.21 Brands and sex-typed portraits: Audi.

Refer to the previous exercise. Another brand studied was Audi, one of the brands described as a highly masculine brand. Here are the data.

audi

Portrait
Intensity Female Male
High 15 217
Low 9 28

Analyze these data. Write a short summary of your results that includes appropriate numerical and graphical summaries. Give reasons for your selection of the summaries you use.

9.21

$H_0$: There is no association between intensity and portrait. $H_a$: There is an association between intensity and portrait. The data provide evidence of an association between intensity and portrait. A large majority of subjects, 91.08%, classified Audi as masculine, and 88.57% of those also classified it as high intensity. Of the 8.92% who classified Audi as feminine, only 62.5% saw it as high intensity; hence the association between portrait and intensity.

Question 9.22

9.22 Brands and sex-typed portraits: H&M.

Refer to the previous two exercises. Another brand studied was H&M, one of the brands described as an androgynous brand. Here are the data.

handm

Portrait
Intensity Female Male
High 167 16
Low 27 61

Analyze these data. Write a short summary of your results that includes appropriate numerical and graphical summaries. Give reasons for your selection of the summaries you use.

Question 9.23

9.23 Compare the brands.

Refer to the previous three exercises. Compare the results that you found for the three brands. Be sure to indicate similarities and differences in the way that these brands are viewed.

9.23

Nivea is viewed as feminine but about equally likely to be high or low intensity. Audi is viewed as masculine and, in addition, is primarily high intensity, especially among those who viewed it as masculine. H&M, however, shows a different pattern: a higher percent viewed it as feminine, and they rated it primarily high intensity, while those who viewed it as masculine rated it primarily low intensity, a reversal of what we saw in the other group.

Question 9.24

9.24 The value of online courses.

A Pew Internet survey asked college presidents whether or not they believed that online courses offer an equal educational value when compared with courses taken in the classroom. The presidents were classified by the type of educational institution. Here are the data.11

online

Institution type
Response   Four-year private   Four-year public   Two-year private   Two-year public
Yes        36                  50                 66                 54
No         62                  48                 34                 45
  1. Discuss different ways to plot the data. Choose one way to make a plot and give reasons for your choice.
  2. Make the plot and describe what it shows.

Question 9.25

9.25 Do the answers depend upon institution type?

Refer to the previous exercise. You want to examine whether or not the data provide evidence that the belief that online and classroom courses offer equal educational value varies with the type of institution of the president.

online

  1. Formulate this question in terms of appropriate null and alternative hypotheses.
  2. Perform the significance test. Report the test statistic, the degrees of freedom, and the P-value.
  3. Write a short summary explaining the results.

9.25

(a) $H_0$: There is no association between response and institution type. $H_a$: There is an association between response and institution type.
(b) .
(c) The data provide evidence of an association between whether or not the president believes that online courses offer an equal educational value as classroom courses and what institution type the president is from.

Question 9.26

9.26 Compare the college presidents with the general public.

Refer to Exercise 9.24. Another Pew Internet survey asked the general public about their opinions on the value of online courses. Of the 2142 people who participated in the survey, 621 responded Yes to the question, “Do you believe that online courses offer an equal educational value when compared with courses taken in the classroom?”

online

  1. Use the data given in Exercise 9.24 to find the number of college presidents who responded Yes to the question.
  2. Construct a two-way table that you can use to compare the responses of the general public with the responses of the college presidents.
  3. Is it meaningful to interpret the marginal totals or percents for this table? Explain your answer.
  4. Analyze the data in your two-way table, and summarize the results.

Question 9.27

9.27 Remote deposit capture.

The Federal Reserve has called remote deposit capture (RDC) “the most important development the [U.S.] banking industry has seen in years.” This service allows users to scan checks and to transmit the scanned images to a bank for posting.12 In its annual survey of community banks, the American Bankers Association asked banks whether or not they offered this service.13 Here are the results classified by the asset size (in millions of dollars) of the bank.

rdc

Offer RDC
Asset size Yes No
Under $100 63 309
$101 to $200 59 132
$201 or more 112 85
  1. Summarize the results of this survey question numerically and graphically. [In Exercise 2.102 (page 113), you were asked to do this.]
  2. Test the null hypothesis that there is no association between the size of a bank, measured by assets, and whether or not it offers RDC. Report the test statistic, the P-value, and your conclusion.

9.27

(a) The percent of banks for each asset size that offer RDC are Under $100: 16.94%, $101 to $200: 30.89%, $201 or more: 56.85%.
(b) . The data provide evidence of an association between the size of the bank and whether or not it offers RDC. Generally speaking, the small-size banks, as measured by assets, are less likely to offer RDC.

Question 9.28

9.28 How does RDC vary across the country?

rdcreg

The survey described in the previous exercise also classified community banks by region.14 Here is the table of counts.

Offer RDC
Region Yes No
Northeast 28 38
Southeast 57 61
Central 53 84
Midwest 63 181
Southwest 27 51
West 61 76
  1. Summarize the results of this survey question numerically and graphically. [In Exercise 2.103 (page 113), you were asked to do this.]
  2. Test the null hypothesis that there is no association between region and whether or not a community bank offers RDC. Report the test statistic with the degrees of freedom.
  3. Report the P-value and make a sketch similar to the one on page 464 to illustrate the calculation.
  4. Write a summary of your analysis and conclusion. Be sure to include numerical and graphical summaries.

Question 9.29

9.29 Trust and honesty in the workplace.

One of the questions in a survey of high school students asked about trust and honesty in the workplace.15 Specifically, they were asked whether they thought trust and honesty were essential in business and the workplace. Here are the counts classified by gender.

trust

Gender
Trust and honesty are essential Male Female
Agree 9,097 10,935
Disagree 685 423

Note that you answered parts (a) through (c) of this exercise if you completed Exercise 2.109 (page 114).

  1. Add the marginal totals to the table.
  2. Calculate appropriate percents to describe the results of this question.
  3. Summarize your findings in a short paragraph.
  4. Test the null hypothesis that there is no association between gender and whether trust and honesty are thought to be essential. Give the test statistic and the P-value (with a sketch similar to the one on page 464) and summarize your conclusion. Be sure to include numerical and graphical summaries.

9.29

(a) 9782, 11358; 20032, 1108.
(b) 93% of males and 96.28% of females felt trust and honesty were essential.
(c) A higher percent of females than males feel that trust and honesty were essential in business and the workplace.
(d) . The data provide evidence of an association between gender and whether or not they thought trust and honesty were essential in business and the workplace.

Question 9.30

9.30 Lying to a teacher.

The students surveyed in the study described in the previous exercise were also asked about lying to teachers. The following table gives the numbers of students who said that they lied to a teacher at least once during the past year, classified by gender.

lying

Gender
Lied at least once Male Female
Yes 6057 5966
No 4165 5719

Note that you answered parts (a) through (c) of this exercise if you completed Exercise 2.108 (page 114). Answer the questions given in the previous exercise for this survey question.

Question 9.31

9.31 Nonresponse in a survey.

A business school conducted a survey of companies in its state. It mailed a questionnaire to 200 small companies, 200 medium-sized companies, and 200 large companies. The rate of nonresponse is important in deciding how reliable survey results are. Here are the data on response to this survey.

nresp

Small Medium Large
Response 124 80 41
No response 76 120 159
Total 200 200 200

Note that you answered parts (a) through (c) of this exercise if you completed Exercise 2.112 (page 115).

  1. What was the overall percent of nonresponse?
  2. Describe how nonresponse is related to the size of the business. (Use percents to make your statements precise.)
  3. Draw a bar graph to compare the nonresponse percents for the three size categories.
  4. State and test an appropriate null hypothesis for these data.

9.31

(a) 59.17%.
(b) Larger businesses have higher nonresponse rates: 79.5% of large, 60% of medium, and 38% of small businesses did not respond.
(d) $H_0$: There is no association between size of business and response rate. $H_a$: There is an association between size of business and response rate. The data provide evidence of an association between size of business and response rate.

Question 9.32

9.32 Hiring practices.

A company has been accused of age discrimination in hiring for operator positions. Lawyers for both sides look at data on applicants for the past three years. They compare hiring rates for applicants younger than 40 years and those 40 years or older.

hiring

Age Hired Not hired
Younger than 40 82 1160
40 or older 2 168

Note that you answered parts (a) through (d) of this exercise if you completed Exercise 2.111 (page 115).

  1. Find the two conditional distributions of hired/not hired: one for applicants who are less than 40 years old and one for applicants who are not less than 40 years old.
  2. Based on your calculations, make a graph to show the differences in distribution for the two age categories.
  3. Describe the company's hiring record in words. Does the company appear to discriminate on the basis of age?
  4. What lurking variables might be involved here?
  5. Use a significance test to determine whether or not the data indicate that there is a relationship between age and whether or not an applicant is hired.

Question 9.33

9.33 Obesity and health.

Recent studies have shown that earlier reports underestimated the health risks associated with being overweight. The error was due to overlooking lurking variables. In particular, smoking tends both to reduce weight and to lead to earlier death. Note that you answered part (a) of this exercise if you completed Exercise 2.117 (page 116).

  1. Illustrate Simpson's paradox by a simplified version of this situation. That is, make up tables of overweight (yes or no) by early death (yes or no) by smoker (yes or no) such that
    • Overweight smokers and overweight nonsmokers both tend to die earlier than those not overweight.
    • But when smokers and nonsmokers are combined into a two-way table of overweight by early death, persons who are not overweight tend to die earlier.
  2. Perform significance tests for the combined data set and for the smokers and nonsmokers separately. If all P-values are not less than 0.05, redo your tables so that all results are statistically significant at this level.

Question 9.34

9.34 Discrimination?

Wabash Tech has two professional schools, business and law. Here are two-way tables of applicants to both schools, categorized by gender and admission decision. (Although these data are made up, similar situations occur in reality.)

disc

Business
Gender   Admit   Deny
Male     480     120
Female   180     20

Law
Gender   Admit   Deny
Male     10      90
Female   100     200

Note that you answered parts (a) through (d) of this exercise if you completed Exercise 2.116 (page 116).

  1. Make a two-way table of gender by admission decision for the two professional schools together by summing entries in these tables.
  2. From the two-way table, calculate the percent of male applicants who are admitted and the percent of female applicants who are admitted. Wabash admits a higher percent of male applicants.

  3. Now compute separately the percents of male and female applicants admitted by the business school and by the law school. Each school admits a higher percent of female applicants.
  4. This is Simpson's paradox: both schools admit a higher percent of the women who apply, but overall Wabash admits a lower percent of female applicants than of male applicants. Explain carefully, as if speaking to a skeptical reporter, how it can happen that Wabash appears to favor males when each school individually favors females.
  5. Use the data summary that you prepared in part (a) to test the null hypothesis that there is no relationship between gender and whether or not an applicant is admitted to a professional school at Wabash Tech.
  6. Test the same null hypothesis using the business school data only.
  7. Do the same for the law school data.
  8. Compare the results for the two schools.
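
The percents in parts (b) and (c) can be checked with a short sketch like the following. It uses only the counts in the two tables above; the helper name `admit_pct` is illustrative.

```python
# (admit, deny) counts by gender for each school, from the tables above.
business = {"Male": (480, 120), "Female": (180, 20)}
law = {"Male": (10, 90), "Female": (100, 200)}

def admit_pct(admit, deny):
    """Percent of applicants admitted."""
    return 100 * admit / (admit + deny)

# Part (a): sum the two schools cell by cell.
combined = {g: (business[g][0] + law[g][0], business[g][1] + law[g][1])
            for g in business}

for name, table in [("Business", business), ("Law", law), ("Combined", combined)]:
    rates = {g: round(admit_pct(*counts), 1) for g, counts in table.items()}
    print(name, rates)
```

Most men apply to the business school, which admits a large share of its applicants, while most women apply to the law school, which admits few; this is why the combined table reverses the direction seen within each school.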

Question 9.35

9.35 What's wrong?

Explain what is wrong with each of the following:

  1. The P-value for a chi-square significance test was .
  2. Expected cell counts are computed under the assumption that the alternative hypothesis is true.
  3. A chi-square test was used to test the alternative hypothesis that there is no association between two categorical variables.

9.35

(a) A P-value cannot be negative.
(b) Expected cell counts are computed under the assumption that the null hypothesis is true, not the alternative.
(c) The alternative hypothesis should be that there is an association between two categorical variables.

Question 9.36

9.36 Plot the test statistic and the P-values.

Here is a two-way table of counts. The two categorical variables are and , and the possible values for each of these variables are 0 and 1. Notice that the second row depends upon a quantity that we call . For this exercise, you will examine how the test statistic and its corresponding P-value depend upon this quantity. Notice that the row sums are both 100.

0 1
0 50 50
1
  1. Consider setting equal to zero. Find the percent of zeros for the variable when . Do the same for the case where . With this choice of , the data match the null hypothesis as closely as possible. Explain why.
  2. Consider the tables where the values of are equal to 0, 5, 10, 15, 20, and 25. For each of these scenarios, find the percent of zeros for when . Notice that this percent does not vary with for .
  3. Compute the test statistic and P-value for testing the null hypothesis that there is no association between the row and column variables for each of the values of given in part (b).
  4. Plot the values of the test statistic versus the percent of zeros for when . Do the same for the P-values. Summarize what you have learned from this exercise in a short paragraph.

Question 9.37

9.37 Plot the test statistic and the P-values.

Here is a two-way table of counts. The two categorical variables are and , and the possible values for each of these variables are 0 and 1.

counts

0 1
0 5 5
1 7 3
  1. Find the percent of zeros for when . Do the same for the case where . Find the value of the test statistic and its P-value.
  2. Now multiply all of the counts in the table by 2. Verify that the percent of zeros for when and the percent of zeros for the when do not change. Find the value of the test statistic and its P-value for this table.
  3. Answer part (b) for tables where all counts are multiplied by 4, 6, and 8. Summarize all your results graphically, and write a short paragraph describing what you have learned from this exercise.

9.37

(a) .
(b) .
(c) For 4: .
For 6: .
For 8: .
In relation to sample size, collecting twice as much data that show the same association doubles the value of the $X^2$ statistic and makes the result more significant. Similarly, collecting four times as much data that show the same association quadruples the value of $X^2$ and makes the result even more significant, and so on.
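
The scaling behavior can be verified with a small sketch. It uses the shortcut formula for a 2 × 2 table, X² = n(ad − bc)² / (product of the four marginal totals), with no continuity correction, and the fact that for 1 degree of freedom the upper-tail probability equals erfc(√(X²/2)). The function name is illustrative.

```python
import math

def x2_2x2(a, b, c, d):
    """Chi-square statistic for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

base = (5, 5, 7, 3)  # counts from the table in Exercise 9.37
results = {}
for k in (1, 2, 4, 6, 8):
    a, b, c, d = (k * v for v in base)
    x2 = x2_2x2(a, b, c, d)
    p = math.erfc(math.sqrt(x2 / 2))  # P-value for df = 1
    results[k] = (x2, p)
    print(f"multiplier {k}: X^2 = {x2:.4f}, P = {p:.4f}")
```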

Question 9.38

9.38 Trends in broadband market.

The Pew Internet and American Life Project collects data about the impact of the Internet on various aspects of American life.16 One set of surveys has tracked the use of broadband in homes over a period of several years.17 Here are some data on the percent of homes that access the Internet using broadband:

Date of survey 2001 2005 2009 2013
Homes with broadband 6% 33% 63% 70%

Assume a sample size of 2250 for each survey.

  1. Display the data in a two-way table of counts.
  2. Test the null hypothesis that the proportion of homes that access the Internet using broadband has not changed over this period of time. Report your test statistic with degrees of freedom and the P-value. What do you conclude?

Question 9.39

9.39 Can dial-up compete?

Refer to the previous exercise. The same surveys provided data on access to the Internet using dial-up. Here are the data:

Date of survey 2001 2005 2009 2013
Homes with dial-up 41% 28% 7% 3%
  • (a) to (c) Answer the questions given in the previous exercise for these data.
  • (d) Write a short report summarizing the changes in broadband access that have occurred over this period of time using your analysis from this exercise and the previous one. Include a graph with information about both broadband and dial-up access over time.

9.39

(a) Dial-up: 923, 630, 158, 68. Without: 1327, 1620, 2092, 2182.
(b) $H_0$: There is no association between year and Internet access using dial-up. $H_a$: There is an association between year and Internet access using dial-up. The data provide evidence of an association between year and Internet access using dial-up.
(c) Since 2001, broadband usage has increased dramatically, from 6% in 2001 to 70% in 2013; at the same time, use of dial-up access has plummeted, going from 41% in 2001 to only 3% in 2013.
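
The counts in part (a) follow from the reported percents and the assumed sample size of 2250. A sketch, assuming halves are rounded up (which reproduces the dial-up counts above): note that Python's built-in `round` rounds halves to the nearest even integer, so `round(922.5)` gives 922, and integer arithmetic is used here instead. The function name is illustrative.

```python
def counts_from_percents(percents, n):
    """Convert whole-number percents of n into (yes, no) counts, rounding halves up."""
    yes = [(p * n + 50) // 100 for p in percents]
    no = [n - c for c in yes]
    return yes, no

years = [2001, 2005, 2009, 2013]
n = 2250  # sample size assumed in Exercise 9.38

dialup, no_dialup = counts_from_percents([41, 28, 7, 3], n)
broadband, no_broadband = counts_from_percents([6, 33, 63, 70], n)

for i, year in enumerate(years):
    print(year, "dial-up:", dialup[i], no_dialup[i],
          "broadband:", broadband[i], no_broadband[i])
```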

Question 9.40

9.40 How robust are the conclusions?

Refer to Exercise 9.38 on the use of broadband to access the Internet. In that exercise, the percents were read from a graph, and we assumed that the sample size was 2250 for all the surveys. Investigate the robustness of your conclusions in Exercise 9.38 against the use of 2250 as the sample size for all surveys and to roundoff and slight errors in reading the graph. Assume that the actual sample sizes ranged from 2200 to 2600. Assume also that the percents reported are all accurate to within ±2%. In other words, if the reported percent is 33%, then we can assume that the actual survey percent is between 31% and 35%. Reanalyze the data using at least five scenarios that vary the percents and the sample sizes within the assumed ranges. Summarize your results in a report, paying particular attention to the consequences for your conclusions in Exercise 9.38.
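
One way to organize the scenarios is sketched below. The five scenarios are illustrative choices within the stated ranges, not values from the surveys, and the statistic is computed directly from proportions (so counts are not rounded to whole numbers). For a 2 × c table, X² equals the sum of n_j(p_j − p)² over columns divided by p(1 − p), where p is the overall proportion. The value 16.27 is the chi-square critical value for 3 degrees of freedom at the 0.001 level.

```python
def x2_from_props(props, sizes):
    """Chi-square statistic for a 2 x c table from 'yes' proportions and sample sizes."""
    total_n = sum(sizes)
    p_all = sum(p * n for p, n in zip(props, sizes)) / total_n
    between = sum(n * (p - p_all) ** 2 for p, n in zip(props, sizes))
    return between / (p_all * (1 - p_all))

# Hypothetical scenarios: percents within +/-2 points, sizes between 2200 and 2600.
scenarios = {
    "as reported":           ([0.06, 0.33, 0.63, 0.70], [2250] * 4),
    "pulled together, 2200": ([0.08, 0.35, 0.61, 0.68], [2200] * 4),
    "pushed apart, 2600":    ([0.04, 0.31, 0.65, 0.72], [2600] * 4),
    "unequal sizes":         ([0.06, 0.33, 0.63, 0.70], [2200, 2600, 2200, 2600]),
    "mixed":                 ([0.08, 0.31, 0.65, 0.68], [2600, 2200, 2600, 2200]),
}

CRITICAL_001_DF3 = 16.27
x2_values = {name: x2_from_props(p, n) for name, (p, n) in scenarios.items()}
for name, x2 in x2_values.items():
    print(f"{name:24s} X^2 = {x2:8.1f}  exceeds {CRITICAL_001_DF3}: {x2 > CRITICAL_001_DF3}")
```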

Question 9.41

9.41 Find the P-value.

For each of the following situations, give the degrees of freedom and an appropriate bound on the P-value (give the exact value if you have software available) for the $X^2$ statistic for testing the null hypothesis of no association between the row and column variables.

  1. A table with .
  2. A table with .
  3. A table with .
  4. A table with .

9.41

(a) .
(b) .
(c) .
(d) .

Question 9.42

9.42 Health care fraud.

Most errors in billing insurance providers for health care services involve honest mistakes by patients, physicians, or others involved in the health care system. However, fraud is a serious problem. The National Health Care Anti-fraud Association estimates that tens of billions of dollars are lost to health care fraud each year.18 When fraud is suspected, an audit of randomly selected billings is often conducted. The selected claims are then reviewed by experts, and each claim is classified as allowed or not allowed. The distributions of the amounts of claims are frequently highly skewed, with a large number of small claims and a small number of large claims. Simple random sampling would likely be overwhelmed by small claims and would tend to miss the large claims, so stratification is often used. See the section on stratified sampling in Chapter 3 (page 134). Here are data from an audit that used three strata based on the sizes of the claims (small, medium, and large).19

berrors

Stratum Sampled claims Number not allowed
Small 57 6
Medium 17 5
Large 5 1
  1. Construct the table of counts for these data and include the marginal totals.
  2. Find the percent of claims that were not allowed in each of the three strata.
  3. State an appropriate null hypothesis to be tested for these data.
  4. Perform the significance test and report your test statistic with degrees of freedom and the P-value. State your conclusion.

Question 9.43

9.43 Population estimates.

Refer to the previous exercise. One reason to do an audit such as this is to estimate the number of claims that would not be allowed if all claims in a population were examined by experts. We have estimates of the proportions of such claims from each stratum based on our sample. With our simple random sampling of claims from each stratum, we have unbiased estimates of the corresponding population proportion for each stratum. Therefore, if we take the sample proportions and multiply by the population sizes, we would have the estimates that we need. Here are the population sizes for the three strata:

Stratum Claims in strata
Small 3342
Medium 246
Large 58
  1. For each stratum, estimate the total number of claims that would not be allowed if all claims in the stratum had been audited.
  2. (Optional) Give margins of error for your estimates. (Hint: You first need to find standard errors for your sample estimates; see Chapter 8, page 420.) Then you need to use the rules for variances given in Chapter 4 (page 226) to find the standard errors for the population estimates. Finally, you need to multiply by $z^*$ to determine the margins of error.

9.43

(a) The estimates are 352 for Small, 73 for Medium, and 12 for Large.
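
A sketch of the calculation, including the optional margins of error. It assumes 95% confidence (z* = 1.96), uses the large-sample standard error √(p̂(1 − p̂)/n) scaled by the stratum size, and ignores any finite-population correction; with only 5 sampled claims in the Large stratum the normal approximation is rough at best. Estimates are printed unrounded.

```python
import math

# stratum: (sampled claims, number not allowed, claims in stratum)
strata = {
    "Small":  (57, 6, 3342),
    "Medium": (17, 5, 246),
    "Large":  (5, 1, 58),
}
z_star = 1.96  # assumption: 95% confidence

estimates = {}
for name, (n, not_allowed, stratum_size) in strata.items():
    p_hat = not_allowed / n
    total = stratum_size * p_hat                               # estimated number not allowed
    se_total = stratum_size * math.sqrt(p_hat * (1 - p_hat) / n)
    estimates[name] = (total, z_star * se_total)
    print(f"{name:6s} estimate = {total:7.2f}, margin of error = {z_star * se_total:7.2f}")
```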

Question 9.44

9.44 Construct a table.

Construct a table of counts where there is no apparent association between the row and column variables.

Question 9.45

9.45 Jury selection.

Exercise 8.93 (page 453) concerns Castaneda v. Partida, the case in which the Supreme Court decision used the phrase “two or three standard deviations” as a criterion for statistical significance. There were 181,535 persons eligible for jury duty, of whom 143,611 were Mexican Americans. Of the 870 people selected for jury duty, 339 were Mexican Americans. We are interested in finding out if there is an association between being a Mexican American and being selected as a juror. Formulate this problem using a two-way table of counts. Construct the table using the variables “Mexican American or not” and “juror or not.” Find the $X^2$ statistic and its P-value. Square the $z$ statistic that you obtained in Exercise 8.93 and verify that the result is equal to the $X^2$ statistic.

9.45

with rounding error.
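
The relationship can be checked numerically. The sketch below assumes the z statistic is the pooled two-sample z comparing the proportion of Mexican Americans among jurors with the proportion among eligible persons not selected; Exercise 8.93 is not reproduced here, so a different form of z there would give a slightly different value. For the pooled form, z² equals X² for the 2 × 2 table exactly.

```python
import math

eligible_total, eligible_ma = 181_535, 143_611
jurors_total, jurors_ma = 870, 339

# Rows: juror, not juror. Columns: Mexican American, not Mexican American.
table = [
    [jurors_ma, jurors_total - jurors_ma],
    [eligible_ma - jurors_ma,
     (eligible_total - eligible_ma) - (jurors_total - jurors_ma)],
]

row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
n = sum(row_totals)
x2 = sum((table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
         / (row_totals[i] * col_totals[j] / n)
         for i in range(2) for j in range(2))

# Pooled two-sample z statistic for the same comparison.
p1 = table[0][0] / row_totals[0]
p2 = table[1][0] / row_totals[1]
p_pool = col_totals[0] / n
z = (p1 - p2) / math.sqrt(p_pool * (1 - p_pool) * (1 / row_totals[0] + 1 / row_totals[1]))

print(f"X^2 = {x2:.2f}, z = {z:.3f}, z^2 = {z * z:.2f}")
```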

Question 9.46

9.46 Students explain statistical data.

The National Survey of Student Engagement conducts surveys to study various aspects of undergraduate education.20 In a recent survey, students were asked if they needed to explain the meaning of numerical or statistical data in a written assignment. Among the first-year students, 9,697 responded positively while 13,514 seniors responded positively. A total of 13,171 first-year students and 16,997 seniors from 622 U.S. four-year colleges and universities responded to the survey.

  1. Construct the two-way table of counts.
  2. State an appropriate null hypothesis that can be tested with these data.
  3. Perform the significance test and summarize the results. What do you conclude?
  4. The sample sizes here are very large, so even relatively small effects will be detected through a significance test. Do you think that the difference in percents is important and/or interesting? Explain your answer.

Question 9.47

9.47 A reduction in force.

In economic downturns or to improve their competitiveness, corporations may undertake a reduction in force (RIF), in which substantial numbers of employees are laid off. Federal and state laws require that employees be treated equally regardless of their age. In particular, employees over the age of 40 years are a “protected class.” Many allegations of discrimination focus on comparing employees over 40 with their younger coworkers. Here are the data for a recent RIF.

rif1

Over 40
Released No Yes
Yes 8 42
No 503 764
  1. Complete this two-way table by adding marginal and table totals. What percent of each employee age group (over 40 or not) were laid off? Does there appear to be a relationship between age and being laid off?
  2. Perform the chi-square test. Give the test statistic, the degrees of freedom, the P-value, and your conclusion.

9.47

(a) 5.21% of those over 40 and 1.57% of those under 40 were laid off. Yes, it looks like a higher percentage of over 40 were laid off than under 40.
(b) . The data provide evidence of an association between age group and being laid off.

Question 9.48

9.48 Employee performance appraisal.

A major issue that arises in RIFs like that in the previous exercise is the extent to which employees in various groups are similar. If, for example, employees over 40 receive generally lower performance ratings than younger workers, that might explain why more older employees were laid off. We have data on the last performance appraisal. The possible values are “partially meets expectations,” “fully meets expectations,” “usually exceeds expectations,” and “continually exceeds expectations.” Because there were very few employees who partially met expectations, we combine the first two categories. Here are the data.

rif2

Over 40
Performance appraisal No Yes
Partially or fully meets expectations 86 233
Usually exceeds expectations 352 493
Continually exceeds expectations 64 35

Note that the total number of employees in this table is less than the number in the previous exercise because some employees do not have a performance appraisal. Analyze the data. Do the older employees appear to have lower performance evaluations?

Question 9.49

9.49 Which model?

This exercise concerns the material in Section 9.1 on models for two-way tables. Look at Exercise 9.27, Exercise 9.31, Exercise 9.42, and Exercise 9.47. For each exercise, state whether you are comparing several populations based on separate samples from each population (the first model for two-way tables) or testing independence between two categorical variables based on a single sample (the second model).

9.49

9.27 and 9.47 are based on a single sample (the second model); 9.31 and 9.42 are based on separate samples (the first model).

Question 9.50

9.50 Computations for RDC and bank size.

Refer to the table of data for bank asset size and remote deposit capture offering in Exercise 9.27 (page 477).

  1. Compute the expected count for each cell in the table.
  2. Compute the test statistic.
  3. What are the degrees of freedom for this statistic?
  4. Sketch the appropriate distribution for this statistic and mark the values from Table F that bracket the computed value of the test statistic. What is the P-value that you would report if you did not use software and relied solely on Table F for your work?
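For reference, the quantities asked for in parts (a) through (c) follow the standard formulas for a two-way table with r rows and c columns. The symbols below (O for observed counts, E for expected counts, R and C for row and column totals, n for the table total) are labels chosen here for illustration.

```latex
E_{ij} = \frac{R_i \, C_j}{n},
\qquad
X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
\mathrm{df} = (r - 1)(c - 1)
```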

Question 9.51

9.51 Titanic!

In 1912, the luxury liner Titanic, on its first voyage, struck an iceberg and sank. Some passengers got off the ship in lifeboats, but many died. Think of the Titanic disaster as an experiment in how the people of that time behaved when faced with death in a situation where only some can escape. The passengers are a sample from the population of their peers. Here is information about who lived and who died, by gender and economic status.21 (The data leave out a few passengers whose economic status is unknown.)

titanic

Men
Status     Died   Survived
Highest     111         61
Middle      150         22
Lowest      419         85
Total       680        168

Women
Status     Died   Survived
Highest       6        126
Middle       13         90
Lowest      107        101
Total       126        317

  1. Compare the percents of men and of women who died. Is there strong evidence that a higher proportion of men die in such situations? Why do you think this happened?
  2. Look only at the women. Describe how the three economic classes differ in the percent of women who died. Are these differences statistically significant?
  3. Now look only at the men and answer the same questions.

9.51

(a) 80.19% of men and 28.44% of women died. X² = 332.21, df = 1, P < 0.0005. The data provide evidence that a higher proportion of men than women died. Reasons will vary.
(b) Among the women, 4.55% of Highest, 12.62% of Middle, and 51.44% of Lowest died. X² = 103.77, df = 2, P < 0.0005. The data provide evidence of an association between death and economic status for the women.
(c) Among the men, 64.53% of Highest, 87.21% of Middle, and 83.13% of Lowest died. X² = 34.62, df = 2, P < 0.0005. The data provide evidence of an association between death and economic status for the men.
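The chi-square statistics for the three comparisons in Exercise 9.51 can be checked with a short script. This is a sketch in plain Python, assuming the counts are entered from the titanic table; the function names are hypothetical, and the P-value helper covers only 1 and 2 degrees of freedom, where the chi-square tail probability has a simple closed form.

```python
import math

def chi_square(table):
    """Return (statistic, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

def chi_square_p_value(stat, df):
    """Upper-tail probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch handles only df = 1 or df = 2")

# Columns are (Died, Survived).
by_gender = [[680, 168], [126, 317]]          # rows: men, women
women = [[6, 126], [13, 90], [107, 101]]      # rows: Highest, Middle, Lowest
men = [[111, 61], [150, 22], [419, 85]]       # rows: Highest, Middle, Lowest

results = {}
for label, table in [("gender", by_gender), ("women", women), ("men", men)]:
    stat, df = chi_square(table)
    results[label] = (stat, df, chi_square_p_value(stat, df))
    print(f"{label}: X2 = {stat:.2f}, df = {df}, P = {results[label][2]:.3g}")
```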

Question 9.52

9.52 Goodness of fit to a standard Normal distribution.

Computer software generated 500 random numbers that should look as if they are from the standard Normal distribution. They are categorized into five groups: (1) less than or equal to −0.6, (2) greater than −0.6 and less than or equal to −0.1, (3) greater than −0.1 and less than or equal to 0.1, (4) greater than 0.1 and less than or equal to 0.6, and (5) greater than 0.6. The counts in the five groups are 139, 102, 41, 78, and 140, respectively. Find the probabilities for these five intervals using Table A. Then compute the expected number for each interval for a sample of 500. Finally, perform the goodness-of-fit test and summarize your results.
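The steps in this exercise can be sketched in Python as follows. The sketch uses the standard library's `statistics.NormalDist` in place of Table A, so the probabilities may differ from table values in the later decimal places. The closed-form P-value is an assumption specific to this setting: it applies only because five groups give 4 degrees of freedom.

```python
import math
from statistics import NormalDist

cutpoints = [-0.6, -0.1, 0.1, 0.6]
observed = [139, 102, 41, 78, 140]
n = sum(observed)

# Probability of each interval under the standard Normal distribution.
z = NormalDist()  # mean 0, standard deviation 1
cdf = [0.0] + [z.cdf(c) for c in cutpoints] + [1.0]
probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]

# Expected counts and the goodness-of-fit statistic.
expected = [n * p for p in probs]
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# For df = 4 the chi-square upper tail is exp(-x/2) * (1 + x/2).
p_value = math.exp(-stat / 2) * (1 + stat / 2)

for p, e, o in zip(probs, expected, observed):
    print(f"prob = {p:.4f}  expected = {e:7.2f}  observed = {o}")
print(f"X2 = {stat:.2f}, df = {df}, P = {p_value:.3f}")
```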

Question 9.53

9.53 More on the goodness of fit to a standard Normal distribution.

Refer to the previous exercise.

  1. Use software to generate your own sample of 800 standard Normal random variables, and perform the goodness-of-fit test using the intervals from the previous exercise.
  2. Choose a different set of intervals than the ones used in the previous exercise. Rerun the goodness-of-fit test.
  3. Compare the results you found in parts (a) and (b). Which intervals would you recommend?
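Parts (a) and (b) can be carried out with any statistical software. The following is one possible sketch in Python, using only the standard library. The seed and the second set of cutpoints are arbitrary choices made here for illustration, and the function name `normal_gof` is hypothetical. The function returns the statistic and degrees of freedom; the P-value can then be found from Table F or software.

```python
import random
from bisect import bisect_left
from statistics import NormalDist

def normal_gof(sample, cutpoints):
    """Chi-square goodness-of-fit statistic against the standard Normal.

    Groups are (-inf, c1], (c1, c2], ..., (ck, inf) for sorted cutpoints.
    Returns (observed counts, statistic, degrees of freedom).
    """
    counts = [0] * (len(cutpoints) + 1)
    for x in sample:
        # bisect_left puts x equal to a cutpoint in the lower group.
        counts[bisect_left(cutpoints, x)] += 1
    z = NormalDist()
    cdf = [0.0] + [z.cdf(c) for c in cutpoints] + [1.0]
    expected = [len(sample) * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return counts, stat, len(counts) - 1

rng = random.Random(2024)  # arbitrary seed so the run is repeatable
sample = [rng.gauss(0, 1) for _ in range(800)]

# (a) Intervals from the previous exercise.
counts_a, stat_a, df_a = normal_gof(sample, [-0.6, -0.1, 0.1, 0.6])
# (b) A different, illustrative set of intervals.
counts_b, stat_b, df_b = normal_gof(sample, [-1.0, -0.3, 0.3, 1.0])

print(counts_a, round(stat_a, 2), df_a)
print(counts_b, round(stat_b, 2), df_b)
```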

Question 9.54

9.54 Goodness of fit to the uniform distribution.

Computer software generated 500 random numbers that should look as if they are from the uniform distribution on the interval 0 to 1 (see page 213). They are categorized into five groups: (1) less than or equal to 0.2, (2) greater than 0.2 and less than or equal to 0.4, (3) greater than 0.4 and less than or equal to 0.6, (4) greater than 0.6 and less than or equal to 0.8, and (5) greater than 0.8. The counts in the five groups are 114, 92, 108, 101, and 85, respectively. The probabilities for these five intervals are all the same. What is this probability? Compute the expected number for each interval for a sample of 500. Finally, perform the goodness-of-fit test and summarize your results.
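The calculation for this exercise can be sketched in a few lines of Python. Because the five intervals have equal width, each expected count is the sample size divided by the number of groups. The closed-form P-value is again specific to 4 degrees of freedom and is an assumption of this sketch rather than a general formula.

```python
import math

observed = [114, 92, 108, 101, 85]
n = sum(observed)
k = len(observed)

prob = 1 / k        # each interval has the same probability
expected = n / k    # so each expected count is the same

stat = sum((o - expected) ** 2 / expected for o in observed)
df = k - 1

# For df = 4 the chi-square upper tail is exp(-x/2) * (1 + x/2).
p_value = math.exp(-stat / 2) * (1 + stat / 2)

print(f"prob = {prob}, expected = {expected}")
print(f"X2 = {stat:.2f}, df = {df}, P = {p_value:.3f}")
```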

Question 9.55

9.55 More on goodness of fit to the uniform distribution.

Refer to the previous exercise.

  1. Use software to generate your own sample of 800 uniform random variables on the interval from 0 to 1, and perform the goodness-of-fit test using the intervals from the previous exercise.
  2. Choose a different set of intervals than the ones used in the previous exercise. Rerun the goodness-of-fit test.
  3. Compare the results you found in parts (a) and (b). Which intervals would you recommend?

Question 9.56

9.56 Suspicious results?

An instructor who assigned an exercise similar to the one described in the previous exercise received homework from a student who reported a P-value of 0.999. The instructor suspected that the student did not use the computer for the assignment but just made up some numbers for the homework. Why was the instructor suspicious? How would this scenario change if there were 2000 students in the class?
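The second question can be explored with a short calculation. This sketch assumes that the null hypothesis is true, that each student's P-value is then approximately uniform between 0 and 1, and that students work independently; these assumptions are introduced here and are not stated in the exercise.

```python
n_students = 2000

# Chance that one honest P-value is 0.999 or larger under the assumptions above.
p_tail = 1 - 0.999

# Expected number of students with such a P-value, and the chance of at least one.
expected_count = n_students * p_tail
prob_at_least_one = 1 - (1 - p_tail) ** n_students

print(f"expected count = {expected_count:.2f}")
print(f"P(at least one) = {prob_at_least_one:.3f}")
```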