OBJECTIVES By the end of this section, I will be able to …
1 Independent Sample $Z$ Tests for $p_1 - p_2$
So far in this chapter, we have learned how to perform inference about population means. In this section, we learn how to perform hypothesis tests and construct confidence intervals about the difference between two population proportions. Recall that the sample proportion of successes, $\hat{p} = x/n$, is the ratio of the number of successes $x$ to the number of trials $n$ in a binomial experiment.
In this section, we consider two independent samples, each of which yields a sample proportion: $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$. For example, a recent survey reported the sample proportions of males (sample 1), $\hat{p}_1$, and females (sample 2), $\hat{p}_2$, who agree that “technological changes will lead toward a future where people's lives are mostly better.” (See Example 15 for further details about these data.) Here, we are interested in performing inference for the difference in population proportions $p_1 - p_2$, such as the difference in the proportions of all males and females who think technological change will lead to a better future. We use the difference in sample proportions $\hat{p}_1 - \hat{p}_2$ as our point estimate of the difference in population proportions $p_1 - p_2$, which is unknown. And just as in earlier sections where we investigated the sampling distribution of $\bar{x}_1 - \bar{x}_2$ to perform inference on $\mu_1 - \mu_2$, here we use the sampling distribution of $\hat{p}_1 - \hat{p}_2$ to help us perform inference about $p_1 - p_2$.
Developing Your Statistical Sense
Independent Samples Only
The inferential methods of this section are reserved for independent samples only. An example of a problem that would not use the methods of this section is the following: In the latest poll, suppose 45% of the respondents supported the Democratic candidate and 45% supported the Republican one. Because each respondent had to choose between the Democratic candidate and the Republican candidate, their respective poll numbers are not independent.
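The distribution of all possible values of $\hat{p}_1 - \hat{p}_2$ is called the sampling distribution of $\hat{p}_1 - \hat{p}_2$, with mean $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$ and standard error
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$
where $q_1 = 1 - p_1$ and $q_2 = 1 - p_2$.
Let $x_1$ and $x_2$ denote the number of successes, and let $(n_1 - x_1)$ and $(n_2 - x_2)$ denote the number of failures in sample 1 and sample 2, respectively. The sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately normal when the number of successes and the number of failures in each sample are each at least 5, that is, when $x_1 \geq 5$, $(n_1 - x_1) \geq 5$, $x_2 \geq 5$, and $(n_2 - x_2) \geq 5$.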
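Sampling Distribution of $\hat{p}_1 - \hat{p}_2$
When two random samples are drawn independently from two populations, then the quantity
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}}$$
has an approximately standard normal distribution when the following conditions are satisfied:
$x_1 \geq 5$, $(n_1 - x_1) \geq 5$, $x_2 \geq 5$, and $(n_2 - x_2) \geq 5$,
where $\hat{p}_1$ and $n_1$ represent the sample proportion and sample size of the sample taken from population 1 with population proportion $p_1$; $\hat{p}_2$ and $n_2$ represent the sample proportion and sample size of the sample taken from population 2 with population proportion $p_2$; and $q_1 = 1 - p_1$ and $q_2 = 1 - p_2$.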
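The conditions and the standard error above can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the input values are hypothetical, and because $p_1$ and $p_2$ are unknown in practice, the sketch only shows how the formula behaves for assumed values.

```python
from math import sqrt


def conditions_met(x1, n1, x2, n2, minimum=5):
    """Return True when each sample has at least `minimum` successes and failures."""
    return min(x1, n1 - x1, x2, n2 - x2) >= minimum


def standard_error(p1, n1, p2, n2):
    """Standard error of p1_hat - p2_hat for assumed population proportions p1, p2."""
    q1, q2 = 1 - p1, 1 - p2
    return sqrt(p1 * q1 / n1 + p2 * q2 / n2)


# Hypothetical inputs, not taken from any example in this section.
print(conditions_met(x1=40, n1=100, x2=30, n2=100))
print(round(standard_error(p1=0.4, n1=100, p2=0.3, n2=100), 4))
```

Note that the standard error shrinks as either sample size grows, which is why larger samples give sharper inference about $p_1 - p_2$.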
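The three possible forms for the $Z$ test for $p_1 - p_2$ are as follows:
Null hypothesis | Alternative hypothesis | Form of test |
---|---|---|
$H_0: p_1 = p_2$ | $H_a: p_1 > p_2$ | Right-tailed test |
$H_0: p_1 = p_2$ | $H_a: p_1 < p_2$ | Left-tailed test |
$H_0: p_1 = p_2$ | $H_a: p_1 \neq p_2$ | Two-tailed test |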
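The null hypothesis asserts that $H_0: p_1 = p_2$. We denote this common population proportion as $p$. The null hypothesis is assumed true, so $p_1 - p_2 = 0$ and the test statistic takes the following form:
$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{p(1 - p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
The common population proportion $p$ is unknown, so we estimate it using the following pooled estimate of $p$:
$$\hat{p}_{\text{pooled}} = \frac{x_1 + x_2}{n_1 + n_2}$$
Note: As a check on your arithmetic, $\hat{p}_{\text{pooled}}$ must also lie between $\hat{p}_1$ and $\hat{p}_2$.
Substituting this into the formula for the test statistic gives
$$Z_{\text{data}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_{\text{pooled}}(1 - \hat{p}_{\text{pooled}})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
$Z_{\text{data}}$ measures the distance between the sample proportions. Extreme values of $Z_{\text{data}}$ indicate evidence against the null hypothesis.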
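As a rough illustration of the pooled estimate and the test statistic, the following Python sketch computes both from the success counts and sample sizes. The function names and the counts are hypothetical, chosen only to show the arithmetic.

```python
from math import sqrt


def pooled_proportion(x1, n1, x2, n2):
    """Pooled estimate of the common proportion p under H0: p1 = p2."""
    return (x1 + x2) / (n1 + n2)


def z_data(x1, n1, x2, n2):
    """Test statistic for H0: p1 = p2 using the pooled estimate."""
    p_hat1, p_hat2 = x1 / n1, x2 / n2
    p_pooled = pooled_proportion(x1, n1, x2, n2)
    std_err = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p_hat1 - p_hat2) / std_err


# Hypothetical counts: 60 of 100 successes in sample 1, 45 of 100 in sample 2.
p_pooled = pooled_proportion(60, 100, 45, 100)
print(p_pooled)                        # lies between 0.45 and 0.60
print(round(z_data(60, 100, 45, 100), 3))
```

The arithmetic check from the note applies here: the pooled estimate falls between the two sample proportions because it is a weighted average of them.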
Hypothesis Test for the Difference in Two Population Proportions: Critical-Value Method
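Suppose we have two independent random samples taken from two populations with population proportions $p_1$ and $p_2$, and the required conditions are met: $x_1 \geq 5$, $(n_1 - x_1) \geq 5$, $x_2 \geq 5$, and $(n_2 - x_2) \geq 5$.
Step 1 State the hypotheses.
Use one of the forms from Table 12 (page 609). State the meaning of $p_1$ and $p_2$.
Step 2 Find $Z_{\text{crit}}$ and state the rejection rule.
Use Table 12 on page 609.
Step 3 Calculate $Z_{\text{data}}$.
$$Z_{\text{data}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_{\text{pooled}}(1 - \hat{p}_{\text{pooled}})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where $\hat{p}_{\text{pooled}} = \dfrac{x_1 + x_2}{n_1 + n_2}$.
$Z_{\text{data}}$ follows an approximately standard normal distribution if the required conditions are satisfied.
Step 4 State the conclusion and the interpretation.
Compare $Z_{\text{data}}$ with $Z_{\text{crit}}$.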
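The comparison in Steps 2 and 4 can be sketched in Python using the standard normal quantile function in place of a printed table. Table 12 is not reproduced here, so the rejection rules below are the conventional ones and should be read as an assumption; the function name and inputs are hypothetical.

```python
from statistics import NormalDist


def critical_value_decision(z_data, alpha, tail):
    """Return (z_crit, reject) for a right-, left-, or two-tailed Z test.

    Assumed rules: right-tailed rejects when z_data >= z_crit,
    left-tailed when z_data <= z_crit (a negative value),
    two-tailed when |z_data| >= z_crit.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "right":
        z_crit = std_normal.inv_cdf(1 - alpha)
        return z_crit, z_data >= z_crit
    if tail == "left":
        z_crit = std_normal.inv_cdf(alpha)
        return z_crit, z_data <= z_crit
    if tail == "two":
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        return z_crit, abs(z_data) >= z_crit
    raise ValueError("tail must be 'right', 'left', or 'two'")


# Hypothetical test statistic, for illustration only.
z_crit, reject = critical_value_decision(z_data=2.124, alpha=0.05, tail="right")
print(round(z_crit, 3), reject)
```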
EXAMPLE 15 $Z$ test for $p_1 - p_2$ using the critical-value method
In April 2014, the Pew Research Center published a report called U.S. Views of Technology and the Future,15 in which the results of a survey of Americans' views on the future of technology were examined. Among other questions, respondents were asked whether they agreed that “technological changes will lead toward a future where people's lives are mostly better.” The results are shown in Table 13. Assume the samples are independent.
Quantity | Males | Females |
---|---|---|
Number agreeing | $x_1$ | $x_2$ |
Sample size | $n_1$ | $n_2$ |
Sample proportion | $\hat{p}_1$ | $\hat{p}_2$ |
Population proportion | $p_1$ | $p_2$ |
Solution
Step 1 State the hypotheses.
The key words “greater than,” together with the fact that sample 1 represents the males, indicate that we have a right-tailed test:
$$H_0: p_1 = p_2 \quad \text{versus} \quad H_a: p_1 > p_2$$
where $p_1$ and $p_2$ represent the population proportion of males and females, respectively, who agree that technology will lead to a better future.
Step 2 Find and state the rejection rule.
For a right-tailed test with level of significance $\alpha$, Table 12 gives us the critical value $Z_{\text{crit}}$ and our rejection rule: Reject $H_0$ if $Z_{\text{data}} \geq Z_{\text{crit}}$.
Step 3 Calculate $Z_{\text{data}}$.
From (c), we have the value of $Z_{\text{data}}$ (also see Figure 16).
Step 4 State the conclusion and the interpretation.
$Z_{\text{data}} \geq Z_{\text{crit}}$; therefore, reject $H_0$ (see Figure 17). There is evidence at level of significance $\alpha$ that the population proportion of males who agree that technology will lead to a better future is greater than the population proportion of females who agree.
NOW YOU CAN DO
Exercises 5–8.
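We may also use the $p$-value method to perform the $Z$ test for $p_1 - p_2$.
Hypothesis Test for the Difference in Two Population Proportions: $p$-value Method
Suppose we have two independent random samples taken from two populations with population proportions $p_1$ and $p_2$, and the required conditions are met: $x_1 \geq 5$, $(n_1 - x_1) \geq 5$, $x_2 \geq 5$, and $(n_2 - x_2) \geq 5$.
Step 1 State the hypotheses and the rejection rule.
Use one of the forms from Table 12. State the meaning of $p_1$ and $p_2$. The rejection rule is: Reject $H_0$ if the $p$-value $\leq \alpha$.
Step 2 Calculate $Z_{\text{data}}$.
$$Z_{\text{data}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_{\text{pooled}}(1 - \hat{p}_{\text{pooled}})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where $\hat{p}_{\text{pooled}} = \dfrac{x_1 + x_2}{n_1 + n_2}$. If the required conditions are satisfied, $Z_{\text{data}}$ follows an approximately standard normal distribution.
Step 3 Find the $p$-value.
Either use technology or calculate the $p$-value using one of the forms in Table 14.
Step 4 State the conclusion and the interpretation.
Compare the $p$-value with $\alpha$.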
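For readers using technology in Step 3, here is a minimal Python sketch of the $p$-value calculation. Table 14 is not reproduced here, so the three forms below are the conventional tail areas under the standard normal curve and are an assumption; the function name is hypothetical.

```python
from statistics import NormalDist


def p_value(z_data, tail):
    """p-value for a Z test, given the test statistic and the form of the test.

    Assumed forms: right-tailed P(Z > z_data), left-tailed P(Z < z_data),
    two-tailed 2 * P(Z > |z_data|).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "right":
        return 1 - std_normal.cdf(z_data)
    if tail == "left":
        return std_normal.cdf(z_data)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z_data)))
    raise ValueError("tail must be 'right', 'left', or 'two'")


# Reject H0 when the p-value is less than or equal to alpha.
alpha = 0.05  # hypothetical level of significance
print(p_value(1.96, "two") <= alpha)
```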
EXAMPLE 16 $Z$ test for $p_1 - p_2$ using the $p$-value method
The General Social Survey tracks trends in American society through annual surveys. Married respondents were asked to characterize their feelings about being married. The results are shown here in a crosstabulation with gender. Test the hypothesis that the proportion of females who report being very happily married is smaller than the proportion of males who report being very happily married. Use the $p$-value method with level of significance $\alpha = 0.05$.
marriage
Gender | Very happy | Pretty happy/Not too happy | Total |
---|---|---|---|
Female | 257 | 166 | 423 |
Male | 242 | 124 | 366 |
Total | 499 | 290 | 789 |
Solution
From the crosstabulation, we assemble the statistics in Table 15 for the independent random samples of men and women.
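Sample | Sample size | Number very happy | Sample proportion very happy |
---|---|---|---|
Females (sample 1) | $n_1 = 423$ | $x_1 = 257$ | $\hat{p}_1 = 257/423 \approx 0.6076$ |
Males (sample 2) | $n_2 = 366$ | $x_2 = 242$ | $\hat{p}_2 = 242/366 \approx 0.6612$ |
We first check whether the conditions for the $Z$ test are valid: $x_1 = 257 \geq 5$, $(n_1 - x_1) = 166 \geq 5$, $x_2 = 242 \geq 5$, and $(n_2 - x_2) = 124 \geq 5$. We can therefore proceed.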
Step 1 State the hypotheses and the rejection rule.
We are interested in whether the proportion of females who report being very happily married is smaller than that of males, and because the females represent sample 1, the hypotheses are
$$H_0: p_1 = p_2 \quad \text{versus} \quad H_a: p_1 < p_2$$
where $p_1$ and $p_2$ represent the population proportions of all females and males, respectively, who report being very happily married. We will reject $H_0$ if the $p$-value $\leq \alpha$.
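Step 2 Find $Z_{\text{data}}$.
First, use the data from Table 15 to find the value of $\hat{p}_{\text{pooled}}$.
$$\hat{p}_{\text{pooled}} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{257 + 242}{423 + 366} = \frac{499}{789} \approx 0.6324$$
Then
$$Z_{\text{data}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_{\text{pooled}}(1 - \hat{p}_{\text{pooled}})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \approx \frac{0.6076 - 0.6612}{\sqrt{0.6324(0.3676)\left(\dfrac{1}{423} + \dfrac{1}{366}\right)}} \approx -1.56$$
Step 3 Find the $p$-value.
Because it is a left-tailed test, the $p$-value is given by Table 14 as $P(Z < Z_{\text{data}}) = P(Z < -1.56)$, as shown in Figure 18. This amounts to a Case 1 problem from Table 8 in Chapter 6 on page 357:
$$P(Z < -1.56) = 0.0594$$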
Step 4 State the conclusion and the interpretation.
The $p$-value $= 0.0594$ is not less than or equal to $\alpha = 0.05$, so we do not reject $H_0$. There is insufficient evidence that the proportion of females who report being very happily married is smaller than the proportion of males who do so.
Note: When the $p$-value is close to $\alpha$, many data analysts prefer to simply assess the strength of evidence against the null hypothesis using criteria such as those given in Table 6 in Chapter 9 (page 514).
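The calculations in this example can be reproduced with technology. The sketch below uses the counts from the crosstabulation; the variable names are hypothetical, and the level of significance of 0.05 is an assumption made for this illustration. Because the software does not round $Z_{\text{data}}$ to two decimal places before finding the tail area, its $p$-value may differ slightly in the fourth decimal place from a table lookup.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the crosstabulation: very happy respondents and sample sizes.
x1, n1 = 257, 423  # females (sample 1)
x2, n2 = 242, 366  # males (sample 2)

# Conditions: at least 5 successes and 5 failures in each sample.
assert min(x1, n1 - x1, x2, n2 - x2) >= 5

p_hat1, p_hat2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)
z_data = (p_hat1 - p_hat2) / sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

# Left-tailed test: the p-value is the area to the left of z_data.
p_value = NormalDist().cdf(z_data)

alpha = 0.05  # assumed level of significance for this sketch
reject = p_value <= alpha

print(f"p_pooled = {p_pooled:.4f}, z_data = {z_data:.2f}")
print(f"p-value = {p_value:.4f}, reject H0: {reject}")
```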
NOW YOU CAN DO
Exercises 9–12.
2 Independent Sample $Z$ Interval for $p_1 - p_2$
We have learned how to perform $Z$ tests for $p_1 - p_2$. Next, we learn how to use sample statistics to estimate $p_1 - p_2$ using a confidence interval.
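Confidence Interval for $p_1 - p_2$
For two independent random samples taken from two populations with population proportions $p_1$ and $p_2$, a $100(1 - \alpha)\%$ confidence interval for $p_1 - p_2$ is given by
$$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$
where $\hat{p}_1$ and $n_1$ represent the sample proportion and sample size of the sample taken from population 1 with population proportion $p_1$; $\hat{p}_2$ and $n_2$ represent the sample proportion and sample size of the sample taken from population 2 with population proportion $p_2$; $\hat{q}_1 = 1 - \hat{p}_1$ and $\hat{q}_2 = 1 - \hat{p}_2$; the samples are drawn independently; and the following conditions are satisfied: $x_1 \geq 5$, $(n_1 - x_1) \geq 5$, $x_2 \geq 5$, and $(n_2 - x_2) \geq 5$.
Margin of Error
The margin of error for a $100(1 - \alpha)\%$ confidence interval for $p_1 - p_2$ is given by
$$E = Z_{\alpha/2}\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$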
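A minimal Python sketch of the margin of error and the resulting interval follows. The function name and the counts are hypothetical. Note that the interval uses the unpooled standard error, since no assumption that $p_1 = p_2$ is being made.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x1, n1, x2, n2, confidence=0.95):
    """Return (margin_of_error, lower, upper) for a Z interval for p1 - p2."""
    p_hat1, p_hat2 = x1 / n1, x2 / n2
    q_hat1, q_hat2 = 1 - p_hat1, 1 - p_hat2
    alpha = 1 - confidence
    z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. about 2.576 for 99%
    margin = z_half_alpha * sqrt(p_hat1 * q_hat1 / n1 + p_hat2 * q_hat2 / n2)
    point_estimate = p_hat1 - p_hat2
    return margin, point_estimate - margin, point_estimate + margin


# Hypothetical counts: 60 of 100 successes in sample 1, 45 of 100 in sample 2.
margin, lower, upper = z_interval(60, 100, 45, 100, confidence=0.95)
print(round(margin, 4), round(lower, 4), round(upper, 4))
```

Raising the confidence level widens the interval, because a larger $Z_{\alpha/2}$ multiplies the same standard error.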
EXAMPLE 17 $Z$ confidence interval for $p_1 - p_2$
Use the sample statistics from Example 15 to do the following:
Solution
The conditions for the confidence interval are the same as for the hypothesis test and were checked in Example 15.
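From Table 1 in Chapter 8 on page 432, the $Z_{\alpha/2}$ value for a 99% confidence level is 2.576. Therefore, the margin of error is
$$E = Z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = 2.576\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} \approx 0.079$$
The margin of error is 0.079, so we may estimate $p_1 - p_2$ to within 0.079 with 99% confidence.
The point estimate is $\hat{p}_1 - \hat{p}_2 = 0.160$. The 99% confidence interval is therefore
$$(\hat{p}_1 - \hat{p}_2) \pm E = 0.160 \pm 0.079 = (0.081, 0.239)$$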
We are 99% confident that the difference in population proportions of males and females who agree that technology will lead to a better future lies between 0.081 and 0.239.
NOW YOU CAN DO
Exercises 13–18.
3 Use $Z$ Confidence Intervals to Perform $Z$ Tests for $p_1 - p_2$
Given a $100(1 - \alpha)\%$ confidence interval for $p_1 - p_2$, we may perform two-tailed tests at level of significance $\alpha$ for various hypothesized values of $p_1 - p_2$. If a proposed value lies outside the confidence interval for $p_1 - p_2$, then the null hypothesis specifying this value would be rejected. Otherwise, do not reject the null hypothesis.
EXAMPLE 18 Using a $Z$ interval for $p_1 - p_2$ to perform $Z$ tests about $p_1 - p_2$
This example asks whether $p_1 - p_2$ differs from (or is not equal to) a certain value, so we can use the confidence interval to test the hypotheses. Example 17 provided a 99% confidence interval for $p_1 - p_2$, the difference in population proportions of males and females who agree that technology will lead to a better future, as (0.081, 0.239). Test, using level of significance $\alpha = 0.01$, whether $p_1 - p_2$ differs from these values: (a) 0.1, (b) 0.2, (c) 0.3.
Solution
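(a) $H_0: p_1 - p_2 = 0.1$ versus $H_a: p_1 - p_2 \neq 0.1$.
The hypothesized value 0.1 lies inside the interval (0.081, 0.239), so we do not reject $H_0$.
(b) $H_0: p_1 - p_2 = 0.2$ versus $H_a: p_1 - p_2 \neq 0.2$.
The hypothesized value 0.2 lies inside the interval, so we do not reject $H_0$.
(c) $H_0: p_1 - p_2 = 0.3$ versus $H_a: p_1 - p_2 \neq 0.3$.
The hypothesized value 0.3 lies outside the interval, so we reject $H_0$.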
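The decision rule in this example reduces to checking whether each hypothesized value falls inside the interval. A short Python sketch, with a hypothetical function name and the interval (0.081, 0.239) from Example 17, makes the check explicit:

```python
def reject_by_interval(hypothesized, lower, upper):
    """Two-tailed test via a confidence interval: reject H0 when the
    hypothesized value of p1 - p2 lies outside (lower, upper)."""
    return not (lower <= hypothesized <= upper)


lower, upper = 0.081, 0.239  # 99% confidence interval from Example 17

for value in (0.1, 0.2, 0.3):
    decision = "reject H0" if reject_by_interval(value, lower, upper) else "do not reject H0"
    print(f"H0: p1 - p2 = {value}: {decision}")
```

Only 0.3 exceeds the upper endpoint 0.239, so it is the only hypothesized value for which the null hypothesis is rejected at level of significance 0.01.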
NOW YOU CAN DO
Exercises 19–22.