10 Two-Sample Inference

10.3 Inference for Two Independent Proportions


OBJECTIVES By the end of this section, I will be able to …

Perform and interpret $Z$ tests for $p_{1} - p_{2}$ .
Compute and interpret $Z$ intervals for $p_{1} - p_{2}$ .
Use $Z$ intervals for $p_{1} - p_{2}$ to perform two-tailed $Z$ tests.

1 Independent Sample $Z$ Tests for $p_{1} - p_{2}$

So far in this chapter, we have learned how to perform inference about population means. In this section, we learn how to perform hypothesis tests and construct confidence intervals about the difference between two population proportions. Recall that the sample proportion of success $\hat{p} = x / n$ is the ratio of the number of successes $x$ to the number of trials $n$ in a binomial experiment.


In this section, we consider two independent samples, each of which yields a sample proportion: ${\hat{p}}_{1} = x_{1} / n_{1}$ and ${\hat{p}}_{2} = x_{2} / n_{2}$ . For example, a recent survey found the sample proportion of males (sample 1) and females (sample 2) who agree that “technological changes will lead toward a future where people's lives are mostly better” to be

${\hat{p}}_{1} = \frac{x_{1}}{n_{1}} = \frac{335}{500} = 0.67$

and

${\hat{p}}_{2} = \frac{x_{2}}{n_{2}} = \frac{255}{500} = 0.51$

(See Example 15 for further details about these data.) Here, we are interested in performing inference for the difference in population proportions $p_{1} - p_{2}$ , such as the difference in the proportions of all males and females who think technological change will lead to a better future. We use the difference in sample proportions ${\hat{p}}_{1} - {\hat{p}}_{2}$ as our point estimate of the difference in population proportions $p_{1} - p_{2}$ , which is unknown. And just as in earlier sections where we investigated the sampling distribution of ${\bar{x}}_{1} - {\bar{x}}_{2}$ to perform inference on $μ_{1} - μ_{2}$ , here we use the sampling distribution of ${\hat{p}}_{1} - {\hat{p}}_{2}$ to help us perform inference about $p_{1} - p_{2}$ .

Developing Your Statistical Sense

Independent Samples Only

The inferential methods of this section are reserved for independent samples only. An example of a problem that would not use the methods of this section is the following: In the latest poll, suppose 45% of the respondents supported the Democratic candidate and 45% supported the Republican one. Because each respondent had to choose between the Democratic candidate and the Republican candidate, their respective poll numbers are not independent.

The distribution of all possible values of ${\hat{p}}_{1} - {\hat{p}}_{2}$ is called the sampling distribution of ${\hat{p}}_{1} - {\hat{p}}_{2}$ , with mean $p_{1} - p_{2}$ and standard error

$σ_{{\hat{p}}_{1} - {\hat{p}}_{2}} = \sqrt{\frac{p_{1} (1 - p_{1})}{n_{1}} + \frac{p_{2} (1 - p_{2})}{n_{2}}} .$

Let $x_{1}$ and $x_{2}$ denote the number of successes, and let $n_{1} - x_{1}$ and $n_{2} - x_{2}$ denote the number of failures in sample 1 and sample 2, respectively. The sampling distribution of ${\hat{p}}_{1} - {\hat{p}}_{2}$ is approximately normal when the number of successes and the number of failures in each sample are each at least 5, that is, when $x_{1} \geq 5$ , $(n_{1} - x_{1}) \geq 5$ , $x_{2} \geq 5$ , and $(n_{2} - x_{2}) \geq 5$ . Let $q_{1} = 1 - p_{1}, q_{2} = 1 - p_{2}, {\hat{q}}_{1} = 1 - {\hat{p}}_{1}$ and ${\hat{q}}_{2} = 1 - {\hat{p}}_{2}$ .
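The success and failure counts can be checked mechanically. The following is a minimal Python sketch, not part of the text's own workflow. The function names `normal_approx_ok` and `standard_error` are illustrative assumptions, and `standard_error` takes population proportions that would be unknown in practice.

```python
from math import sqrt

def normal_approx_ok(x1, n1, x2, n2, minimum=5):
    """Return True when each sample has at least `minimum` successes and failures."""
    counts = (x1, n1 - x1, x2, n2 - x2)
    return all(count >= minimum for count in counts)

def standard_error(p1, n1, p2, n2):
    """Standard error of p1_hat - p2_hat for given (hypothetical) population proportions."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Survey counts from this section: 335 of 500 males, 255 of 500 females.
print(normal_approx_ok(335, 500, 255, 500))

# A made-up small sample with only 3 successes fails the check.
print(normal_approx_ok(3, 40, 20, 40))
```

The first call returns `True` and the second returns `False`, because 3 successes is fewer than the required 5.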

Sampling Distribution of ${\hat{p}}_{1} - {\hat{p}}_{2}$

When two random samples are drawn independently from two populations, then the quantity

$Z = \frac{({\hat{p}}_{1} - {\hat{p}}_{2}) - (p_{1} - p_{2})}{\sqrt{\frac{p_{1} q_{1}}{n_{1}} + \frac{p_{2} q_{2}}{n_{2}}}}$

has an approximately standard normal distribution when the following conditions are satisfied:

$\begin{matrix} x_{1} \geq 5, & (n_{1} - x_{1}) \geq 5, & x_{2} \geq 5, & (n_{2} - x_{2}) \geq 5 \end{matrix}$

and where ${\hat{p}}_{1}$ and $n_{1}$ represent the sample proportion and sample size of the sample taken from population 1 with population proportion $p_{1}$ ; ${\hat{p}}_{2}$ and $n_{2}$ represent the sample proportion and sample size of the sample taken from population 2 with population proportion $p_{2}$ ; and $q_{1} = 1 - p_{1}$ and $q_{2} = 1 - p_{2}$ .


The three possible forms for the $Z$ test for $p_{1} - p_{2}$ are as follows:

$H_{0} : p_{1} = p_{2}$	$H_{a} : p_{1} > p_{2}$	Right-tailed test
$H_{0} : p_{1} = p_{2}$	$H_{a} : p_{1} < p_{2}$	Left-tailed test
$H_{0} : p_{1} = p_{2}$	$H_{a} : p_{1} \neq p_{2}$	Two-tailed test

The null hypothesis asserts that $H_{0}$ : $p_{1} = p_{2}$ . We denote this common population proportion as $p$ . The null hypothesis is assumed true, so the test statistic takes the following form:

$\begin{matrix} Z_{data} = \frac{({\hat{p}}_{1} - {\hat{p}}_{2}) - (p_{1} - p_{2})}{\sqrt{\frac{p_{1} (1 - p_{1})}{n_{1}} + \frac{p_{2} (1 - p_{2})}{n_{2}}}} = \frac{({\hat{p}}_{1} - {\hat{p}}_{2}) - 0}{\sqrt{\frac{p_{1} (1 - p_{1})}{n_{1}} + \frac{p_{2} (1 - p_{2})}{n_{2}}}} \\ = \frac{({\hat{p}}_{1} - {\hat{p}}_{2})}{\sqrt{\frac{p (1 - p)}{n_{1}} + \frac{p (1 - p)}{n_{2}}}} = \frac{({\hat{p}}_{1} - {\hat{p}}_{2})}{\sqrt{p (1 - p) (\frac{1}{n_{1}} + \frac{1}{n_{2}})}} \end{matrix}$

The common population proportion $p$ is unknown, so we estimate it using the following pooled estimate of $p$ :

${\hat{p}}_{pooled} = \frac{x_{1} + x_{2}}{n_{1} + n_{2}}$

Note: As a check on your arithmetic, ${\hat{p}}_{pooled}$ must lie between ${\hat{p}}_{1}$ and ${\hat{p}}_{2}$ .
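
One way to see why this check works: because $x_{1} = n_{1} {\hat{p}}_{1}$ and $x_{2} = n_{2} {\hat{p}}_{2}$ , the pooled estimate can be rewritten as a weighted average of the two sample proportions, with weights proportional to the sample sizes. A weighted average with positive weights always lies between the two values being averaged.

$\frac{x_{1} + x_{2}}{n_{1} + n_{2}} = \frac{n_{1}}{n_{1} + n_{2}} \cdot {\hat{p}}_{1} + \frac{n_{2}}{n_{1} + n_{2}} \cdot {\hat{p}}_{2}$

For instance, with the survey data from this section ( $n_{1} = n_{2} = 500$ , ${\hat{p}}_{1} = 0.67$ , ${\hat{p}}_{2} = 0.51$ ), the weights are both 0.5, and $(0.5) (0.67) + (0.5) (0.51) = 0.59$ , which lies between 0.51 and 0.67.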

Substituting this into the formula for the test statistic gives

$Z_{data} = \frac{({\hat{p}}_{1} - {\hat{p}}_{2})}{\sqrt{{\hat{p}}_{pooled} \cdot (1 - {\hat{p}}_{pooled}) (\frac{1}{n_{1}} + \frac{1}{n_{2}})}}$

$Z_{data}$ measures the distance between the two sample proportions in units of the estimated standard error. Extreme values of $Z_{data}$ indicate evidence against the null hypothesis.
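
The pooled estimate and the test statistic can be computed directly from the four counts. The following is a minimal Python sketch under the assumption that the required conditions have already been checked; the function names `pooled_proportion` and `z_data` are illustrative, not from the text.

```python
from math import sqrt

def pooled_proportion(x1, n1, x2, n2):
    """Pooled estimate of the common proportion p under H0: p1 = p2."""
    return (x1 + x2) / (n1 + n2)

def z_data(x1, n1, x2, n2):
    """Test statistic for H0: p1 = p2, using the pooled standard error."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pool = pooled_proportion(x1, n1, x2, n2)
    pooled_se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / pooled_se

# Survey counts from this section: 335 of 500 males, 255 of 500 females.
print(pooled_proportion(335, 500, 255, 500))
print(round(z_data(335, 500, 255, 500), 1))
```

Working from the counts rather than from rounded proportions avoids carrying rounding error into the test statistic.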

Hypothesis Test for the Difference in Two Population Proportions: Critical-Value Method

Suppose we have two independent random samples taken from two populations with population proportions $p_{1}$ and $p_{2}$ , and the required conditions are met: $x_{1} \geq 5$ , $(n_{1} - x_{1}) \geq 5$ , $x_{2} \geq 5$ , and $(n_{2} - x_{2}) \geq 5$ .

Step 1 State the hypotheses.

Use one of the forms from Table 12 (page 609). State the meaning of $p_{1}$ and $p_{2}$ .
Step 2 Find $Z_{crit}$ and state the rejection rule.

Use Table 12 on page 609.
Step 3 Calculate $Z_{data}$ .

$Z_{data} = \frac{{\hat{p}}_{1} - {\hat{p}}_{2}}{\sqrt{{\hat{p}}_{pooled} \cdot (1 - {\hat{p}}_{pooled}) (\frac{1}{n_{1}} + \frac{1}{n_{2}})}}$

where

${\hat{p}}_{pooled} = \frac{x_{1} + x_{2}}{n_{1} + n_{2}}$

$Z_{data}$ follows an approximately standard normal distribution if the required conditions are satisfied.
Step 4 State the conclusion and the interpretation.

Compare $Z_{data}$ with $Z_{crit}$ .


Table 10.48: Table 12 Critical regions and rejection rules for $Z$ test for $p_{1} - p_{2}$
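
Critical values and rejection rules of this kind can also be obtained with software. The following is a minimal Python sketch using the standard library's `statistics.NormalDist` (Python 3.8 or later). The function names `z_crit` and `reject_h0` and the tail labels `"right"`, `"left"`, and `"two"` are assumptions made for illustration, and the $Z_{data}$ values passed in are hypothetical.

```python
from statistics import NormalDist

def z_crit(alpha, tail):
    """Critical value of the standard normal for the given tail type."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        return std_normal.inv_cdf(alpha)
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'right', 'left', or 'two'")

def reject_h0(z_data, alpha, tail):
    """Apply the rejection rule: compare z_data with the critical value."""
    crit = z_crit(alpha, tail)
    if tail == "right":
        return z_data >= crit
    if tail == "left":
        return z_data <= crit
    return abs(z_data) >= crit

print(round(z_crit(0.01, "right"), 2))
print(reject_h0(2.5, 0.01, "right"))
```

For a right-tailed test at $α = 0.01$ the sketch gives a critical value of about 2.33, so a hypothetical $Z_{data} = 2.5$ would lead to rejection of $H_{0}$ .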

EXAMPLE 15 $Z$ test for $p_{1} - p_{2}$ using the critical-value method

In April 2014, the Pew Research Center published a report called U.S. Views of Technology and the Future,¹⁵ in which the results of a survey of Americans' views on the future of technology were examined. Among other questions, respondents were asked whether they agreed that “technological changes will lead toward a future where people's lives are mostly better.” The results are shown in Table 13. Assume the samples are independent.

Table 10.49: Table 13 Proportions of males and females who agree that technological change will lead to a better future

	Males	Females
Number agreeing	$x_{1} = 335$	$x_{2} = 255$
Sample size	$n_{1} = 500$	$n_{2} = 500$
Sample proportion	${\hat{p}}_{1} = x_{1} / n_{1} = 335 / 500 = 0.67$	${\hat{p}}_{2} = x_{2} / n_{2} = 255 / 500 = 0.51$
Population proportion	$p_{1} = ?$	$p_{2} = ?$

(a) Find the point estimate of the difference in the population proportions of males and females, ${\hat{p}}_{1} - {\hat{p}}_{2}$ .
(b) Compute the pooled estimate of the common proportion, ${\hat{p}}_{pooled}$ .
(c) Calculate the value of the test statistic $Z_{data}$ .
(d) Check whether the conditions for performing the $Z$ test for $p_{1} - p_{2}$ are met.
(e) Test whether the population proportion of males who agree that technology will lead to a better future is greater than the population proportion of females who agree. Use the critical-value method at level of significance $α = 0.01$ .


Solution

(a) The point estimate is ${\hat{p}}_{1} - {\hat{p}}_{2} = 0.67 - 0.51 = 0.16$ .
(b) ${\hat{p}}_{pooled} = \frac{x_{1} + x_{2}}{n_{1} + n_{2}} = \frac{335 + 255}{500 + 500} = 0.59$

FIGURE 16 TI-83/84 results.
(c) $Z_{data} = \frac{{\hat{p}}_{1} - {\hat{p}}_{2}}{\sqrt{{\hat{p}}_{pooled} \cdot (1 - {\hat{p}}_{pooled}) (\frac{1}{n_{1}} + \frac{1}{n_{2}})}} = \frac{0.67 - 0.51}{\sqrt{(0.59) (0.41) (\frac{1}{500} + \frac{1}{500})}} \approx 5.1$
(d) We check the conditions for performing the $Z$ test for $p_{1} - p_{2}$ . We have: $x_{1} = 335 \geq 5$ , $x_{2} = 255 \geq 5$ , $n_{1} - x_{1} = 500 - 335 = 165 \geq 5$ , and $n_{2} - x_{2} = 500 - 255 = 245 \geq 5$ . We may thus proceed with the hypothesis test.
(e) The $Z$ test for $p_{1} - p_{2}$ follows the steps below.

Step 1 State the hypotheses.

The key words “greater than,” together with the fact that sample 1 represents the males, indicate that we have a right-tailed test:

$\begin{matrix} H_{0} : p_{1} = p_{2} & versus & H_{a} : p_{1} > p_{2} \end{matrix}$

where $p_{1}$ and $p_{2}$ represent the population proportion of males and females, respectively, who agree that technology will lead to a better future.

FIGURE 17 $Z_{data} = 5.1$ is extreme, leading to rejection of $H_{0}$ .
Step 2 Find $Z_{crit}$ and state the rejection rule.

For a right-tailed test with level of significance $α = 0.01$ , Table 12 gives us $Z_{crit} = 2.33$ and our rejection rule: Reject $H_{0}$ if $Z_{data} \geq 2.33$ .
Step 3 Calculate $Z_{data}$ .

From (c), we have $Z_{data} \approx 5.1$ (also see Figure 16).
Step 4 State the conclusion and the interpretation.

$Z_{data} \approx 5.1 \geq 2.33$ ; therefore, reject $H_{0}$ (see Figure 17). There is evidence at level of significance $α = 0.01$ that the population proportion of males who agree that technology will lead to a better future is greater than the population proportion of females who agree.

NOW YOU CAN DO

Exercises 5–8.

We may also use the $p$ -value method to perform the $Z$ test for $p_{1} - p_{2}$ .

Hypothesis Test for the Difference in Two Population Proportions: $p$ -value Method

Suppose we have two independent random samples taken from two populations with population proportions $p_{1}$ and $p_{2}$ , and the required conditions are met: $x_{1} \geq 5$ , $(n_{1} - x_{1}) \geq 5$ , $x_{2} \geq 5$ , and $(n_{2} - x_{2}) \geq 5$ .

Step 1 State the hypotheses and the rejection rule.
Use one of the forms from Table 12. State the meaning of $p_{1}$ and $p_{2}$ . The rejection rule is: Reject $H_{0}$ if the $p$ -value $\leq α$ .
Step 2 Calculate $Z_{data}$ .

$Z_{data} = \frac{{\hat{p}}_{1} - {\hat{p}}_{2}}{\sqrt{{\hat{p}}_{pooled} \cdot (1 - {\hat{p}}_{pooled}) (\frac{1}{n_{1}} + \frac{1}{n_{2}})}}$

where ${\hat{p}}_{pooled} = \frac{x_{1} + x_{2}}{n_{1} + n_{2}}$ . If the required conditions are satisfied, $Z_{data}$ follows an approximately standard normal distribution.
Step 3 Find the $p$ -value.

Either use technology or calculate the $p$ -value using one of the forms in Table 14.
Step 4 State the conclusion and the interpretation.

Compare the $p$ -value with $α$ .


Table 10.50: Table 14 $p$ -Values for $Z$ test for $p_{1} - p_{2}$
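
When technology is used instead of a table, the $p$ -value for each form of the test comes from the standard normal distribution. The following is a minimal Python sketch; the function name `p_value` and the tail labels are assumptions made for illustration, and the $Z_{data}$ values passed in are hypothetical.

```python
from statistics import NormalDist

def p_value(z_data, tail):
    """p-value of a Z test for the given tail type ('right', 'left', or 'two')."""
    cdf = NormalDist().cdf  # standard normal cumulative distribution function
    if tail == "right":
        return 1 - cdf(z_data)            # P(Z > z_data)
    if tail == "left":
        return cdf(z_data)                # P(Z < z_data)
    if tail == "two":
        return 2 * (1 - cdf(abs(z_data))) # 2 * P(Z > |z_data|)
    raise ValueError("tail must be 'right', 'left', or 'two'")

print(round(p_value(1.96, "two"), 3))
print(round(p_value(2.33, "right"), 3))
```

The two-tailed case doubles the one-tail area beyond $|Z_{data}|$ because extreme values in either direction count as evidence against $H_{0}$ .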

EXAMPLE 16 $Z$ test for $p_{1} - p_{2}$ using the $p$ -value method

The General Social Survey tracks trends in American society through annual surveys. Married respondents were asked to characterize their feelings about being married. The results are shown here in a crosstabulation with gender. Test the hypothesis that the proportion of females who report being very happily married is smaller than the proportion of males who report being very happily married. Use the $p$ -value method with level of significance $α = 0.05$ .

marriage

	Very happy	Pretty happy/ Not too happy	Total
Female	257	166	423
Male	242	124	366
Total	499	290	789

Solution

From the crosstabulation, we assemble the statistics in Table 15 for the independent random samples of men and women.

Table 10.52: Table 15 Sample statistics of very happily married respondents

	Sample size	Number very happy	Sample proportion very happy
Females (sample 1)	$n_{1} = 423$	$x_{1} = 257$	${\hat{p}}_{1} = \frac{x_{1}}{n_{1}} = \frac{257}{423} \approx 0.6076$
Males (sample 2)	$n_{2} = 366$	$x_{2} = 242$	${\hat{p}}_{2} = \frac{x_{2}}{n_{2}} = \frac{242}{366} \approx 0.6612$

We first check whether the conditions for the $Z$ test are valid: $x_{1} = 257 \geq 5$ , $(n_{1} - x_{1}) = (423 - 257) = 166 \geq 5$ , $x_{2} = 242 \geq 5$ , and $(n_{2} - x_{2}) = (366 - 242) = 124 \geq 5$ . We can therefore proceed.

Step 1 State the hypotheses and the rejection rule.

We are interested in whether the proportion of females who report being very happily married is smaller than that of males. Because the females represent sample 1, the hypotheses are

$\begin{matrix} H_{0} : p_{1} = p_{2} & H_{a} : p_{1} < p_{2} \end{matrix}$


where $p_{1}$ and $p_{2}$ represent the population proportions of all females and males, respectively, who report being very happily married. We will reject $H_{0}$ if the $p$ -value $\leq α = 0.05$ .

FIGURE 18 $p$ -Value for left-tailed $Z$ test.
Step 2 Find $Z_{data}$ .

First, use the data from Table 15 to find the value of ${\hat{p}}_{pooled}$ .

${\hat{p}}_{pooled} = \frac{x_{1} + x_{2}}{n_{1} + n_{2}} = \frac{257 + 242}{423 + 366} \approx 0.63245$

Then

$Z_{data} = \frac{(0.6076 - 0.6612)}{\sqrt{0.63245 \cdot (1 - 0.63245) (\frac{1}{423} + \frac{1}{366})}} \approx - 1.56$
Step 3 Find the $p$ -value.

Because it is a left-tailed test, the $p$ -value is given by Table 14 as $P (Z < Z_{data}) = P (Z < - 1.56)$ , as shown in Figure 18. This amounts to a Case 1 problem from Table 8 in Chapter 6 on page 357:

$P (Z < - 1.56) = 0.0594$
Step 4 State the conclusion and the interpretation.

The $p$ -value $= 0.0594$ is not less than or equal to $α = 0.05$ , so we do not reject $H_{0}$ . There is insufficient evidence that the proportion of females who report being very happily married is smaller than the proportion of males who do so.

Note: When the $p$ -value is close to $α$ , many data analysts prefer to simply assess the strength of evidence against the null hypothesis using criteria such as those given in Table 6 in Chapter 9 (page 514).
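
Because the $p$ -value here is so close to $α$ , it is worth checking how sensitive it is to rounding. The following is a minimal Python sketch that reruns the calculation from the raw counts without intermediate rounding. The variable names are illustrative, and this is a software check rather than the table-based method used above.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the crosstabulation: very happily married respondents.
x1, n1 = 257, 423  # females (sample 1)
x2, n2 = 242, 366  # males (sample 2)

p1_hat = x1 / n1
p2_hat = x2 / n2
p_pool = (x1 + x2) / (n1 + n2)

# Pooled standard error under H0: p1 = p2.
pooled_se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / pooled_se

# Left-tailed test, so the p-value is P(Z < z).
p_val = NormalDist().cdf(z)

alpha = 0.05
reject = p_val <= alpha
print(round(z, 3), round(p_val, 4), reject)
```

With full precision the test statistic is roughly $- 1.558$ and the $p$ -value is roughly 0.0596, slightly different from the table value because the table uses $Z_{data}$ rounded to two decimal places. The conclusion is unchanged: do not reject $H_{0}$ at $α = 0.05$ .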

NOW YOU CAN DO

Exercises 9–12.

2 Independent Sample $Z$ Interval for $p_{1} - p_{2}$

We have learned how to perform $Z$ tests for $p_{1} - p_{2}$ . Next, we learn how to use sample statistics to estimate $p_{1} - p_{2}$ using a confidence interval.

Confidence Interval for $p_{1} - p_{2}$

For two independent random samples taken from two populations with population proportions $p_{1}$ and $p_{2}$ , a $100 (1 - α) %$ confidence interval for $p_{1} - p_{2}$ is given by

${\hat{p}}_{1} - {\hat{p}}_{2} \pm Z_{α / 2} \sqrt{\frac{{\hat{p}}_{1} \cdot {\hat{q}}_{1}}{n_{1}} + \frac{{\hat{p}}_{2} \cdot {\hat{q}}_{2}}{n_{2}}}$

where ${\hat{p}}_{1}$ and $n_{1}$ represent the sample proportion and sample size of the sample taken from population 1 with population proportion $p_{1}$ ; ${\hat{p}}_{2}$ and $n_{2}$ represent the sample proportion and sample size of the sample taken from population 2 with population proportion $p_{2}$ ; ${\hat{q}}_{1} = 1 - {\hat{p}}_{1}$ and ${\hat{q}}_{2} = 1 - {\hat{p}}_{2}$ , and the samples are drawn independently; and the following conditions are satisfied: $x_{1} \geq 5$ , $(n_{1} - x_{1}) \geq 5$ , $x_{2} \geq 5$ , and $(n_{2} - x_{2}) \geq 5$ .

Margin of Error $E$

The margin of error for a $100 (1 - α) %$ confidence interval for $p_{1} - p_{2}$ is given by

$E = Z_{α / 2} \cdot \sqrt{\frac{{\hat{p}}_{1} \cdot {\hat{q}}_{1}}{n_{1}} + \frac{{\hat{p}}_{2} \cdot {\hat{q}}_{2}}{n_{2}}}$
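
The margin of error and interval can be computed from the four counts and a confidence level. The following is a minimal Python sketch, assuming the required conditions have been checked. The function names `margin_of_error` and `z_interval` are illustrative. Note that, unlike the test statistic, the interval uses the unpooled standard error.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(x1, n1, x2, n2, confidence):
    """Margin of error E for the Z interval for p1 - p2 (unpooled standard error)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    alpha = 1 - confidence
    z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    unpooled_se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return z_half_alpha * unpooled_se

def z_interval(x1, n1, x2, n2, confidence):
    """Return (lower, upper) bounds of the Z interval for p1 - p2."""
    point_estimate = x1 / n1 - x2 / n2
    e = margin_of_error(x1, n1, x2, n2, confidence)
    return point_estimate - e, point_estimate + e

# Survey counts from this section: 335 of 500 males, 255 of 500 females.
lower, upper = z_interval(335, 500, 255, 500, confidence=0.99)
print(round(lower, 3), round(upper, 3))
```

The pooled estimate is not used here because a confidence interval does not assume $p_{1} = p_{2}$ .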

EXAMPLE 17 $Z$ confidence interval for $p_{1} - p_{2}$

Use the sample statistics from Example 15 to do the following:

Calculate and interpret the margin of error $E$ for confidence level 99%.
Construct and interpret a 99% confidence interval for $p_{1} - p_{2}$ .


Solution

The conditions for the confidence interval are the same as for the hypothesis test and were checked in Example 15.

$\begin{matrix} {\hat{q}}_{1} = 1 - {\hat{p}}_{1} = 1 - 0.67 = 0.33 & {\hat{q}}_{2} = 1 - {\hat{p}}_{2} = 1 - 0.51 = 0.49. \end{matrix}$

From Table 1 in Chapter 8 on page 432, the $Z_{α / 2}$ value for a 99% confidence level is 2.576. Therefore, the margin of error is

$E = Z_{α / 2} \cdot \sqrt{\frac{{\hat{p}}_{1} \cdot {\hat{q}}_{1}}{n_{1}} + \frac{{\hat{p}}_{2} \cdot {\hat{q}}_{2}}{n_{2}}} = (2.576) \sqrt{\frac{(0.67) (0.33)}{500} + \frac{(0.51) (0.49)}{500}} \approx 0.079$

The margin of error is 0.079, so we may estimate $p_{1} - p_{2}$ to within 0.079 with 99% confidence.
The point estimate is ${\hat{p}}_{1} - {\hat{p}}_{2} = 0.67 - 0.51 = 0.16$ . The 99% confidence interval is therefore

${\hat{p}}_{1} - {\hat{p}}_{2} \pm E = 0.16 \pm 0.079 = (0.081, 0.239)$

We are 99% confident that the difference in population proportions of males and females who agree that technology will lead to a better future lies between 0.081 and 0.239.

NOW YOU CAN DO

Exercises 13–18.

3 Use $Z$ Confidence Intervals to Perform $Z$ Tests for $p_{1} - p_{2}$

Given a $100 (1 - α) %$ $Z$ confidence interval for $p_{1} - p_{2}$ , we may perform two-tailed $Z$ tests for various hypothesized values of $p_{1} - p_{2}$ . If a proposed value lies outside the $100 (1 - α) %$ $Z$ confidence interval for $p_{1} - p_{2}$ , then the null hypothesis specifying this value would be rejected. Otherwise, do not reject the null hypothesis.
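
This decision rule is simple enough to express in a few lines. The following is a minimal Python sketch; the function name `reject_by_interval` is an assumption made for illustration, and treating a value exactly on an endpoint as inside the interval is also an assumption, since the rule above does not address that case.

```python
def reject_by_interval(lower, upper, hypothesized):
    """Two-tailed test via a confidence interval: reject H0 when the value lies outside."""
    # Assumption: a value exactly on an endpoint counts as inside the interval.
    return not (lower <= hypothesized <= upper)

# 99% interval for p1 - p2 from the technology survey data: (0.081, 0.239).
interval = (0.081, 0.239)
for value in (0.0, 0.2, 0.3):
    decision = "reject H0" if reject_by_interval(*interval, value) else "do not reject H0"
    print(value, decision)
```

The value 0.0 corresponds to $H_{0} : p_{1} = p_{2}$ . It lies outside the interval, so a two-tailed test at $α = 0.01$ would reject the hypothesis of equal proportions.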

EXAMPLE 18 Using a $Z$ interval for $p_{1} - p_{2}$ to perform $Z$ tests about $p_{1} - p_{2}$

This example asks whether $p_{1} - p_{2}$ differs from (or is not equal to) a certain value, so we can use the $Z$ confidence interval to test the hypotheses. Example 17 provided a 99% $Z$ confidence interval for $p_{1} - p_{2}$ , the difference in population proportions of males and females who agree that technology will lead to a better future, as (0.081, 0.239). Test, using level of significance $α = 0.01$ , whether $p_{1} - p_{2}$ differs from these values: (a) 0.1, (b) 0.2, (c) 0.3.

Solution

(a) $H_{0} : p_{1} - p_{2} = 0.1$ versus $H_{a} : p_{1} - p_{2} \neq 0.1$ .

The hypothesized value 0.1 lies inside the interval (0.081, 0.239), so we do not reject $H_{0}$ .
(b) $H_{0} : p_{1} - p_{2} = 0.2$ versus $H_{a} : p_{1} - p_{2} \neq 0.2$ .

The hypothesized value 0.2 lies inside the interval, so we do not reject $H_{0}$ .
(c) $H_{0} : p_{1} - p_{2} = 0.3$ versus $H_{a} : p_{1} - p_{2} \neq 0.3$ .

The hypothesized value 0.3 lies outside the interval, so we reject $H_{0}$ .

NOW YOU CAN DO

Exercises 19–22.