
10.3 Inference for Two Independent Proportions


OBJECTIVES By the end of this section, I will be able to …

  1. Perform and interpret Z tests for $p_1 - p_2$.
  2. Compute and interpret Z intervals for $p_1 - p_2$.
  3. Use Z intervals for $p_1 - p_2$ to perform two-tailed Z tests.

1 Independent Sample Z Tests for $p_1 - p_2$

So far in this chapter, we have learned how to perform inference about population means. In this section, we learn how to perform hypothesis tests and construct confidence intervals about the difference between two population proportions. Recall that the sample proportion of successes $\hat{p} = x/n$ is the ratio of the number of successes $x$ to the number of trials $n$ in a binomial experiment.


In this section, we consider two independent samples, each of which yields a sample proportion: $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$. For example, a recent survey found the sample proportion of males (sample 1) and females (sample 2) who agree that “technological changes will lead toward a future where people's lives are mostly better” to be

$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{335}{500} = 0.67$$

and

$$\hat{p}_2 = \frac{x_2}{n_2} = \frac{255}{500} = 0.51$$

(See Example 15 for further details about these data.) Here, we are interested in performing inference for the difference in population proportions $p_1 - p_2$, such as the difference in the proportions of all males and females who think technological change will lead to a better future. We use the difference in sample proportions $\hat{p}_1 - \hat{p}_2$ as our point estimate of the difference in population proportions $p_1 - p_2$, which is unknown. And just as in earlier sections where we investigated the sampling distribution of $\bar{x}_1 - \bar{x}_2$ to perform inference on $\mu_1 - \mu_2$, here we use the sampling distribution of $\hat{p}_1 - \hat{p}_2$ to help us perform inference about $p_1 - p_2$.

Developing Your Statistical Sense

Independent Samples Only

The inferential methods of this section are reserved for independent samples only. An example of a problem that would not use the methods of this section is the following: In the latest poll, suppose 45% of the respondents supported the Democratic candidate and 45% supported the Republican one. Because each respondent had to choose between the Democratic candidate and the Republican candidate, both proportions come from the same set of respondents, so their respective poll numbers are not independent.

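The distribution of all possible values of $\hat{p}_1 - \hat{p}_2$ is called the sampling distribution of $\hat{p}_1 - \hat{p}_2$, with mean $p_1 - p_2$ and standard error

$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}.$$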
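The mean and standard error above can be checked by simulation. The Python sketch below is a hypothetical illustration, not part of the text's method: it assumes population proportions of 0.67 and 0.51, samples of 500 each, 2,000 replications, and a fixed seed. It draws independent samples repeatedly and compares the mean and standard deviation of the simulated $\hat{p}_1 - \hat{p}_2$ values with $p_1 - p_2$ and with the formula for $\sigma_{\hat{p}_1 - \hat{p}_2}$. The function names are illustrative.

```python
import math
import random
import statistics


def simulate_differences(p1, n1, p2, n2, reps, seed=1):
    """Return `reps` simulated values of p1_hat - p2_hat from independent samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Each sample is n Bernoulli trials; a True comparison counts as 1 success.
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        diffs.append(x1 / n1 - x2 / n2)
    return diffs


def standard_error(p1, n1, p2, n2):
    """Standard error of p1_hat - p2_hat, computed from the population proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Hypothetical population values, chosen for illustration only.
p1, n1, p2, n2 = 0.67, 500, 0.51, 500
diffs = simulate_differences(p1, n1, p2, n2, reps=2000)
sim_mean = statistics.fmean(diffs)
sim_sd = statistics.stdev(diffs)
se = standard_error(p1, n1, p2, n2)
print(f"mean of simulated differences: {sim_mean:.4f}  (p1 - p2 = {p1 - p2:.4f})")
print(f"SD of simulated differences:   {sim_sd:.4f}  (formula = {se:.4f})")
```

The two printed lines should agree to roughly two or three decimal places; increasing `reps` tightens the agreement at the cost of run time.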

Let $x_1$ and $x_2$ denote the number of successes, and let $n_1 - x_1$ and $n_2 - x_2$ denote the number of failures in sample 1 and sample 2, respectively. The sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately normal when the number of successes and the number of failures in each sample are each at least 5, that is, when $x_1 \ge 5$, $(n_1 - x_1) \ge 5$, $x_2 \ge 5$, and $(n_2 - x_2) \ge 5$. Let $q_1 = 1 - p_1$, $q_2 = 1 - p_2$, $\hat{q}_1 = 1 - \hat{p}_1$, and $\hat{q}_2 = 1 - \hat{p}_2$.

Sampling Distribution of $\hat{p}_1 - \hat{p}_2$

When two random samples are drawn independently from two populations, then the quantity

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}}$$

has an approximately standard normal distribution when the following conditions are satisfied:

$$x_1 \ge 5, \quad (n_1 - x_1) \ge 5, \quad x_2 \ge 5, \quad (n_2 - x_2) \ge 5$$

where $\hat{p}_1$ and $n_1$ represent the sample proportion and sample size of the sample taken from population 1 with population proportion $p_1$; $\hat{p}_2$ and $n_2$ represent the sample proportion and sample size of the sample taken from population 2 with population proportion $p_2$; and $q_1 = 1 - p_1$ and $q_2 = 1 - p_2$.


The three possible forms for the Z test for p1p2 are as follows:

$H_0: p_1 = p_2$ versus $H_a: p_1 > p_2$ (right-tailed test)
$H_0: p_1 = p_2$ versus $H_a: p_1 < p_2$ (left-tailed test)
$H_0: p_1 = p_2$ versus $H_a: p_1 \neq p_2$ (two-tailed test)

The null hypothesis asserts that $p_1 = p_2$. We denote this common population proportion by $p$. Because the null hypothesis is assumed true, $p_1 - p_2 = 0$, and the test statistic takes the following form:

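$$
\begin{aligned}
Z_{\text{data}} &= \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}
= \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}} \\
&= \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{p(1 - p)}{n_1} + \dfrac{p(1 - p)}{n_2}}}
= \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{p(1 - p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}
\end{aligned}
$$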

The common population proportion p is unknown, so we estimate it using the following pooled estimate of p:

$$\hat{p}_{\text{pooled}} = \frac{x_1 + x_2}{n_1 + n_2}$$

Note: As a check on your arithmetic, $\hat{p}_{\text{pooled}}$ must also lie between $\hat{p}_1$ and $\hat{p}_2$.

Substituting this into the formula for the test statistic gives

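$$Z_{\text{data}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_{\text{pooled}}(1 - \hat{p}_{\text{pooled}})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$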

$Z_{\text{data}}$ measures the difference between the two sample proportions in units of its estimated standard error. Extreme values of $Z_{\text{data}}$ indicate evidence against the null hypothesis.
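The pooled test statistic translates directly into a few lines of code. This is a minimal Python sketch; the function names `conditions_met`, `pooled_proportion`, and `z_data` are hypothetical, and the check that $\hat{p}_{\text{pooled}}$ lies between the two sample proportions mirrors the arithmetic note above.

```python
import math


def conditions_met(x1, n1, x2, n2):
    """True if each sample has at least 5 successes and at least 5 failures."""
    return min(x1, n1 - x1, x2, n2 - x2) >= 5


def pooled_proportion(x1, n1, x2, n2):
    """Pooled estimate of the common proportion p under H0: p1 = p2."""
    return (x1 + x2) / (n1 + n2)


def z_data(x1, n1, x2, n2):
    """Two-proportion Z statistic using the pooled standard error."""
    if not conditions_met(x1, n1, x2, n2):
        raise ValueError("need at least 5 successes and 5 failures in each sample")
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = pooled_proportion(x1, n1, x2, n2)
    # Arithmetic check from the text: the pooled value lies between the sample proportions.
    if not min(p1_hat, p2_hat) <= p_pool <= max(p1_hat, p2_hat):
        raise ArithmeticError("pooled proportion is not between the sample proportions")
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se_pooled


print(round(z_data(335, 500, 255, 500), 2))
```

With the counts from Example 15 (335 of 500 males and 255 of 500 females), the call prints 5.14, which the example reports rounded to 5.1.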

Hypothesis Test for the Difference in Two Population Proportions: Critical-Value Method

Suppose we have two independent random samples taken from two populations with population proportions $p_1$ and $p_2$, and the required conditions are met: $x_1 \ge 5$, $(n_1 - x_1) \ge 5$, $x_2 \ge 5$, and $(n_2 - x_2) \ge 5$.

  • Step 1 State the hypotheses.

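    Use one of the forms from Table 12 (page 609). State the meaning of $p_1$ and $p_2$.

  • Step 2 Find $Z_{\text{crit}}$ and state the rejection rule.

    Use Table 12 on page 609.

  • Step 3 Calculate $Z_{\text{data}}$.

    $$Z_{\text{data}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_{\text{pooled}}(1 - \hat{p}_{\text{pooled}})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

    where

    $$\hat{p}_{\text{pooled}} = \frac{x_1 + x_2}{n_1 + n_2}$$

    $Z_{\text{data}}$ follows an approximately standard normal distribution if the required conditions are satisfied.

  • Step 4 State the conclusion and the interpretation.

    Compare $Z_{\text{data}}$ with $Z_{\text{crit}}$.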
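The four steps of the critical-value method can be sketched as a single Python function. This is an illustration under stated assumptions: the name `z_test_critical_value` and the `tail` argument values are hypothetical, and the critical values come from `statistics.NormalDist().inv_cdf` in the standard library rather than from Table 12, so they are unrounded (for example, 2.326 instead of 2.33 for a right-tailed test at $\alpha = 0.01$). The left-tailed critical value is returned as a negative number.

```python
import math
from statistics import NormalDist


def z_test_critical_value(x1, n1, x2, n2, alpha, tail):
    """Critical-value method for H0: p1 = p2; tail is 'right', 'left', or 'two'."""
    # Required conditions: at least 5 successes and 5 failures in each sample.
    if min(x1, n1 - x1, x2, n2 - x2) < 5:
        raise ValueError("conditions for the Z test are not met")
    # Step 3: pooled test statistic.
    p_pool = (x1 + x2) / (n1 + n2)
    z = (x1 / n1 - x2 / n2) / math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    # Step 2: critical value and rejection rule for the chosen alternative.
    std_normal = NormalDist()
    if tail == "right":
        z_crit = std_normal.inv_cdf(1 - alpha)
        reject = z >= z_crit
    elif tail == "left":
        z_crit = std_normal.inv_cdf(alpha)  # negative value
        reject = z <= z_crit
    elif tail == "two":
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) >= z_crit
    else:
        raise ValueError("tail must be 'right', 'left', or 'two'")
    # Step 4: the caller interprets `reject` in context.
    return z, z_crit, reject
```

For example, `z_test_critical_value(335, 500, 255, 500, 0.01, "right")` returns a statistic of about 5.14, a critical value of about 2.326, and `True` for rejection.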

Table 10.48: Table 12 Critical regions and rejection rules for Z test for $p_1 - p_2$

EXAMPLE 15 Z test for $p_1 - p_2$ using the critical-value method


In April 2014, the Pew Research Center published a report called U.S. Views of Technology and the Future,¹⁵ which examined the results of a survey of Americans' views on the future of technology. Among other questions, respondents were asked whether they agreed that “technological changes will lead toward a future where people's lives are mostly better.” The results are shown in Table 13. Assume the samples are independent.

Table 10.49: Table 13 Proportions of males and females who agree that technological change will lead to a better future
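| | Males | Females |
|---|---|---|
| Number agreeing | $x_1 = 335$ | $x_2 = 255$ |
| Sample size | $n_1 = 500$ | $n_2 = 500$ |
| Sample proportion | $\hat{p}_1 = x_1/n_1 = 335/500 = 0.67$ | $\hat{p}_2 = x_2/n_2 = 255/500 = 0.51$ |
| Population proportion | $p_1 = ?$ | $p_2 = ?$ |
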
  1. Find the point estimate of the difference in the population proportions of males and females, $\hat{p}_1 - \hat{p}_2$.
  2. Compute the pooled estimate of the common proportion, $\hat{p}_{\text{pooled}}$.
  3. Calculate the value of the test statistic $Z_{\text{data}}$.
  4. Check whether the conditions for performing the Z test for $p_1 - p_2$ are met.
  5. Test whether the population proportion of males who agree that technology will lead to a better future is greater than the population proportion of females who agree. Use the critical-value method at level of significance $\alpha = 0.01$.

Solution

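  1. The point estimate is $\hat{p}_1 - \hat{p}_2 = 0.67 - 0.51 = 0.16$.
  2. $\hat{p}_{\text{pooled}} = \dfrac{x_1 + x_2}{n_1 + n_2} = \dfrac{335 + 255}{500 + 500} = 0.59$
    FIGURE 16 TI-83/84 results.
  3. $Z_{\text{data}} = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_{\text{pooled}}(1 - \hat{p}_{\text{pooled}})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{0.67 - 0.51}{\sqrt{(0.59)(0.41)\left(\dfrac{1}{500} + \dfrac{1}{500}\right)}} \approx 5.1$
  4. We check the conditions for performing the Z test for $p_1 - p_2$. We have $x_1 = 335 \ge 5$, $x_2 = 255 \ge 5$, $n_1 - x_1 = 500 - 335 = 165 \ge 5$, and $n_2 - x_2 = 500 - 255 = 245 \ge 5$. We may thus proceed with the hypothesis test.
  5. The Z test for $p_1 - p_2$ follows the steps below.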
  • Step 1 State the hypotheses.

    The key words “greater than,” together with the fact that sample 1 represents the males, indicate that we have a right-tailed test:

    $H_0: p_1 = p_2$ versus $H_a: p_1 > p_2$

    where $p_1$ and $p_2$ represent the population proportion of males and females, respectively, who agree that technology will lead to a better future.

    FIGURE 17 $Z_{\text{data}} = 5.1$ is extreme, leading to rejection of $H_0$.
  • Step 2 Find $Z_{\text{crit}}$ and state the rejection rule.

    For a right-tailed test with level of significance $\alpha = 0.01$, Table 12 gives us $Z_{\text{crit}} = 2.33$ and our rejection rule: Reject $H_0$ if $Z_{\text{data}} \ge 2.33$.

  • Step 3 Calculate $Z_{\text{data}}$.

    From part 3, we have $Z_{\text{data}} \approx 5.1$ (also see Figure 16).

  • Step 4 State the conclusion and the interpretation.

    $Z_{\text{data}} \approx 5.1 \ge 2.33$; therefore, reject $H_0$ (see Figure 17). There is evidence at level of significance $\alpha = 0.01$ that the population proportion of males who agree that technology will lead to a better future is greater than the population proportion of females who agree.
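For readers who want to check the arithmetic in Example 15, the following Python script reproduces parts 1 through 5 from the counts in Table 13. It is a sketch, not the text's procedure: it computes the critical value with `statistics.NormalDist` instead of reading 2.33 from Table 12, and the variable names are our own.

```python
import math
from statistics import NormalDist

# Example 15 data (Table 13): males are sample 1, females are sample 2.
x1, n1 = 335, 500
x2, n2 = 255, 500
alpha = 0.01

p1_hat, p2_hat = x1 / n1, x2 / n2
point_estimate = p1_hat - p2_hat                          # part 1
p_pool = (x1 + x2) / (n1 + n2)                            # part 2
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = point_estimate / se_pooled                            # part 3
conditions_ok = min(x1, n1 - x1, x2, n2 - x2) >= 5        # part 4
z_crit = NormalDist().inv_cdf(1 - alpha)                  # part 5, Step 2 (right-tailed)
reject = z >= z_crit                                      # part 5, Step 4

print(f"estimate={point_estimate:.2f} pooled={p_pool:.2f} "
      f"Z={z:.2f} Zcrit={z_crit:.3f} reject={reject}")
```

The script prints a point estimate of 0.16, a pooled proportion of 0.59, $Z_{\text{data}} \approx 5.14$, a critical value of 2.326, and `reject=True`, matching the solution.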

NOW YOU CAN DO

Exercises 5–8.

We may also use the p-value method to perform the Z test for $p_1 - p_2$.

Hypothesis Test for the Difference in Two Population Proportions: p-value Method

Suppose we have two independent random samples taken from two populations with population proportions $p_1$ and $p_2$, and the required conditions are met: $x_1 \ge 5$, $(n_1 - x_1) \ge 5$, $x_2 \ge 5$, and $(n_2 - x_2) \ge 5$.

  • Step 1 State the hypotheses and the rejection rule.

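    Use one of the forms from Table 12. State the meaning of $p_1$ and $p_2$. The rejection rule is: Reject $H_0$ if the p-value $\le \alpha$.

  • Step 2 Calculate $Z_{\text{data}}$.

    $$Z_{\text{data}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_{\text{pooled}}(1 - \hat{p}_{\text{pooled}})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

    where $\hat{p}_{\text{pooled}} = \dfrac{x_1 + x_2}{n_1 + n_2}$. If the required conditions are satisfied, $Z_{\text{data}}$ follows an approximately standard normal distribution.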

  • Step 3 Find the p-value.

    Either use technology or calculate the p-value using one of the forms in Table 14.

  • Step 4 State the conclusion and the interpretation.

    Compare the p-value with α.
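The p-value method differs from the critical-value method only in Steps 3 and 4. In this hypothetical Python sketch, the p-value forms are the standard ones (upper tail for a right-tailed test, lower tail for a left-tailed test, twice the tail area beyond $|Z_{\text{data}}|$ for a two-tailed test); we assume they match Table 14, whose contents are not reproduced here. The function name `z_test_p_value` and the `tail` values are illustrative.

```python
import math
from statistics import NormalDist


def z_test_p_value(x1, n1, x2, n2, alpha, tail):
    """p-value method for H0: p1 = p2; tail is 'right', 'left', or 'two'."""
    if min(x1, n1 - x1, x2, n2 - x2) < 5:
        raise ValueError("conditions for the Z test are not met")
    # Step 2: pooled test statistic.
    p_pool = (x1 + x2) / (n1 + n2)
    z = (x1 / n1 - x2 / n2) / math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    # Step 3: p-value from the standard normal CDF.
    phi = NormalDist().cdf
    if tail == "right":
        p_value = 1 - phi(z)             # P(Z > Z_data)
    elif tail == "left":
        p_value = phi(z)                 # P(Z < Z_data)
    elif tail == "two":
        p_value = 2 * (1 - phi(abs(z)))  # 2 * P(Z > |Z_data|)
    else:
        raise ValueError("tail must be 'right', 'left', or 'two'")
    # Step 4: reject H0 when the p-value is at most alpha.
    return z, p_value, p_value <= alpha
```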

Table 10.50: Table 14 p-Values for Z test for $p_1 - p_2$

EXAMPLE 16 Z test for $p_1 - p_2$ using the p-value method


The General Social Survey tracks trends in American society through annual surveys. Married respondents were asked to characterize their feelings about being married. The results are shown here in a crosstabulation with gender. Test the hypothesis that the proportion of females who report being very happily married is smaller than the proportion of males who report being very happily married. Use the p-value method with level of significance α=0.05.

marriage

| | Very happy | Pretty happy/Not too happy | Total |
|---|---|---|---|
| Female | 257 | 166 | 423 |
| Male | 242 | 124 | 366 |
| Total | 499 | 290 | 789 |

Solution

From the crosstabulation, we assemble the statistics in Table 15 for the independent random samples of men and women.

Table 10.52: Table 15 Sample statistics of very happily married respondents
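| | Sample size | Number very happy | Sample proportion very happy |
|---|---|---|---|
| Females (sample 1) | $n_1 = 423$ | $x_1 = 257$ | $\hat{p}_1 = x_1/n_1 = 257/423 \approx 0.6076$ |
| Males (sample 2) | $n_2 = 366$ | $x_2 = 242$ | $\hat{p}_2 = x_2/n_2 = 242/366 \approx 0.6612$ |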

We first check whether the conditions for the Z test are satisfied: $x_1 = 257 \ge 5$, $(n_1 - x_1) = (423 - 257) = 166 \ge 5$, $x_2 = 242 \ge 5$, and $(n_2 - x_2) = (366 - 242) = 124 \ge 5$. We can therefore proceed.

  • Step 1 State the hypotheses and the rejection rule.

    We are interested in whether the proportion of females who report being very happily married is smaller than that of males, and because the females represent sample 1, the hypotheses are

    $H_0: p_1 = p_2$ versus $H_a: p_1 < p_2$


    where $p_1$ and $p_2$ represent the population proportions of all females and males, respectively, who report being very happily married. We will reject $H_0$ if the p-value $\le \alpha = 0.05$.

    FIGURE 18 p-Value for left-tailed Z test.
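  • Step 2 Find $Z_{\text{data}}$.

    First, use the data from Table 15 to find the value of $\hat{p}_{\text{pooled}}$.

    $$\hat{p}_{\text{pooled}} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{257 + 242}{423 + 366} \approx 0.63245$$

    Then

    $$Z_{\text{data}} = \frac{0.6076 - 0.6612}{\sqrt{0.63245(1 - 0.63245)\left(\dfrac{1}{423} + \dfrac{1}{366}\right)}} \approx -1.56$$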

  • Step 3 Find the p-value.

    Because it is a left-tailed test, the p-value is given by Table 14 as $P(Z < Z_{\text{data}}) = P(Z < -1.56)$, as shown in Figure 18. This amounts to a Case 1 problem from Table 8 in Chapter 6 on page 357:

    $$P(Z < -1.56) = 0.0594$$

  • Step 4 State the conclusion and the interpretation.

    The p-value $= 0.0594$ is greater than $\alpha = 0.05$, so we do not reject $H_0$. There is insufficient evidence that the proportion of females who report being very happily married is smaller than the proportion of males who do so.
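The calculation in Example 16 can be repeated in Python as a check. This sketch also shows why technology may give a slightly different p-value than the table: the script computes $Z_{\text{data}}$ at full precision, whereas the text rounds it to $-1.56$ before looking up the normal probability. The variable names are illustrative.

```python
import math
from statistics import NormalDist

# Example 16 counts (Table 15): females are sample 1, males are sample 2.
x1, n1 = 257, 423
x2, n2 = 242, 366
alpha = 0.05
phi = NormalDist().cdf  # standard normal CDF

# Full-precision calculation.
p_pool = (x1 + x2) / (n1 + n2)
z = (x1 / n1 - x2 / n2) / math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_value = phi(z)  # left-tailed: P(Z < Z_data)

# The text rounds Z_data to -1.56 before using the normal table.
p_value_rounded = phi(-1.56)

print(f"Z = {z:.4f}, p-value = {p_value:.4f}, p-value from Z = -1.56: {p_value_rounded:.4f}")
print("reject H0" if p_value <= alpha else "do not reject H0")
```

Both p-values exceed $\alpha = 0.05$, so the decision not to reject $H_0$ is unchanged.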

Note: When the p-value is close to α, many data analysts prefer to simply assess the strength of evidence against the null hypothesis using criteria such as those given in Table 6 in Chapter 9 (page 514).

NOW YOU CAN DO

Exercises 9–12.

2 Independent Sample Z Interval for $p_1 - p_2$

We have learned how to perform Z tests for $p_1 - p_2$. Next, we learn how to use sample statistics to estimate $p_1 - p_2$ using a confidence interval.

Confidence Interval for $p_1 - p_2$

For two independent random samples taken from two populations with population proportions $p_1$ and $p_2$, a $100(1 - \alpha)\%$ confidence interval for $p_1 - p_2$ is given by

$$(\hat{p}_1 - \hat{p}_2) \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$

where $\hat{p}_1$ and $n_1$ represent the sample proportion and sample size of the sample taken from population 1 with population proportion $p_1$; $\hat{p}_2$ and $n_2$ represent the sample proportion and sample size of the sample taken from population 2 with population proportion $p_2$; and $\hat{q}_1 = 1 - \hat{p}_1$ and $\hat{q}_2 = 1 - \hat{p}_2$. The samples must be drawn independently, and the following conditions must be satisfied: $x_1 \ge 5$, $(n_1 - x_1) \ge 5$, $x_2 \ge 5$, and $(n_2 - x_2) \ge 5$.

Margin of Error E

The margin of error for a $100(1 - \alpha)\%$ confidence interval for $p_1 - p_2$ is given by

$$E = Z_{\alpha/2}\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$
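The interval and margin of error can be computed with a short Python function. This is a sketch under assumptions: the name `two_prop_interval` is hypothetical, and $Z_{\alpha/2}$ comes from `statistics.NormalDist().inv_cdf` rather than from a table. Unlike the test statistic, the standard error here uses each sample's own $\hat{p}$ rather than $\hat{p}_{\text{pooled}}$, because the interval does not assume $p_1 = p_2$.

```python
import math
from statistics import NormalDist


def two_prop_interval(x1, n1, x2, n2, confidence):
    """Return (estimate, margin_of_error, lower, upper) for p1 - p2."""
    if min(x1, n1 - x1, x2, n2 - x2) < 5:
        raise ValueError("conditions for the Z interval are not met")
    p1_hat, p2_hat = x1 / n1, x2 / n2
    q1_hat, q2_hat = 1 - p1_hat, 1 - p2_hat
    alpha = 1 - confidence
    z_half = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    # Unpooled standard error: each sample uses its own proportion.
    margin = z_half * math.sqrt(p1_hat * q1_hat / n1 + p2_hat * q2_hat / n2)
    estimate = p1_hat - p2_hat
    return estimate, margin, estimate - margin, estimate + margin


estimate, margin, lower, upper = two_prop_interval(335, 500, 255, 500, 0.99)
print(f"E = {margin:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

For Example 15's counts at 99% confidence, this gives a margin of error of about 0.079 and an interval of about (0.081, 0.239).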

EXAMPLE 17 Z confidence interval for $p_1 - p_2$

Use the sample statistics from Example 15 to do the following:

  1. Calculate and interpret the margin of error $E$ for confidence level 99%.
  2. Construct and interpret a 99% confidence interval for $p_1 - p_2$.

Solution

The conditions for the confidence interval are the same as for the hypothesis test and were checked in Example 15.

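  1. $\hat{q}_1 = 1 - \hat{p}_1 = 1 - 0.67 = 0.33$ and $\hat{q}_2 = 1 - \hat{p}_2 = 1 - 0.51 = 0.49$.

    From Table 1 in Chapter 8 on page 432, the $Z_{\alpha/2}$ value for a 99% confidence level is 2.576. Therefore, the margin of error is

    $$E = Z_{\alpha/2}\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}} = (2.576)\sqrt{\frac{(0.67)(0.33)}{500} + \frac{(0.51)(0.49)}{500}} \approx 0.079$$

    The margin of error is 0.079, so we may estimate $p_1 - p_2$ to within 0.079 with 99% confidence.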

  2. The point estimate is $\hat{p}_1 - \hat{p}_2 = 0.67 - 0.51 = 0.16$. The 99% confidence interval is therefore

    $$(\hat{p}_1 - \hat{p}_2) \pm E = 0.16 \pm 0.079 = (0.081, 0.239)$$

    We are 99% confident that the difference in population proportions of males and females who agree that technology will lead to a better future lies between 0.081 and 0.239.
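As a check on Example 17, this Python snippet repeats the computation with the table value $Z_{\alpha/2} = 2.576$ and the sample proportions 0.67 and 0.51. It is illustrative only; the variable names are our own.

```python
import math

# Example 17: sample statistics from Example 15 and the 99% table value of Z_{alpha/2}.
p1_hat, n1 = 0.67, 500
p2_hat, n2 = 0.51, 500
z_half = 2.576

q1_hat, q2_hat = 1 - p1_hat, 1 - p2_hat
# Each term uses its own sample size: n1 for sample 1, n2 for sample 2.
margin = z_half * math.sqrt(p1_hat * q1_hat / n1 + p2_hat * q2_hat / n2)
estimate = p1_hat - p2_hat
lower, upper = estimate - margin, estimate + margin
print(f"E = {margin:.3f}, 99% CI = ({lower:.3f}, {upper:.3f})")
```

It prints `E = 0.079, 99% CI = (0.081, 0.239)`, matching the solution.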

NOW YOU CAN DO

Exercises 13–18.

3 Use Z Confidence Intervals to Perform Z Tests for $p_1 - p_2$

Given a $100(1 - \alpha)\%$ Z confidence interval for $p_1 - p_2$, we may perform two-tailed Z tests at level of significance $\alpha$ for various hypothesized values of $p_1 - p_2$. If a proposed value lies outside the interval, then the null hypothesis specifying this value is rejected. Otherwise, we do not reject the null hypothesis.
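The link between intervals and two-tailed tests can be checked numerically. The Python sketch below is a hypothetical illustration under one stated assumption: for a hypothesized difference $d_0$, it divides $\hat{p}_1 - \hat{p}_2 - d_0$ by the same unpooled standard error that the interval uses. Under that assumption, "$d_0$ lies inside the interval" and "p-value $> \alpha$" are the same decision. The pooled statistic of Section 1 applies only to $d_0 = 0$, and because its standard error differs slightly, it can occasionally disagree with the interval when the result is borderline. The function names are illustrative.

```python
import math
from statistics import NormalDist


def unpooled_se(x1, n1, x2, n2):
    """Standard error built from each sample's own proportion, as in the Z interval."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    return math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)


def interval_contains(x1, n1, x2, n2, alpha, d0):
    """True if d0 lies inside the 100(1 - alpha)% Z interval for p1 - p2."""
    estimate = x1 / n1 - x2 / n2
    margin = NormalDist().inv_cdf(1 - alpha / 2) * unpooled_se(x1, n1, x2, n2)
    return estimate - margin <= d0 <= estimate + margin


def two_tailed_p_value(x1, n1, x2, n2, d0):
    """Two-tailed p-value for H0: p1 - p2 = d0, using the unpooled SE (assumption)."""
    z = (x1 / n1 - x2 / n2 - d0) / unpooled_se(x1, n1, x2, n2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Compare the two decisions over a grid of hypothesized differences (Example 15 counts).
alpha = 0.01
grid = [d / 100 for d in range(-10, 41)]
agree = all(
    interval_contains(335, 500, 255, 500, alpha, d0)
    == (two_tailed_p_value(335, 500, 255, 500, d0) > alpha)
    for d0 in grid
)
print("interval and test agree on every grid value:", agree)
```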

EXAMPLE 18 Using a Z interval for $p_1 - p_2$ to perform Z tests about $p_1 - p_2$

This example asks whether $p_1 - p_2$ differs from (that is, is not equal to) certain values, so we can use the Z confidence interval to test the hypotheses. Example 17 provided a 99% Z confidence interval for $p_1 - p_2$, the difference in population proportions of males and females who agree that technology will lead to a better future, as (0.081, 0.239). Test, using level of significance $\alpha = 0.01$, whether $p_1 - p_2$ differs from each of these values: (a) 0.1, (b) 0.2, (c) 0.3.

Solution

  1. $H_0: p_1 - p_2 = 0.1$ versus $H_a: p_1 - p_2 \neq 0.1$.

    The hypothesized value 0.1 lies inside the interval (0.081, 0.239), so we do not reject $H_0$.

  2. $H_0: p_1 - p_2 = 0.2$ versus $H_a: p_1 - p_2 \neq 0.2$.

    The hypothesized value 0.2 lies inside the interval, so we do not reject $H_0$.

  3. $H_0: p_1 - p_2 = 0.3$ versus $H_a: p_1 - p_2 \neq 0.3$.

    The hypothesized value 0.3 lies outside the interval, so we reject $H_0$.
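Example 18's three decisions amount to an interval-membership check, sketched here in Python with the interval (0.081, 0.239) from Example 17. The `decision` function name is illustrative.

```python
# Example 18: 99% interval for p1 - p2 from Example 17; level of significance 0.01.
lower, upper = 0.081, 0.239


def decision(d0):
    """Two-tailed test of H0: p1 - p2 = d0 decided by the confidence interval."""
    return "do not reject H0" if lower <= d0 <= upper else "reject H0"


results = {d0: decision(d0) for d0 in (0.1, 0.2, 0.3)}
for d0, verdict in results.items():
    print(f"H0: p1 - p2 = {d0}: {verdict}")
```

Only 0.3 falls outside the interval, so it is the only hypothesized value rejected.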

NOW YOU CAN DO

Exercises 19–22.
