
9.3 Z Test for the Population Mean: p-Value Method


OBJECTIVES By the end of this section, I will be able to …

  1. Perform the Z test for the mean, using the p-value method.
  2. Assess the strength of evidence against the null hypothesis.
  3. Describe the relationship between the p-value method and the critical-value method.
  4. Use the Z confidence interval for the mean to perform the two-tailed Z test for the mean.

1 The p-Value Method of Performing the Z Test for the Mean

In Section 9.2, we considered the critical-value method for performing the Z test, which works by comparing one Z-value (Zdata) with another Z-value (Zcrit). In this section, we introduce the p-value method, which works by comparing one probability (the p-value) to another probability (α). The two methods are equivalent for the same level of significance α, giving you the same conclusion.

Page 508

The p-value is a measure of how well (or how poorly) the data fit the null hypothesis.

p-Value

The p-value is the probability of observing a sample statistic (such as ˉx or Zdata) at least as extreme as the statistic actually observed if we assume that the null hypothesis is true.

Roughly speaking, the p-value represents the probability of observing the sample statistic if the null hypothesis is true. The term p-value means “probability value,” so its value must always lie between 0 and 1.

A p-value is a probability associated with Zdata and tells us whether or not Zdata is an extreme value. The method for calculating p-values depends on the form of the hypothesis test (Table 5).

  1. For a right-tailed test, the p-value is in the right (or upper) tail area.
  2. For a left-tailed test, the p-value is in the left (or lower) tail area.
  3. For a two-tailed test, the p-value lies in both tails.

Remember that probability is represented by the area under the curve.

Table 5 Finding the p-value depends on the form of the hypothesis test

Right-tailed test
  Hypotheses: H0: μ = μ0 versus Ha: μ > μ0
  p-Value (tail area associated with Zdata): p-value = P(Z > Zdata), the area to the right of Zdata

Left-tailed test
  Hypotheses: H0: μ = μ0 versus Ha: μ < μ0
  p-Value (tail area associated with Zdata): p-value = P(Z < Zdata), the area to the left of Zdata

Two-tailed test
  Hypotheses: H0: μ = μ0 versus Ha: μ ≠ μ0
  p-Value (tail area associated with Zdata): p-value = P(Z > |Zdata|) + P(Z < -|Zdata|) = 2·P(Z > |Zdata|), the sum of the two tail areas
image

EXAMPLE 12 Finding the p-value

For each of the following hypothesis tests, calculate and graph the p-value.

  1. H0: μ = 3.0 versus Ha: μ > 3.0, Zdata = 1
  2. H0: μ = 10 versus Ha: μ < 10, Zdata = -1.45
  3. H0: μ = 100 versus Ha: μ ≠ 100, Zdata = -2

To review how to calculate these probabilities, see Table 8 in Chapter 6 on page 355.

Solution

  1. We have a right-tailed test, so that the p-value equals the area in the right tail:

    p-value=P(Z>Zdata)=P(Z>1)

    The Z table gives the probability for P(Z<1). Thus,

    p-value=P(Z>1)=1-P(Z<1)=1-0.8413=0.1587 (Figure 6a)

  2. We have a left-tailed test, so that the p-value equals the area in the left tail:

    p-value=P(Z<Zdata)=P(Z<-1.45)=0.0735 (Figure 6b)

    Page 509
  3. Here, we have a two-tailed test, so that the p-value equals the sum of the areas in the two tails:

    p-value = P(Z > |Zdata|) + P(Z < -|Zdata|) = P(Z > |-2|) + P(Z < -|-2|) = P(Z > 2) + P(Z < -2) = 0.0228 + 0.0228 = 0.0456 (Figure 6c)

    image
    FIGURE 6a p-Value for a right-tailed test.
    image
    FIGURE 6b p-Value for a left-tailed test.
    image
    FIGURE 6c p-Value for a two-tailed test.
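
The three tail-area formulas from Table 5 can also be checked numerically. What follows is a minimal Python sketch, not part of the original text. It assumes Python 3.8 or later (for statistics.NormalDist), and the function name p_value and its tail labels are illustrative choices.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def p_value(z_data, tail):
    """Return the p-value for Zdata; tail is 'right', 'left', or 'two' (hypothetical labels)."""
    if tail == "right":
        return 1 - STANDARD_NORMAL.cdf(z_data)             # area to the right of Zdata
    if tail == "left":
        return STANDARD_NORMAL.cdf(z_data)                 # area to the left of Zdata
    if tail == "two":
        return 2 * (1 - STANDARD_NORMAL.cdf(abs(z_data)))  # sum of both tail areas
    raise ValueError("tail must be 'right', 'left', or 'two'")


# The three tests from Example 12
print(round(p_value(1, "right"), 4))
print(round(p_value(-1.45, "left"), 4))
print(round(p_value(-2, "two"), 4))
```

The first two results should agree with the Z table values 0.1587 and 0.0735. The two-tailed result comes out near 0.0455 rather than 0.0456, because the table rounds each tail area to 0.0228 before the two are added.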

NOW YOU CAN DO

Exercises 7–20.

YOUR TURN #5

For each of the following hypothesis tests, calculate and graph the p-value.

  1. H0: μ = 75 versus Ha: μ > 75, Zdata = 0.5
  2. H0: μ = 50 versus Ha: μ < 50, Zdata = -1.2
  3. H0: μ = 1 versus Ha: μ ≠ 1, Zdata = -0.1

(The solutions are shown in Appendix A.)

The p-Value applet allows you to experiment with various hypotheses, means, standard deviations, and sample sizes in order to see how changes in these values affect the p-value.

A p-value is based on the value of Zdata, so the p-value tells us whether or not Zdata is an extreme value. Unusual and extreme values of ˉx, and therefore of Zdata, will have a small p-value, whereas values of ˉx and Zdata nearer to the center of the distribution will have a large p-value.

Assuming H0 is true:

Unusual and extreme values of ˉx and Zdata Small p-value
(close to 0; see Figure 6c)
Values of ˉx and Zdata near center Large p-value
(greater than, say, 0.15;
see Figure 6a)

A small p-value indicates a conflict between your sample data and the null hypothesis, and will thus lead us to reject H0. However, how small is small? We learned in Section 9.1 that the probability of Type I error α is chosen by the researcher to be small, usually 0.01, 0.05, or 0.10. Thus, a p-value is small if it is ≤ α. This leads us to the rejection rule that tells us when we may reject the null hypothesis.

Rejection Rule When Using p-Value Method

The rejection rule for performing a hypothesis test using the p-value method is:

Reject H0 when the p-value ≤ α. Otherwise, do not reject H0.

This rejection rule can be applied to any type of hypothesis test we perform in Chapters 9–14 using the p-value method.

The value of α represents the boundary between results that are statistically significant (where we reject H0) and results that are not statistically significant (where we do not reject H0). Thus, α is called the level of significance of the hypothesis test.

Here are the steps for performing the Z test for μ using the p-value method.

Page 510

Z Test for the Population Mean μ: p-Value Method

When a random sample of size n is taken from a population where the standard deviation σ is known, you can use the Z test if either (a) the population is normal, or (b) the sample size is large (n ≥ 30).

  • Step 1 State the hypotheses and the rejection rule.

    Use one of the forms from Table 5 to write the hypotheses. State the meaning of μ. The rejection rule is “Reject H0 if the p-value ≤ α.”

  • Step 2 Calculate Zdata.

    Zdata = (ˉx - μ0)/(σ/√n)

    where the sample mean ˉx and the sample size n represent the sample data, and the population standard deviation σ represents the population data.

  • Step 3 Find the p-value.

    Either use technology to find the p-value, or calculate it using the form in Table 5 that corresponds to your hypotheses.

  • Step 4 State the conclusion and interpretation.

    If the p-value ≤ α, then reject H0. Otherwise do not reject H0. Interpret your conclusion so that a nonspecialist (someone who has not had a course in statistics) can understand, as follows:

  • Interpretation when you reject H0: There is evidence at level of significance α that [whatever Ha says].
  • Interpretation when you do not reject H0: There is insufficient evidence at level of significance α that [whatever Ha says].

EXAMPLE 13 The Z test for the mean using the p-value method: One-tailed test

FlightStats.com compiles user ratings for airports worldwide. The mean rating for JFK International Airport in New York for July 2014 was 3.0 (out of 5). Assume that the population standard deviation of user ratings is known to be σ=1. A random sample taken this year of n=36 user ratings for JFK Airport showed a mean of ˉx=2.75. Using level of significance α=0.05, test whether the population mean user rating for JFK Airport has fallen since 2014.

Solution

The sample size n=36 is large, and the population standard deviation σ is known. We may therefore perform the Z test for the mean.

  • Step 1 State the hypotheses and the rejection rule.

    The key words here are “has fallen,” which means “is less than.” The answer to the question “Less than what?” gives us μ0=3.0. Thus, our hypotheses are

    H0: μ = 3.0 versus Ha: μ < 3.0

    where μ refers to the population mean user rating for JFK Airport. We will reject H0 if the p-value ≤ α = 0.05.

  • Step 2 Calculate Zdata.

    We have ˉx = 2.75, μ0 = 3.0, n = 36, and σ = 1. Thus, our test statistic is

    Zdata = (ˉx - μ0)/(σ/√n) = (2.75 - 3.0)/(1/√36) = -1.5

    Page 511
  • Step 3 Find the p-value.

    Our hypotheses represent a left-tailed test from Table 5. Thus,

    p-value=P(Z<Zdata)=P(Z<-1.5)

    This is a Case 1 problem from Table 8 in Chapter 6 (page 355). The Z table (Appendix Table C) provides us with the area to the left of Z=-1.5 (Figure 7):

    P(Z<-1.5)=0.0668

    Thus, the p-value is 0.0668.

    image
    FIGURE 7 The p-value 0.0668 is not ≤ 0.05, so do not reject H0.
  • Step 4 State the conclusion and interpretation.

    Our level of significance is α=0.05 (from Step 1). The p-value=0.0668 is not ≤ 0.05, therefore, we do not reject H0. There is insufficient evidence at the level of significance α=0.05 that the population mean user rating for JFK Airport is less than 3.0.
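
Steps 2 through 4 of Example 13 can be reproduced in a few lines. This is a minimal Python sketch under the example's stated values, not part of the original text. The variable names are illustrative, and the standard normal distribution comes from Python's statistics.NormalDist (Python 3.8 or later).

```python
from math import sqrt
from statistics import NormalDist

# Example 13 inputs
x_bar, mu_0, sigma, n, alpha = 2.75, 3.0, 1.0, 36, 0.05

# Step 2: test statistic
z_data = (x_bar - mu_0) / (sigma / sqrt(n))

# Step 3: left-tailed p-value, P(Z < Zdata)
p_value = NormalDist().cdf(z_data)

# Step 4: apply the rejection rule "reject H0 if the p-value <= alpha"
reject_h0 = p_value <= alpha

print(f"Zdata = {z_data:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The sketch should report Zdata = -1.50 and a p-value of about 0.0668, which is not ≤ 0.05, so H0 is not rejected.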

NOW YOU CAN DO

Exercises 21–26.

What If Scenario

What if the sample mean in Example 13 was not ˉx=2.75 but was instead some unknown value smaller than ˉx=2.75? All other statistics and parameters remain the same. Suppose we wanted to perform the same hypothesis test as in Example 13. How would this decrease in the value of ˉx affect the following, if at all?

  1. Zdata
  2. p-value
  3. The conclusion

Solution

  1. In Example 13, ˉx=2.75 is smaller than μ0=3.0, which is why Zdata is negative. If we decrease ˉx to an even smaller value, this will move Zdata further into negative territory (leftward on the number line).
  2. For a left-tailed test, the p-value is the area to the left of Zdata. So, if Zdata is further to the left, there is less area to the left of it. Thus, the new p-value will be smaller.
  3. We know from part 2 that the p-value decreases, but not by how much, because we don't know how much smaller ˉx and Zdata are. If the p-value decreases only slightly, it will still be greater than α=0.05, and we will still not reject H0. However, if the p-value decreases enough to be ≤ α=0.05, then we will reject H0. Without further information, we cannot tell which conclusion applies.
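
To see both outcomes concretely, the sketch below recomputes the left-tailed p-value for a few sample means. It is only an illustration: 2.75 is the value from Example 13, while 2.74 and 2.70 are hypothetical smaller values chosen for this sketch, and the function name is an assumption.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n, alpha = 3.0, 1.0, 36, 0.05  # from Example 13


def left_tailed_p_value(x_bar):
    """p-value for H0: mu = mu_0 versus Ha: mu < mu_0."""
    z_data = (x_bar - mu_0) / (sigma / sqrt(n))
    return NormalDist().cdf(z_data)


# 2.75 is the original sample mean; 2.74 and 2.70 are hypothetical smaller values
for x_bar in (2.75, 2.74, 2.70):
    p = left_tailed_p_value(x_bar)
    decision = "reject H0" if p <= alpha else "do not reject H0"
    print(f"x_bar = {x_bar:.2f}: p-value = {p:.4f}, {decision}")
```

A slightly smaller sample mean (2.74) lowers the p-value but leaves the conclusion unchanged, whereas a larger drop (2.70) pushes the p-value below 0.05 and reverses the conclusion.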
Page 512

EXAMPLE 14 The p-value method using technology: Two-tailed test


The birth weights, in grams (1000 grams = 1 kilogram ≈ 2.2 pounds), of a random sample of 44 babies from Brisbane, Australia, have a sample mean weight ˉx=3276 grams. Formerly, the mean birth weight of babies in Brisbane was 3200 grams. Assume that the population standard deviation σ=528 grams. Is there evidence that the population mean birth weight of Brisbane babies now differs from 3200 grams? Use technology to perform the appropriate hypothesis test, with level of significance α=0.10.

What Results Might We Expect?

Note from Figure 8 that the sample mean birth weight ˉx=3276 grams is close to the hypothesized mean birth weight of μ0=3200 grams. This value of ˉx is not extreme and thus does not seem to offer strong evidence that the hypothesized mean birth weight is wrong. Therefore, we might expect to not reject the hypothesis that μ0=3200 grams.

image
FIGURE 8 Sample mean, ˉx=3276, is close to hypothesized mean, μ0=3200, so we expect to not reject the null hypothesis.

Solution

The sample size n=44 is large and σ=528 is known, so we may proceed with the Z test for μ.

  • Step 1 State the hypotheses and the rejection rule.

    The key words “differs from” mean that we have a two-tailed test:

    H0: μ = 3200 versus Ha: μ ≠ 3200

    where μ refers to the population mean birth weight of Brisbane babies. We will reject H0 if the p-value ≤ α = 0.10.

  • Step 2 Calculate Zdata.

    We will use the instructions provided in the Step-by-Step Technology Guide at the end of this section (page 519). Figure 9 shows the TI-83/84 results from the Z test for μ:

    FIGURE 9 TI-83/84 results.
    image
    Page 513

    Zdata = (ˉx - μ0)/(σ/√n) = (3276 - 3200)/(528/√44) = 0.9547859245 ≈ 0.9548

Figure 10 shows the Minitab results, where

  • “Test of μ=3200 versus ≠ 3200” refers to the hypotheses being tested, H0: μ = 3200 versus Ha: μ ≠ 3200.
  • “The assumed standard deviation = 528” refers to our assumption that σ=528.
  • SE Mean refers to the standard error of the mean, that is, σ/√n. You can see that 528/√44 ≈ 79.6.
  • 90% CI represents a 90% Z confidence interval for μ.
  • Z refers to our test statistic:

    Zdata = (ˉx - μ0)/(σ/√n) = (3276 - 3200)/(528/√44) = 0.9547859245 ≈ 0.95

  • P represents our p-value of 0.340.
    FIGURE 10 Minitab results.
    image

Different software rounds the results to different numbers of decimal places.

Figure 11 shows the JMP results, where

image
FIGURE 11 JMP results.
  • “Hypothesized Value” refers to the hypotheses being tested: H0: μ = 3200 versus Ha: μ ≠ 3200.
  • “Actual Estimate” refers to the sample mean, ˉx=3276.
  • “Sigma given” refers to our assumption that σ=528.
  • “Test statistic” refers to Zdata, our test statistic.
  • “Prob>|z|,” “Prob>z,” and “Prob<z” refer to the p-values of a two-sided, right-tailed, and left-tailed test, respectively. We want the two-sided p-value, which is Prob>|z|=0.3397.
  • Step 3 Find the p-value.

    We have a two-tailed test from Step 1, so that from Table 5 our p-value is (Figure 12)

    p-value = 2·P(Z > |Zdata|) = 2·P(Z > 0.9548) ≈ 2·(0.1698) = 0.3396

    FIGURE 12 p-Value is the sum of two tail areas: 0.1698+0.1698=0.3396.
    image
    Page 514
  • Step 4 State the conclusion and interpretation.

    Because 0.3396 is not ≤ 0.10, we do not reject H0. There is insufficient evidence that the population mean birth weight differs from 3200 grams. This conclusion is just as we expected.
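
The quantities reported by the software in Example 14 (SE Mean, Z, and the two-sided p-value) can be recomputed directly from the example's inputs. This is a minimal Python sketch, not the procedure used by the TI-83/84, Minitab, or JMP. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Example 14 inputs
x_bar, mu_0, sigma, n, alpha = 3276, 3200, 528, 44, 0.10

se_mean = sigma / sqrt(n)           # standard error of the mean
z_data = (x_bar - mu_0) / se_mean   # test statistic

# Two-tailed p-value: 2 * P(Z > |Zdata|)
p_value = 2 * (1 - NormalDist().cdf(abs(z_data)))

print(f"SE Mean = {se_mean:.1f}")
print(f"Zdata   = {z_data:.4f}")
print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value <= alpha else "do not reject H0")
```

The p-value should land close to the 0.3397 reported by JMP. The hand calculation gives 0.3396 because the tail area is rounded to 0.1698 before it is doubled.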

NOW YOU CAN DO

Exercises 27–30.

2 Assessing the Strength of Evidence Against the Null Hypothesis

The hypothesis-testing methods we have shown so far deliver a simple “yes-or-no” conclusion: either “Reject H0” or “Do not reject H0.” There is no indication of how strong the evidence is for rejecting the null hypothesis. Was the decision close? Was it a no-brainer? On the other hand, the p-value itself represents the strength of evidence against the null hypothesis. There is extra information here, which we should not ignore.

For instance, we can directly compare the results of hypothesis tests. Suppose that we have two hypothesis tests that both result in not rejecting the null hypothesis, with level of significance α=0.05. However, Test A has a p-value of 0.06, whereas Test B has a p-value of 0.57. Clearly, Test A came very close to rejecting the null hypothesis and shows a fair amount of evidence against the null hypothesis, whereas Test B shows no evidence at all against the null hypothesis. A simple statement of the “yes-or-no” conclusion misses the clear distinction between these two situations.

The p-value provides us with the smallest level of significance at which the null hypothesis would be rejected, that is, the smallest value of α at which the results would be considered significant.

Of course, we are free to determine whether the results are significant using whatever α level we want. For example, Test A would have rejected H0 for any α value 0.06 or higher. Some data analysts in fact do not think in terms of rejecting or not rejecting the null hypothesis. Rather, they think completely in terms of assessing the strength of evidence against the null hypothesis.

For many (though not all) data domains, Table 6 provides a thumbnail impression of the strength of evidence against the null hypothesis for various p-values. For certain domains (such as the physical sciences), however, alternative interpretations are appropriate.

Table 6 Strength of evidence against the null hypothesis for various levels of p-value
p-Value | Strength of evidence against H0
p-value ≤ 0.001 | Extremely strong evidence
0.001 < p-value ≤ 0.01 | Very strong evidence
0.01 < p-value ≤ 0.05 | Solid evidence
0.05 < p-value ≤ 0.10 | Moderate evidence
0.10 < p-value ≤ 0.15 | Slight evidence
0.15 < p-value | No evidence

Note: Use Table 6 for all exercises that ask for an assessment of the strength of evidence against the null hypothesis.
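
The cutoffs in Table 6 translate directly into a lookup. The function below is a small Python sketch; the function name is an illustrative choice, and the labels are copied from the table.

```python
def strength_of_evidence(p_value):
    """Map a p-value to the Table 6 description of evidence against H0."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value <= 0.001:
        return "Extremely strong evidence"
    if p_value <= 0.01:
        return "Very strong evidence"
    if p_value <= 0.05:
        return "Solid evidence"
    if p_value <= 0.10:
        return "Moderate evidence"
    if p_value <= 0.15:
        return "Slight evidence"
    return "No evidence"


# p-values from Examples 13 and 14
print(strength_of_evidence(0.0668))  # Moderate evidence
print(strength_of_evidence(0.3397))  # No evidence
```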

EXAMPLE 15 Assessing the strength of evidence against H0

Assess the strength of evidence against H0 shown by the p-values in (a) Example 13 and (b) Example 14.

Solution

  1. In Example 13, we tested H0:μ=3.0 versus Ha:μ<3.0, where μ refers to the population mean user rating for JFK International Airport. Our p-value of 0.0668 implies that there is moderate evidence against the null hypothesis that the population mean user rating for JFK Airport equals 3.0.

    Page 515
  2. In Example 14, we tested H0: μ = 3200 versus Ha: μ ≠ 3200, where μ refers to the population mean birth weight of Brisbane babies (in grams). Our p-value of 0.3397 implies that there is no evidence against the null hypothesis that the population mean birth weight of Brisbane babies equals 3200 grams.

NOW YOU CAN DO

Exercises 31–40.

YOUR TURN #6

Each of the following p-values was calculated in Example 12. For each, assess the strength of evidence against the null hypothesis.

  1. H0: μ = 3.0 versus Ha: μ > 3.0, p-value = 0.1587
  2. H0: μ = 10 versus Ha: μ < 10, p-value = 0.0735
  3. H0: μ = 100 versus Ha: μ ≠ 100, p-value = 0.0456

(The solutions are shown in Appendix A.)

Developing Your Statistical Sense

The Role of the Level of Significance α

Suppose that in Example 13, our level of significance α was 0.10 instead of 0.05. Would this have changed anything? Certainly. Our p-value of 0.0668 is less than the new α=0.10, so we would reject H0. Think about that for a moment. The data haven't changed at all, but our conclusion is reversed simply by changing α. What is a data analyst to make of a situation like this? Two alternatives are available.

  1. We don't want the choice of α to dictate our conclusion, so perhaps we should turn to a direct assessment of the strength of evidence against the null hypothesis, as provided in Table 6. In this case, the p-value of about 0.0668 would offer moderate evidence against the null hypothesis, regardless of the value of α.
  2. Obtain more data, perhaps through a call for further research.
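
The reversal described above is easy to tabulate. The short Python sketch below compares the Example 13 p-value against the three commonly used levels of significance. The dictionary name is an illustrative choice.

```python
p_value = 0.0668  # from Example 13

# True means "reject H0" at that level of significance
decisions = {alpha: p_value <= alpha for alpha in (0.01, 0.05, 0.10)}

for alpha, reject in decisions.items():
    print(f"alpha = {alpha:.2f}: {'reject H0' if reject else 'do not reject H0'}")
```

Only α = 0.10 leads to rejection, which is consistent with the p-value being the smallest level of significance at which H0 would be rejected.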

3 The Relationship Between the p-Value Method and the Critical-Value Method

Figure 13 shows the relationships between the p-value method and the critical-value method. The top half represents values of Z and the critical-value method that we studied in Section 9.2. The bottom half represents probabilities and the p-value method that we studied in this section. The left half represents statistics associated with the observed sample data. The right half represents critical-value thresholds for significance to which these statistics are compared.

Because Zdata helps us to determine the p-value, these two values are related. Similarly, because the level of significance α helps to determine the value of Zcrit, these two values are related. Moreover, just as we compare Zdata with the threshold Zcrit, we compare the p-value statistic with the α threshold to determine significance. Thus, the two methods for conducting hypothesis tests are equivalent and, in fact, are quite thoroughly interwoven.

Figures 14a and 14b illustrate this equivalence for a right-tailed test. The rejection rule for the p-value method is to reject H0 when the p-value ≤ α. The rejection rule for the critical-value method is to reject H0 when Zdata ≥ Zcrit. Note in Figures 14a and 14b how the p-value is determined by Zdata, and Zcrit is determined by α. In Figure 14a, when Zdata < Zcrit, it must also happen that the p-value > α. Under both methods, we do not reject H0. However, in Figure 14b, when Zdata ≥ Zcrit, it also follows that the p-value is ≤ α. Under both methods, we reject H0. Thus, the p-value method and the critical-value method are equivalent.

Page 516
image
FIGURE 13 Critical-value method and p-value method are equivalent.
image
FIGURE 14a For a right-tailed test, Zdata<Zcrit only when the p-value>α.
image
FIGURE 14b For a right-tailed test, Zdata ≥ Zcrit only when the p-value ≤ α.
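
The equivalence of the two rejection rules for a right-tailed test can be spot-checked numerically. The Python sketch below applies both rules over a hypothetical grid of Zdata values and records any value where they disagree. The grid and the variable names are assumptions made for this illustration, and α = 0.05 is chosen arbitrarily.

```python
from statistics import NormalDist

Z = NormalDist()
alpha = 0.05
z_crit = Z.inv_cdf(1 - alpha)  # right-tailed critical value, about 1.645

# Hypothetical grid of Zdata values from -3.0 to 3.0 in steps of 0.1
grid = [i / 10 for i in range(-30, 31)]

disagreements = []
for z_data in grid:
    p_value = 1 - Z.cdf(z_data)                  # right-tailed p-value
    reject_by_critical_value = z_data >= z_crit  # critical-value method
    reject_by_p_value = p_value <= alpha         # p-value method
    if reject_by_critical_value != reject_by_p_value:
        disagreements.append(z_data)

print(f"Zcrit = {z_crit:.3f}, disagreements: {disagreements}")
```

An empty list of disagreements is what the equivalence predicts: every Zdata at or beyond Zcrit has a p-value at or below α, and every Zdata short of Zcrit has a p-value above α.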

4 Using Confidence Intervals for μ to Perform Two-Tailed Hypothesis Tests About μ

Consider a two-tailed hypothesis test for μ:

H0: μ = μ0 versus Ha: μ ≠ μ0

and recall the 100(1-α)% Z confidence interval for μ from Section 8.1:

ˉx ± Zα/2 · (σ/√n)

Both inference methods are based on the Z statistic:

Z = (ˉx - μ)/(σ/√n)

Page 517

so it makes sense that the two-tailed hypothesis test and the confidence interval are equivalent.

Equivalence of a Two-Tailed Hypothesis Test and a Confidence Interval

  • If a certain value for μ0 lies outside the corresponding 100(1-α)% Z confidence interval for μ, then the null hypothesis specifying this value for μ0 would be rejected for level of significance α (see Figure 15).
  • Alternatively, if a certain value for μ0 lies inside the 100(1-α)% Z confidence interval for μ, then the null hypothesis specifying this value for μ0 would not be rejected for level of significance α.
image
FIGURE 15 Reject H0 for values of μ0 that lie outside confidence interval (a, b).
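
As an illustration of this equivalence, the Python sketch below reuses the Brisbane birth-weight inputs from Example 14. It builds the 90% Z confidence interval and, separately, the two-tailed p-value, then checks that the two routes give the same decision at α = 0.10. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()
x_bar, sigma, n, alpha = 3276, 528, 44, 0.10  # Brisbane data from Example 14
mu_0 = 3200                                   # hypothesized mean

# 100(1 - alpha)% Z confidence interval for mu
margin = Z.inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

# Two-tailed p-value for H0: mu = mu_0
z_data = (x_bar - mu_0) / (sigma / sqrt(n))
p_value = 2 * (1 - Z.cdf(abs(z_data)))

inside_interval = lower <= mu_0 <= upper
reject_h0 = p_value <= alpha

print(f"90% CI: ({lower:.1f}, {upper:.1f})")
print(f"mu_0 inside interval: {inside_interval}, reject H0: {reject_h0}")
```

Here μ0 = 3200 falls inside the interval and the test does not reject H0, so the two approaches agree.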

Table 7 shows the confidence levels and associated α levels of significance that will produce the equivalent inference.

Table 7 Confidence levels for equivalent α levels of significance
Confidence level Level of significance α
90% 0.10
95% 0.05
99% 0.01

We may thus use a single confidence interval to test as many values of μ0 as necessary.

EXAMPLE 16 Equivalence of two-tailed tests and confidence intervals

image

Recall Example 4 from Section 8.1 (page 432), where we were 90% confident using a Z interval that the population mean score on the 2014 SAT Math test lies between 471.2 and 548.8. Test, using level of significance α=0.10, whether the population mean SAT Math test score differs from these values: (a) 470, (b) 510, (c) 550.

Solution

Once we have the 90% confidence interval, we may test as many possible values for μ0 as necessary, as long as we use level of significance α=0.10 (see Table 7).

  • If any values of μ0 lie inside the confidence interval, that is, between 471.2 and 548.8, we will not reject H0 for this value of μ0.
  • If any values of μ0 lie outside the confidence interval, that is, either to the left of 471.2 or to the right of 548.8, we will reject H0, as shown in Figure 16.
    FIGURE 16 Reject H0 for values of μ0 that lie outside (471.2, 548.8).
    image

We set up the three two-tailed hypothesis tests as follows:

  1. H0: μ = 470 versus Ha: μ ≠ 470
  2. H0: μ = 510 versus Ha: μ ≠ 510
  3. H0: μ = 550 versus Ha: μ ≠ 550
Page 518

To perform each hypothesis test, simply observe where each value of μ0 falls on the number line shown in Figure 16. For example, in the first hypothesis test, the hypothesized value μ0=470 lies outside the interval (471.2, 548.8). Thus, we reject H0. The three hypothesis tests are summarized here.

Value of μ0 | Form of hypothesis test, with α=0.10 | Where μ0 lies in relation to 90% confidence interval | Conclusion of hypothesis test
a. 470 | H0: μ = 470 vs. Ha: μ ≠ 470 | Outside | Reject H0
b. 510 | H0: μ = 510 vs. Ha: μ ≠ 510 | Inside | Do not reject H0
c. 550 | H0: μ = 550 vs. Ha: μ ≠ 550 | Outside | Reject H0
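
Because a single interval serves for every candidate value of μ0, the three tests reduce to a simple membership check. The Python sketch below uses the interval from Example 16. The function name is illustrative, and treating the endpoints themselves as "inside" is an assumption of this sketch.

```python
# 90% Z confidence interval from Example 16
lower, upper = 471.2, 548.8


def two_tailed_decision(mu_0):
    """Decision at alpha = 0.10, the level matching 90% confidence in Table 7."""
    if lower <= mu_0 <= upper:
        return "Do not reject H0"
    return "Reject H0"


for mu_0 in (470, 510, 550):
    print(mu_0, two_tailed_decision(mu_0))
```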

NOW YOU CAN DO

Exercises 41–46.

YOUR TURN #7

For the Z interval from Example 16, test, using level of significance α=0.10, whether the population mean SAT Math test score differs from these values: (a) 548, (b) 477, (c) 549.

(The solutions are shown in Appendix A.)

Increasingly, technology is being used to perform statistical analysis, including hypothesis tests. Therefore, it is important to know how to read and interpret the software output from a hypothesis test.

EXAMPLE 17 Interpreting software output

Parts (a) and (b) each show software output from a Z test for μ. For each, examine the indicated software output and work through the following steps:

  • Step 1 State the hypotheses and the rejection rule.
  • Step 2 Calculate Zdata.
  • Step 3 Find the p-value.
  • Step 4 State the conclusion and interpretation.

Let the level of significance be α=0.05 in each case.

  1. TI-83/84 output for a Z test for μ, where μ represents the population mean length of laboratory mice (in cm)
  2. Minitab output for a Z test for μ, where μ represents the population mean number of farmer's markets per county, nationwide.
    image
    TI-83/84 output for part (a).
    image
    Minitab output for part (b).

Solution

  1. Interpreting the TI-83/84 output.

    • Step 1 State the hypotheses and the rejection rule.

      In the TI-83/84 output, the “μ>10” indicates the alternative hypothesis. In other words, the hypotheses are:

      H0:μ=10versusHa:μ>10

      where μ represents the population mean length of laboratory mice (in cm). We will reject H0 if the p-value is less than the level of significance α=0.05.

      Page 519
    • Step 2 Find Zdata.

      The “z=1” in the TI-83/84 output provides us the value of the test statistic, Zdata=1.

    • Step 3 Find the p-value.

      The “p=.1586552596” in the TI-83/84 output represents the p-value.

    • Step 4 State the conclusion and interpretation.

      The p-value from Step 3 is not less than the level of significance α=0.05, so we do not reject H0. There is insufficient evidence that the population mean length of laboratory mice is greater than 10 cm.

  2. Interpreting the Minitab output.
    • Step 1 State the hypotheses and the rejection rule.

      The first line in the Minitab output is “Test of μ = 2.3 vs ≠ 2.3,” which indicates a two-tailed test, as follows:

      H0: μ = 2.3 versus Ha: μ ≠ 2.3

      where μ represents the population mean number of farmer's markets per county. We will reject H0 if the p-value is less than the level of significance α=0.05.

    • Step 2 Find Zdata

      Under “Z” in the Minitab output is found “2.91,” giving us Zdata=2.91.

    • Step 3 Find the p-value.

      Under “p” in the Minitab output is “0.004,” representing our p-value.

    • Step 4 State the conclusion and interpretation.

      The p-value (0.004) from Step 3 is less than the level of significance α=0.05, so we reject H0. There is evidence that the population mean number of farmer's markets per county, nationwide, differs from 2.3.
