OBJECTIVES By the end of this section, I will be able to …
1 The -Value Method of Performing the Test for the Mean
In Section 9.2, we considered the critical-value method for performing the test, which works by comparing one -value () with another Z-value (). In this section, we introduce the -value method, which works by comparing one probability (the -value) to another probability (). The two methods are equivalent for the same level of significance , giving you the same conclusion.
508
The -value is a measure of how well (or how poorly) the data fit the null hypothesis.
-Value
The -value is the probability of observing a sample statistic (such as or ) at least as extreme as the statistic actually observed if we assume that the null hypothesis is true.
Roughly speaking, the -value represents the probability of observing the sample statistic if the null hypothesis is true. The term -value means “probability value,” so its value must always lie between 0 and 1.
A -value is a probability associated with and tells us whether or not is an extreme value. The method for calculating -values depends on the form of the hypothesis test (Table 5).
Remember that probability is represented by the area under the curve.
Type of hypothesis test | Right-tailed test | Left-tailed test | Two-tailed test |
---|---|---|---|
Hypotheses | |||
-Value is tail area associated with |
Area to right of |
Area to left of |
Sum of the two tail areas |
EXAMPLE 12 Finding the -value
For each of the following hypothesis tests, calculate and graph the -value.
To review how to calculate these probabilities, see Table 8 in Chapter 6 on page 355.
Solution
We have a right-tailed test, so that the -value equals the area in the right tail:
The table gives the probability for . Thus,
(Figure 6a)
We have a left-tailed test, so that the -value equals the area in the left tail:
(Figure 6b)
509
Here, we have a two-tailed test, so that the -value equals the sum of the areas in the two tails:
NOW YOU CAN DO
Exercises 7–20.
YOUR TURN #5
For each of the following hypothesis tests, calculate and graph the -value.
(The solutions are shown in Appendix A.)
The -Value applet allows you to experiment with various hypotheses, means, standard deviations, and sample sizes in order to see how changes in these values affect the -value.
A -value is based on the value of , so the -value tells us whether or not is an extreme value. Unusual and extreme values of , and therefore of , will have a small -value, whereas values of and nearer to the center of the distribution will have a large -value.
Assuming is true:
Unusual and extreme values of and | ↔ | Small -value (close to 0; see Figure 6c) |
Values of and near center | ↔ | Large -value (greater than, say, 0.15; see Figure 6a) |
A small -value indicates a conflict between your sample data and the null hypothesis, and will thus lead us to reject . However, how small is small? We learned in Section 9.1 that the probability of Type I error is chosen by the researcher to be small, usually 0.01, 0.05, or 0.10. Thus, a -value is small if it is ≤ . This leads us to the rejection rule that tells us when we may reject the null hypothesis.
Rejection Rule When Using -Value Method
The rejection rule for performing a hypothesis test using the -value method is:
Reject when the -value ≤ . Otherwise, do not reject .
This rejection rule can be applied to any type of hypothesis test we perform in Chapters 9–14 using the -value method.
The value of represents the boundary between results that are statistically significant (where we reject ) and results that are not statistically significant (where we do not reject ). Thus, is called the level of significance of the hypothesis test.
Here are the steps for performing the test for using the -value method.
510
Test for the Population Mean : -Value Method
When a random sample of size is taken from a population where the standard deviation σ is known, you can use the test if either (a) the population is normal, or (b) the sample size is large .
Step 1 State the hypotheses and the rejection rule.
Use one of the forms from Table 5 to write the hypotheses. State the meaning of . The rejection rule is “Reject if the -value ≤ .”
Step 2 Calculate .
where the sample mean and the sample size represent the sample data, and the population standard deviation σ represents the population data.
Step 3 Find the -value.
Either use technology to find the -value, or calculate it using the form in Table 5 that corresponds to your hypotheses.
Step 4 State the conclusion and interpretation.
If the -value ≤ , then reject . Otherwise do not reject . Interpret your conclusion so that a nonspecialist (someone who has not had a course in statistics) can understand, as follows:
EXAMPLE 13 The test for the mean using the -value method: One-tailed test
FlightStats.com compiles user ratings for airports worldwide. The mean rating for JFK International Airport in New York for July 2014 was 3.0 (out of 5). Assume that the population standard deviation of user ratings is known to be . A random sample taken this year of user ratings for JFK Airport showed a mean of . Using level of significance , test whether the population mean user rating for JFK Airport has fallen since 2014.
Solution
The sample size is large, and the population standard deviation is known. We may therefore perform the test for the mean.
Step 1 State the hypotheses and the rejection rule.
The key words here are “has fallen,” which means “is less than.” The answer to the question “Less than what?” gives us . Thus, our hypotheses are
where refers to the population mean user rating for JFK Airport. We will reject if the .
Step 2 Calculate .
We have , and . Thus, our test statistic is
511
Step 3 Find the -value.
Our hypotheses represent a left-tailed test from Table 5. Thus,
This is a Case 1 problem from Table 8 in Chapter 6 (page 355). The table (Appendix Table C) provides us with the area to the left of (Figure 7):
Thus, the -value is 0.0668.
Step 4 State the conclusion and interpretation.
Our level of significance is (from Step 1). The is not ≤ 0.05, therefore, we do not reject . There is insufficient evidence at the level of significance that the population mean user rating for JFK Airport is less than 3.0.
NOW YOU CAN DO
Exercises 21–26.
What If Scenario
What if the sample mean in Example 13 was not but was instead some unknown value smaller than ? All other statistics and parameters remain the same. Suppose we wanted to perform the same hypothesis test as in Example 13. How would this decrease in the value of affect the following, if at all?
Solution
512
EXAMPLE 14 The -value method using technology: Two-tailed test
brisbane
The birth weights, in grams (1000 grams = 1 kilogram ≈ 2.2 pounds), of a random sample of 44 babies from Brisbane, Australia, have a sample mean weight . Formerly, the mean birth weight of babies in Brisbane was 3200 grams. Assume that the population standard deviation grams. Is there evidence that the population mean birth weight of Brisbane babies now differs from 3200 grams? Use technology to perform the appropriate hypothesis test, with level of significance .
What Results Might We Expect?
Note from Figure 8 that the sample mean birth weight grams is close to the hypothesized mean birth weight of grams. This value of is not extreme and thus does not seem to offer strong evidence that the hypothesized mean birth weight is wrong. Therefore, we might expect to not reject the hypothesis that grams.
Solution
The sample size is large and is known, so we may proceed with the test for .
Step 1 State the hypotheses and the rejection rule.
The key words “differs from” mean that we have a two-tailed test:
where refers to the population mean birth weight of Brisbane babies. We will reject if the .
Step 2 Calculate .
We will use the instructions provided in the Step-by-Step Technology Guide at the end of this section (page 519). Figure 9 shows the TI-83/84 results from the test for :
513
Figure 10 shows the Minitab results, where
refers to our test statistic:
Different software rounds the results to different numbers of decimal places.
Figure 11 shows the JMP results, where
Step 3 Find the -value.
We have a two-tailed test from Step 1, so that from Table 5 our -value is (Figure 12)
514
Step 4 State the conclusion and interpretation.
Because 0.3396 is not ≤ 0.10, we do not reject . There is insufficient evidence that the population mean birth weight differs from 3200 grams. This conclusion is just as we expected.
NOW YOU CAN DO
Exercises 27–30.
2 Assessing the Strength of Evidence Against the Null Hypothesis
The hypothesis-testing methods we have shown so far deliver a simple “yes-or-no” conclusion: either “Reject ” or “Do not reject .” There is no indication of how strong the evidence is for rejecting the null hypothesis. Was the decision close? Was it a no-brainer? On the other hand, the -value itself represents the strength of evidence against the null hypothesis. There is extra information here, which we should not ignore.
For instance, we can directly compare the results of hypothesis tests. Suppose that we have two hypothesis tests that both result in not rejecting the null hypothesis, with level of significance . However, Test A has a -value of 0.06, whereas Test B has a -value of 0.57. Clearly, Test A came very close to rejecting the null hypothesis and shows a fair amount of evidence against the null hypothesis, whereas Test B shows no evidence at all against the null hypothesis. A simple statement of the “yes-or-no” conclusion misses the clear distinction between these two situations.
The -value provides us with the smallest level of significance at which the null hypothesis would be rejected, that is, the smallest value of at which the results would be considered significant.
Of course, we are free to determine whether the results are significant using whatever level we want. For example, Test A would have rejected for any value 0.06 or higher. Some data analysts in fact do not think in terms of rejecting or not rejecting the null hypothesis. Rather, they think completely in terms of assessing the strength of evidence against the null hypothesis.
For many (though not all) data domains, Table 6 provides a thumbnail impression of the strength of evidence against the null hypothesis for various -values. For certain domains (such as the physical sciences), however, alternative interpretations are appropriate.
-Value | Strength of evidence against |
---|---|
Extremely strong evidence | |
Very strong evidence | |
Solid evidence | |
Moderate evidence | |
Slight evidence | |
No evidence |
Note: Use Table 6 for all exercises that ask for an assessment of the strength of evidence against the null hypothesis.
EXAMPLE 15 Assessing the strength of evidence against
Assess the strength of evidence against shown by the -values in (a) Example 13 and (b) Example 14.
Solution
In Example 13, we tested versus , where refers to the population mean user rating for JFK International Airport. Our -value of 0.0668 implies that there is moderate evidence against the null hypothesis that the population mean user rating for JFK Airport equals 3.0.
515
NOW YOU CAN DO
Exercises 31–40.
YOUR TURN #6
Each of the following -values was calculated in Example 12. For each, assess the strength of evidence against the null hypothesis.
(The solutions are shown in Appendix A.)
Developing Your Statistical Sense
The Role of the Level of Significance
Suppose that in Example 13, our level of significance was 0.10 instead of 0.05. Would this have changed anything? Certainly. Our -value of 0.0668 is less than the new , so we would reject . Think about that for a moment. The data haven't changed at all, but our conclusion is reversed simply by changing . What is a data analyst to make of a situation like this? Two alternatives are available.
3 The Relationship Between the -Value Method and the Critical-Value Method
Figure 13 shows the relationships between the -value method and the critical-value method. The top half represents values of and the critical-value method that we studied in Section 9.2. The bottom half represents probabilities and the -value method that we studied in this section. The left half represents statistics associated with the observed sample data. The right half represents critical-value thresholds for significance to which these statistics are compared.
Because helps us to determine the -value, these two values are related. Similarly, because the level of significance helps to determine the value of , these two values are related. Moreover, just as we compare with the threshold , we compare the -value statistic with the threshold to determine significance. Thus, the two methods for conducting hypothesis tests are equivalent and, in fact, are quite thoroughly interwoven.
Figures 14a and 14b illustrate this equivalence for a right-tailed test. The rejection rule for the -value method is to reject when the . The rejection rule for the critical-value method is to reject when . Note in Figures 14a and 14b how the -value is determined by , and is determined by . In Figure 14a, when , it must also happen that the . In both cases we do not reject . However, in Figure 14b, when , it also follows that the . In both cases, we reject . Thus, the -value method and the critical-value method are equivalent.
516
4 Using Confidence Intervals for to Perform Two-Tailed Hypothesis Tests About
Consider a two-tailed hypothesis test for :
and recall the confidence interval for from Section 8.1:
Both inference methods are based on the statistic:
517
so it makes sense that the two-tailed hypothesis test and the confidence interval are equivalent.
Equivalence of a Two-Tailed Hypothesis Test and a Confidence Interval
Table 7 shows the confidence levels and associated levels of significance that will produce the equivalent inference.
Confidence level | Level of significance |
---|---|
90% | 0.10 |
95% | 0.05 |
99% | 0.01 |
We may thus use a single confidence interval to test as many values of as necessary.
EXAMPLE 16 Equivalence of two-tailed tests and confidence intervals
Recall Example 4 from Section 8.1 (page 432), where we were 90% confident using a interval that the population mean score on the 2014 SAT Math test lies between 471.2 and 548.8. Test, using level of significance , whether the population mean SAT Math test score differs from these values: (a) 470, (b) 510, (c) 550.
Solution
Once we have the 90% confidence interval, we may test as many possible values for as necessary, as long as we use level of significance (see Table 7).
We set up the three two-tailed hypothesis tests as follows:
518
To perform each hypothesis test, simply observe where each value of falls on the number line shown in Figure 16. For example, in the first hypothesis test, the hypothesized value lies outside the interval (471.2, 548.8). Thus, we reject . The three hypothesis tests are summarized here.
Value of | Form of hypothesis test, with |
Where lies in relation to 90% confidence interval |
Conclusion of hypothesis test |
---|---|---|---|
a. 470 | Outside | Reject | |
b. 510 | Inside | Do not reject | |
c. 550 | Outside | Reject |
NOW YOU CAN DO
Exercises 41–46.
YOUR TURN #7
For the Z interval from Example 16, test, using level of significance , whether the population mean SAT Math test score differs from these values: (a) 548, (b) 477, (c) 549.
(The solutions are shown in Appendix A.)
Increasingly, technology is being used to perform statistical analysis, including hypothesis tests. Therefore, it is important to know how to read and interpret the software output from a hypothesis test.
EXAMPLE 17 Interpreting software output
Each of (a) and (b) represent software output from a Z test for . For each, examine the indicated software output, and provide the following steps:
Let the level of significance be in each case.
Solution
Interpreting the TI-83/84 output.
Step 1 State the hypotheses and the rejection rule.
In the TI-83/84 output, the “” indicates the alternative hypothesis. In other words, the hypotheses are:
where represents the population mean length of laboratory mice (in cm). We will reject if the -value is less than the level of significance .
519
Step 2 Find .
The “” in the TI-83/84 output provides us the value of the test statistic, .
Step 3 Find the -value.
The “” in the TI-83/84 output represents the -value.
Step 4 State the conclusion and interpretation.
The -value from Step 3 is not less than the level of significance , so we do not reject . There is insufficient evidence that the population mean length of laboratory mice is greater than 10 cm.
Step 1 State the hypotheses and the rejection rule.
The first line in the Minitab output is “Test of ,” which indicates a two-tailed test, as follows:
where represents the population mean number of farmer's markets per county. We will reject if the -value is less than the level of significance .
Step 2 Find
Under “” in the Minitab output is found “2.91,” giving us .
Step 3 Find the -value.
Under “” in the Minitab output is “0.004,” representing our -value.
Step 4 State the conclusion and interpretation.
The -value (0.004) from Step 3 is less than the level of significance , so we reject .There is evidence that the population mean number of farmer's markets per county, nationwide, differs from 2.3.