We can decide in advance how much evidence against H0 we will insist on. The way to do this is to say, before any data are collected, how small a P-value we require. The decisive value of P is called the significance level. It is usual to write it as α, the Greek letter alpha. If we choose α = 0.05, we are requiring that the data give evidence against H0 so strong that it would happen no more than 5% of the time (one time in 20) when H0 is true. If we choose α = 0.01, we are insisting on stronger evidence against H0, evidence so strong that it would appear only 1% of the time (one time in 100) if H0 is, in fact, true.
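The claim that significant evidence appears only about 5% (or 1%) of the time when H0 is true can be checked by simulation. The sketch below is illustrative only. It assumes a two-sided z test of a mean with known standard deviation, and the names `two_sided_p_value` and `rejection_rate`, the sample size, and the number of trials are hypothetical choices, not taken from the text.

```python
import random
from statistics import NormalDist, mean


def two_sided_p_value(sample, mu0=0.0, sigma=1.0):
    """P-value for a two-sided z test of H0: mean = mu0, with sigma known (assumed setup)."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(alpha, n_trials=10_000, n=20, seed=1):
    """Fraction of samples, all drawn with H0 true, whose P-value is <= alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # H0 is true here: every observation really comes from a Normal(0, 1) population.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_p_value(sample) <= alpha:
            hits += 1
    return hits / n_trials


rate_05 = rejection_rate(0.05)
rate_01 = rejection_rate(0.01)
print(f"alpha = 0.05: significant in {rate_05:.1%} of samples")
print(f"alpha = 0.01: significant in {rate_01:.1%} of samples")
```

With H0 true in every trial, the two printed rates should land near 5% and 1%, which is what choosing α = 0.05 or α = 0.01 promises.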
Statistical significance
If the P-value is as small as or smaller than α, we say that the data are statistically significant at level α.
“Significant” in the statistical sense does not mean “important.” It means simply “not likely to happen just by chance.” We used these words in Chapter 5 (page 103). Now we have attached a number to statistical significance to say what “not likely” means. You will often see significance at level 0.01 expressed by the statement, “The results were significant (P < 0.01).” Here, P stands for the P-value.
One traditional level of significance is 0.05. This choice appears to trace back to the British statistician and geneticist Sir Ronald A. Fisher. Fisher once wrote that it was convenient to consider sample statistics that are two or more standard deviations away from the mean as being significant; for a Normal distribution, values that far from the mean occur only about 5% of the time. Of course, we don’t have to use traditional levels of significance such as 5% and 1%. The P-value is more informative because it allows us to assess significance at any level we choose. For example, a result with P = 0.03 is significant at the α = 0.05 level but not significant at the α = 0.01 level. Nonetheless, the traditional significance levels are widely accepted guidelines for “how much evidence is enough.” We might say that P < 0.10 indicates “some evidence” against the null hypothesis, P < 0.05 is “moderate evidence,” and P < 0.01 is “strong evidence.” Don’t take these guidelines too literally, however. We will say more about interpreting tests in Chapter 23.
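The rule “significant at level α when the P-value is no larger than α” is simple enough to write out directly. The following is a minimal sketch; the function name `is_significant` is a hypothetical label chosen for illustration, and the example reuses the P = 0.03 result discussed above.

```python
def is_significant(p_value, alpha):
    """Return True when the P-value is as small as or smaller than alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a P-value must lie between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("a significance level must lie strictly between 0 and 1")
    return p_value <= alpha


p_value = 0.03
for alpha in (0.10, 0.05, 0.01):
    verdict = "significant" if is_significant(p_value, alpha) else "not significant"
    print(f"P = {p_value} is {verdict} at level alpha = {alpha}")
```

The same P-value gives different verdicts at different levels, which is why reporting the P-value itself tells the reader more than reporting only “significant” or “not significant.”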