Chapter 102. Statistical Significance

Key Terms

central tendency
the typical, or representative, score in a distribution, often referred to as the "average"
confidence interval
a range of scores calculated such that there is a specific probability (usually .95) that the value of interest (such as the estimated mean of a population) actually falls within that range
control group
participants in an experiment who do not receive the *treatment*
correlation
a way of measuring the relationship between two variables
correlation coefficient (*r*)
a statistic that indicates the precise numerical relationship between two variables; *r* can range from -1.0 to +1.0
descriptive statistics
numbers that are calculated from a distribution of scores, indicating the central tendency (average) and the variability (amount of scatter around the average)
distribution
arrangement of scores from a variable, showing their observed frequency of occurrence
effect size
a calculated number that indicates the size of a difference between two values; not affected by sample size
error bars
lines that indicate the amount of variability or uncertainty around a point on a graph of research results
experiment
a method of research that manipulates an independent variable to measure its effect on a dependent variable
experimental group
participants in an experiment who receive the *treatment* level of the independent variable
hypothesis
a testable prediction, typically derived from a theory
inferential statistics
numbers that are calculated from a distribution of scores to provide evidence supporting or opposing a hypothesis
mean
a measure of central tendency calculated by adding all scores and then dividing by the number of scores
null hypothesis
a statistical assumption about the absence of an effect (typically, no difference between two values)
null hypothesis significance testing (NHST)
an approach to evaluating research results that compares the observed outcome to what would be expected if the null hypothesis is true
*p*-level
the probability of finding a difference that is equal to or greater than what was actually measured, assuming that the null hypothesis is true; also called probability level
population
a group of people (or animals) whose behavior is of interest to researchers; from this group, one or more samples are selected for measurement
random assignment
in an experiment, assigning participants to experimental and control conditions by chance, to minimize preexisting differences between the different groups
sample
a set of measurements from a group of people (or animals) selected from a larger population of interest
standard deviation
a measure of variability, indicating how tightly the scores are clustered around the mean
statistic
a calculated number that summarizes important information about a distribution of scores
statistical significance
whether a research result differs sufficiently from what would be expected from chance alone, due to random variations in behavior
variable
anything that can vary, or take different values
Statistical Significance
asset/activities/stat_significance/images/n08un01.svg
Learning Objectives:

Describe the role of statistical significance in psychological research.

Compare null hypothesis significance testing with the use of effect size and confidence intervals.

Review

Select the NEXT button to continue with the Review.

asset/activities/stat_significance/images/08UN02.svg

1. After researchers have collected data, they typically calculate descriptive statistics, such as the mean and standard deviation. These statistics summarize the central tendency and variability of each distribution of scores. Next, they calculate inferential statistics to determine whether the results support their hypotheses about behavior. For example, if researchers are comparing an experimental group with a control group, the descriptive statistics indicate whether, and by how much, performance differed between the groups, and the inferential statistics indicate how much confidence the researchers should have in that difference.
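As a minimal sketch of the descriptive step, the following Python snippet computes the mean and standard deviation for two groups. The scores are invented for illustration and do not come from any study in this activity; `describe` is a hypothetical helper name.

```python
import statistics

# Hypothetical scores, invented for illustration only.
control = [12, 15, 11, 14, 13, 16, 12, 15]
experimental = [15, 18, 14, 17, 16, 19, 15, 18]


def describe(scores):
    """Return (mean, sample standard deviation) for a list of scores."""
    return statistics.mean(scores), statistics.stdev(scores)


control_mean, control_sd = describe(control)
experimental_mean, experimental_sd = describe(experimental)

# Central tendency and variability for each distribution of scores.
print(f"control:      mean={control_mean:.2f}, sd={control_sd:.2f}")
print(f"experimental: mean={experimental_mean:.2f}, sd={experimental_sd:.2f}")
```

These numbers only describe the two samples. Whether the difference between the means is reliable is a separate, inferential question.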

asset/activities/stat_significance/images/08UN03a.svg
asset/activities/stat_significance/images/08UN03b.svg

2. The questions that inferential statistics can answer about a hypothesis usually involve a comparison. Did one group of participants perform differently than another group? On average, did participants perform differently in one condition of an experiment than in other conditions? Does the correlation coefficient for two variables differ from zero? (A correlation of zero indicates no relationship.)
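To make the correlation question concrete, here is a small sketch that computes the correlation coefficient r from paired scores using the standard Pearson formula. The variable names and data are hypothetical, chosen only to illustrate the calculation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the summed squared deviations for each variable.
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (spread_x * spread_y)


# Hypothetical paired measurements, invented for illustration only.
hours_studied = [1, 2, 3, 4, 5, 6]
exam_score = [52, 60, 57, 70, 68, 75]

r = pearson_r(hours_studied, exam_score)
print(f"r = {r:.2f}")
```

A value of r near zero would indicate little or no linear relationship; an inferential test would then ask whether the observed r differs reliably from zero.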

asset/activities/stat_significance/images/n08un04.svg

3. If researchers find a difference among groups of participants, they want to know whether or not it is a “real” difference—a reliable, repeatable result—rather than a fluke due to random errors in measurement, random fluctuations in people’s behavior, or failure of random assignment to control for preexisting differences between the groups. Roughly speaking, we consider a result to be statistically significant if there is a low probability that the result was due to chance factors.

asset/activities/stat_significance/images/n-08un05.svg

4. One standard method of determining statistical significance is to pose a null hypothesis (typically claiming that there is no difference) and then test the likelihood that the observed results would have occurred if the null hypothesis were true. This method is called null hypothesis significance testing (NHST). For example, if researchers are comparing an experimental group to a control group, the null hypothesis is that the two groups do not differ. Assuming that the null hypothesis is true, the likelihood of obtaining a difference equal to or greater than the observed difference is called the probability level, or p-level. Typically, if the p-level is less than .05 (5 percent), the result is considered statistically significant.
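One way to see what a p-level means is a permutation test, sketched below. This is an illustrative technique chosen for this example, not necessarily the test used in the studies described here. If the null hypothesis is true, group labels are arbitrary, so the scores can be reshuffled many times to count how often a difference at least as large as the observed one appears. The data, the function name `permutation_p_level`, and its parameters are all hypothetical.

```python
import random


def permutation_p_level(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate the p-level for a difference in means by reshuffling labels.

    Returns the proportion of shuffles whose absolute mean difference is
    equal to or greater than the observed absolute mean difference.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_a, shuffled_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(shuffled_a) / n_a - sum(shuffled_b) / len(shuffled_b))
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_permutations


# Hypothetical scores, invented for illustration only.
control = [12, 15, 11, 14, 13, 16, 12, 15]
experimental = [15, 18, 14, 17, 16, 19, 15, 18]

p_level = permutation_p_level(experimental, control)
print(f"p-level = {p_level:.4f}")
print("statistically significant" if p_level < 0.05 else "not significant")
```

Note that the calculation starts by assuming the null hypothesis is true; the p-level describes the data under that assumption, not the probability of the hypothesis itself.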

asset/activities/stat_significance/images/n-08un06.svg

5. There is a long-standing controversy within psychology as to whether the NHST approach is an appropriate way to evaluate research results. The p-level takes into account the size of the sample; the larger the sample, the easier it is to obtain a statistically significant result (with p < .05), even with a very small difference between groups. Also, the p-level is frequently misinterpreted as indicating the probability that the null hypothesis is true. That is actually backwards. The p-level starts from the assumption that the null hypothesis is true and indicates the probability of obtaining a difference equal to or greater than the difference actually observed in the results.
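The effect of sample size can be illustrated with a rough calculation. The sketch below assumes a two-group comparison with equal group sizes and a known, shared standard deviation, and uses a normal (z-test) approximation; these are simplifying assumptions made for this example, and the numbers are invented.

```python
import math


def two_sided_p_level(mean_difference, sd, n_per_group):
    """Approximate two-sided p-level for a difference between two group means.

    Assumes equal group sizes and a known, shared standard deviation,
    so a z-test (normal approximation) applies.
    """
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = abs(mean_difference) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(z / math.sqrt(2))


# Same tiny difference (1 point, with sd = 10) at two sample sizes.
p_small_sample = two_sided_p_level(mean_difference=1, sd=10, n_per_group=20)
p_large_sample = two_sided_p_level(mean_difference=1, sd=10, n_per_group=2000)

print(f"n = 20 per group:   p = {p_small_sample:.3f}")
print(f"n = 2000 per group: p = {p_large_sample:.4f}")
```

The difference between the groups is identical in both cases; only the sample size changes, yet the larger sample crosses the .05 threshold and the smaller one does not.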

asset/activities/stat_significance/images/08UN07.svg

6. In addition to calculating the p-level, researchers usually calculate the effect size of a difference, which indicates the magnitude of the difference in a way that is not influenced by the size of the sample. Instead of using the NHST approach, some researchers prefer to calculate the confidence interval (margin of error) for each statistic, and then use those numbers to evaluate the reliability and importance of the finding. Researchers typically calculate the confidence interval for a mean in a way that provides a 95 percent probability that the actual mean of the population falls within the interval.
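As a hedged sketch of these two calculations, the snippet below computes Cohen's d (one common effect-size measure, chosen here as an example; the activity does not name a specific measure) and an approximate 95 percent confidence interval for a mean. The interval uses a normal critical value of about 1.96; for small samples like these, a t-based critical value would give a somewhat wider interval. Data and function names are hypothetical.

```python
import math
import statistics

# Hypothetical scores, invented for illustration only.
control = [12, 15, 11, 14, 13, 16, 12, 15]
experimental = [15, 18, 14, 17, 16, 19, 15, 18]


def cohens_d(group_a, group_b):
    """Difference between means in units of the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def confidence_interval(scores, level=0.95):
    """Approximate confidence interval for a mean (normal approximation)."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for .95
    mean = statistics.mean(scores)
    margin = z * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean - margin, mean + margin


d = cohens_d(experimental, control)
low, high = confidence_interval(experimental)
print(f"effect size d = {d:.2f}")
print(f"95% CI for experimental mean: {low:.2f} to {high:.2f}")
```

Because d divides by the standard deviation rather than the standard error, collecting more participants does not inflate it the way it shrinks the p-level.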

asset/activities/stat_significance/images/08UN08.svg

7. On a graph of the research results, the mean scores often have error bars indicating the 95 percent confidence intervals. This allows the viewer to estimate the likelihood that the observed difference is a meaningful difference. In the left graph, the confidence intervals for females and males overlap, suggesting that the difference in mean scores is not reliable. In the right graph, the non-overlapping confidence intervals suggest that the measured age gap in performance could be a real difference.
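The visual check described above can be expressed as a simple comparison of interval endpoints. In this sketch the intervals are made-up values standing in for the error bars; they are not read from the graphs, and `intervals_overlap` is a hypothetical helper.

```python
def intervals_overlap(interval_a, interval_b):
    """Return True if two (low, high) confidence intervals share any values."""
    low_a, high_a = interval_a
    low_b, high_b = interval_b
    return low_a <= high_b and low_b <= high_a


# Hypothetical 95 percent confidence intervals, invented for illustration.
females, males = (70.0, 80.0), (74.0, 84.0)    # overlapping error bars
younger, older = (60.0, 68.0), (75.0, 83.0)    # non-overlapping error bars

print("females vs. males overlap:", intervals_overlap(females, males))
print("younger vs. older overlap:", intervals_overlap(younger, older))
```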

asset/activities/stat_significance/images/08UN09.svg

8. Finally, it is important to remember that a statistically significant result (with p < .05) is not necessarily an important result, even if the p-level is very small (such as p < .01 or p < .001). A small difference may be reliable and repeatable, but not have much impact on people’s behavior. For example, in the study illustrated in this graph, headache sufferers who had a pain level of 10 experienced a 40 percent decrease in pain after taking one aspirin tablet. Those who took two tablets experienced a 46 percent improvement. The difference was statistically significant, but most people would not notice the difference in their daily lives.
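The arithmetic behind the aspirin example can be worked through directly. The sketch below uses only the figures given in the paragraph (a starting pain level of 10 and reductions of 40 and 46 percent); the variable names are illustrative.

```python
starting_pain = 10

# Pain remaining after each dose, using the percentages from the example.
pain_after_one_tablet = starting_pain * (1 - 0.40)
pain_after_two_tablets = starting_pain * (1 - 0.46)

# Practical size of the difference between the two doses.
difference = pain_after_one_tablet - pain_after_two_tablets

print(f"one tablet:  pain level {pain_after_one_tablet:.1f}")
print(f"two tablets: pain level {pain_after_two_tablets:.1f}")
print(f"difference:  {difference:.1f} points")
```

A gap of roughly half a point from a starting level of 10 can be statistically reliable and still too small for most people to notice.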

Practice 1: Exploring Research Statistics

Roll over each term about statistics to see a brief description in the context of interpreting research results.

population

sample

mean

standard deviation

correlation coefficient (r)

descriptive statistics

inferential statistics

statistical significance

null hypothesis

p-level

null hypothesis significance testing (NHST)

confidence interval

effect size

Description:

a group of people (or animals) whose behavior is of interest to researchers; from this group, one or more samples are selected for measurement

a group of people (or animals) whose behavior is measured; this group is drawn from a larger population, and the sample results are usually generalized to the population

a measure of central tendency calculated by adding all scores and then dividing by the number of scores

a measure of variability, indicating how tightly the scores are clustered around the mean

a statistic that indicates the precise numerical relationship between two variables; r can range from -1.0 to +1.0

numbers calculated from a distribution of scores, indicating the central tendency (average) and the variability (amount of scatter around the average)

numbers calculated from a distribution of scores to provide evidence supporting or opposing a hypothesis

whether a research result differs sufficiently from what would be expected from chance alone, due to random variations in behavior

a statistical assumption about the absence of an effect (no difference between two values)

probability of finding a difference that is equal to or greater than what was actually measured, assuming that the null hypothesis is true

way of evaluating results by comparing the observed outcome to what would be expected if the null hypothesis is true

a range of scores calculated such that there is a specific probability (usually .95) that the value of interest actually falls within that range

way of measuring the strength of a result, yielding a number that indicates the difference between two values; not affected by sample size

Practice 2: Interpreting Research Results

Roll over each statement to see whether the conclusion is accurate or unjustified.

If a statistical test yields a p-level of .12, researchers using the null hypothesis significance testing (NHST) approach would accept the result as a statistically significant effect.

If a statistical test yields a p-level of .02, researchers using the null hypothesis significance testing (NHST) approach could conclude that the results are statistically significant, because there is only a 2 percent probability that the outcome was due to chance.

After the mean performance for two different groups has been calculated, if the 95 percent confidence intervals for those two means do not overlap, researchers could conclude that the difference between the groups was probably a reliable effect—a real difference.

FALSE. Researchers who use NHST would look for a p-level of less than .05 before they would claim statistical significance.

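FALSE. Researchers who use NHST would conclude that the results were statistically significant, but the reason given in the statement is wrong. The p-level is not the probability that the outcome was due to chance; it is the probability of getting that large a difference (or a larger difference) if the null hypothesis were true, which here would be only 2 percent.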

TRUE. Researchers who use confidence intervals would be delighted to have no overlap, because that would indicate that the means of the two groups were reliably different.
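The NHST decision rule used in the first two statements can be written as a one-line check. This is a sketch of the conventional .05 cutoff described in this activity; `is_statistically_significant` and its `alpha` parameter are hypothetical names.

```python
def is_statistically_significant(p_level, alpha=0.05):
    """Apply the conventional NHST rule: significant only if p is below alpha."""
    return p_level < alpha


# The two p-levels from the statements above.
for p_level in (0.12, 0.02):
    verdict = "significant" if is_statistically_significant(p_level) else "not significant"
    print(f"p = {p_level:.2f}: {verdict}")
```

The rule says nothing about why a result is significant; it only compares the p-level with the chosen cutoff.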

Quiz 1

Match the terms to their descriptions by dragging each colored circle to the appropriate gray circle. When all the circles have been placed, select the CHECK ANSWER button.

Select the NEXT button and move to Quiz 2.
Perhaps you should go back to review statistics used in the context of interpreting research results.
effect size
null hypothesis
confidence interval
descriptive statistics
inferential statistics
statistical significance
p-level
a range of scores calculated such that there is a specific probability (usually .95) that the value of interest actually falls within that range
numbers calculated from a distribution of scores, indicating the central tendency and the variability
a calculated number that indicates the size of a difference between two values; not affected by sample size
numbers calculated from a distribution of scores to provide evidence supporting or opposing a hypothesis
statistical assumption about the absence of an effect (typically, no difference between two values)
probability of finding a difference that is equal to or greater than what was actually measured, assuming that the null hypothesis is true
whether a research result differs sufficiently from what would be expected from chance alone

Quiz 2

radio_quiz_upd
1,0,0,1,1,0,0,1
Select the NEXT button and move to the Conclusion.
Try to respond to the statements again.

For each statement, select one of the buttons to indicate whether the statement is True or False. When responses have been chosen for all the statements, select the CHECK ANSWER button.

True False

Researchers consider a result to be statistically significant if there is a low probability that the outcome was due to chance.

A p-level of less than .05 indicates that the probability of the outcome being due to chance is less than 5 percent.

Many researchers do not believe that null hypothesis significance testing (NHST) is an appropriate way to evaluate research outcomes.

If the mean score for Group A is at least 2 or 3 points higher than the mean of Group B, we can conclude that the difference between the groups is statistically significant.

Conclusion

end_slide
asset/activities/stat_significance/images/n08un01.svg
Congratulations!
You have completed the activity Statistical Significance.