How do the mean, median, and standard deviation help describe a set of numbers?
To make sense of the data collected in a research study, we must have some way of summarizing the data and some way to determine the likelihood that observed patterns in the data are (or are not) simply the results of chance. The statistical procedures used for these purposes can be divided into two categories: (1) descriptive statistics, which are used to summarize sets of data, and (2) inferential statistics, which help researchers decide how confident they can be in judging that the results observed are not due to chance. We look briefly here at some commonly used descriptive statistics and then at the rationale behind inferential statistics. A more detailed discussion of some of these procedures, with examples, can be found in the Statistical Appendix at the back of this book.
Descriptive statistics include all numerical methods for summarizing a set of data. Among the most commonly used of these relatively simple statistics are the mean, the median, and a measure of variability.
Describing a Set of Scores
If our data were a set of numerical measurements (such as ratings from 1 to 10 on how generous people were in their charitable giving), we might summarize these measurements by calculating either the mean or the median. The mean is simply the arithmetic average, determined by adding the scores and dividing the sum by the number of scores. The median is the center score, determined by ranking the scores from highest to lowest and finding the score that has the same number of scores above it as below it, that is, the score representing the 50th percentile. (The Statistical Appendix explains when the mean or the median is the more appropriate statistic.)
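To make the calculation concrete, the brief Python sketch below computes the mean and the median of a small set of hypothetical generosity ratings. The numbers are invented for illustration, not drawn from an actual study.

```python
# A minimal sketch (hypothetical ratings, not data from the text) showing how
# the mean and median of a small set of scores are computed.

def mean(scores):
    """Arithmetic average: the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

def median(scores):
    """Middle score after ranking; the average of the two middle scores if the count is even."""
    ranked = sorted(scores)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

ratings = [2, 3, 3, 4, 5, 5, 6, 9, 10, 10]   # hypothetical generosity ratings (1 to 10)
print(mean(ratings))    # 5.7
print(median(ratings))  # 5.0
```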
For certain kinds of comparisons, researchers need to describe not only the central tendency (the mean or median) but also the variability of a set of numbers. Variability refers to the degree to which the numbers in the set differ from one another and from their mean. In Table 2.1 you can see two sets of numbers that have identical means but different variabilities. In set A, the scores cluster close to the mean (low variability); in set B, they differ widely from the mean (high variability). A common measure of variability is the standard deviation, which is calculated by a formula described in the Statistical Appendix. As illustrated in Table 2.1, the further most individual scores are from the mean, the greater is the standard deviation.
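The sketch below, again with invented numbers rather than the actual values in Table 2.1, shows how two sets of scores can share the same mean yet differ sharply in their standard deviations. The formula used here is the common population version (squared deviations averaged over all scores); see the Statistical Appendix for the book's own presentation.

```python
# A minimal sketch of the standard deviation, using two hypothetical score sets
# (not the actual values from Table 2.1) that share the same mean but differ in variability.
from math import sqrt

def standard_deviation(scores):
    """Square root of the average squared deviation of each score from the mean."""
    m = sum(scores) / len(scores)
    return sqrt(sum((x - m) ** 2 for x in scores) / len(scores))

set_a = [48, 49, 50, 51, 52]   # scores cluster near the mean of 50 (low variability)
set_b = [30, 40, 50, 60, 70]   # scores spread widely around the same mean of 50 (high variability)

print(standard_deviation(set_a))  # about 1.41
print(standard_deviation(set_b))  # about 14.14
```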
Describing a Correlation
How does a correlation coefficient describe the direction and strength of a correlation? How can correlations be depicted in scatter plots?
Correlational studies, as discussed earlier in this chapter, examine two or more variables to determine whether or not a nonrandom relationship exists between them. When both variables are measured numerically, the strength and direction of the relationship can be assessed by a statistic called the correlation coefficient. Correlation coefficients are calculated by a formula (described in the Statistical Appendix) that produces a result ranging from −1.00 to +1.00. The sign (+ or −) indicates the direction of the correlation (positive or negative). In a positive correlation, an increase in one variable coincides with a tendency for the other variable to increase; in a negative correlation, an increase in one variable coincides with a tendency for the other variable to decrease. The absolute value of the correlation coefficient (the value with sign removed) indicates the strength of the correlation. To the degree that a correlation is strong (close to +1.00 or −1.00), you can predict the value of one variable by knowing the other. A correlation close to zero (0) means that the two variables are statistically unrelated—knowing the value of one variable does not help you predict the value of the other.
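As a rough illustration, the following sketch computes a standard Pearson correlation coefficient for two hypothetical variables. The values of hours_studied and test_scores are invented, not taken from any study described here, and the formula is the conventional Pearson r (see the Statistical Appendix for the book's own presentation).

```python
# A minimal sketch of a Pearson correlation coefficient computed from two
# hypothetical numerical variables.
from math import sqrt

def correlation(xs, ys):
    """Pearson r: ranges from -1.00 to +1.00; the sign gives direction, the absolute value gives strength."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

hours_studied = [1, 2, 3, 4, 5, 6]        # hypothetical values
test_scores   = [55, 60, 58, 70, 72, 80]  # hypothetical values

print(round(correlation(hours_studied, test_scores), 2))  # roughly +0.95, a strong positive correlation
```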
As an example, consider a hypothetical research study conducted with 10 students in a college course, aimed at assessing the correlation between the students’ most recent test score and each of four other variables: (1) the hours they spent studying for the test; (2) the score they got on the previous test; (3) their level of psychological depression, measured a day before the test; and (4) their height in centimeters. Suppose the data collected in the study are those depicted in the table in Figure 2.3. Each row in the table shows the data for a different student, and the students are rank ordered in accordance with their scores on the test.
To visualize the relationship between the test score and any of the other four variables, the researcher might produce a scatter plot, in which each student’s test score and that student’s value for one of the other measurements are designated by a single point on the graph. The scatter plots relating test score to each of the other variables are shown in Figure 2.3.
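For readers who want to produce such a plot themselves, the sketch below draws a simple scatter plot with matplotlib, using hypothetical values in place of the actual data in Figure 2.3.

```python
# A minimal sketch of a scatter plot relating hours studied to test score,
# using hypothetical values (the actual data appear in Figure 2.3).
import matplotlib.pyplot as plt

hours_studied = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]           # hypothetical values
test_scores   = [52, 58, 55, 63, 66, 70, 68, 75, 80, 84]  # hypothetical values

plt.scatter(hours_studied, test_scores)   # one point per student
plt.xlabel("Hours spent studying")
plt.ylabel("Test score")
plt.title("Test score versus hours spent studying")
plt.show()
```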
Why is it necessary to perform inferential statistics before drawing conclusions from the data in a research study?
Any set of data collected in a research study contains some degree of variability that can be attributed to chance. That is the essential reason why inferential statistics are necessary. In the experiment comparing treatments for depression summarized back in Figure 2.1, the average depression scores obtained for the four groups reflect not just the effects of treatment but also random effects caused by uncontrollable variables. For example, more patients who were predisposed to improve could by chance have been assigned to one treatment group rather than to another. Or measurement error stemming from imperfections in the rating procedure could have contributed to differences in the depression scores. If the experiment were repeated several times, the results would be somewhat different each time because of such uncontrollable random variables. Given that results can vary as a result of chance, how confident can a researcher be in inferring a general conclusion from the study’s data? Inferential statistics are ways of answering that question using the laws of probability.
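The small simulation below illustrates the point: even when a treatment has no effect at all, randomly splitting one hypothetical pool of patients into two groups produces group means that differ somewhat on every repetition. The scores are generated at random and stand in for no real data.

```python
# A minimal sketch (not the study's actual data) of why chance matters: with no
# treatment effect at all, random assignment still yields group means that differ
# a little on every repetition.
import random

random.seed(1)
depression_scores = [random.gauss(20, 5) for _ in range(40)]  # one hypothetical patient pool

for repetition in range(3):
    shuffled = random.sample(depression_scores, len(depression_scores))
    group_1, group_2 = shuffled[:20], shuffled[20:]
    diff = sum(group_1) / 20 - sum(group_2) / 20
    print(f"Repetition {repetition + 1}: difference between group means = {diff:+.2f}")
```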
Statistical Significance
When two groups of subjects in an experiment have different mean scores, the difference might be meaningful, or it might be just the result of chance. Similarly, a nonzero correlation coefficient in a correlational study might indicate a meaningful relationship between two variables—or it might be just the result of chance, such as flipping a coin 10 times and coming up with heads on seven of them. Seven is more than you would expect from chance if it’s a fair coin (that would be five), but with only 10 tosses you could have been “just lucky” in getting seven heads. If you got heads on 70 out of 100 flips (or 700 out of 1,000), you’d be justified in thinking that the coin is not “fair.” Inferential statistical methods, applied to either an experiment or a correlational study, are procedures for calculating the probability that the observed results could derive from chance alone.
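The coin-flip arithmetic can be checked directly. The sketch below computes the exact binomial probability of getting at least 7 heads in 10 tosses of a fair coin, and of getting at least 70 heads in 100 tosses.

```python
# A minimal sketch of the coin-flip reasoning above: the exact probability of at
# least k heads in n tosses of a fair coin.
from math import comb

def prob_at_least(k, n):
    """Probability of k or more heads in n tosses of a fair coin (binomial, p = 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

print(prob_at_least(7, 10))    # about 0.17: easily produced by chance
print(prob_at_least(70, 100))  # about 0.00004: very unlikely from a fair coin
```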
What does it mean to say that a result from a research study is statistically significant at the 5 percent level?
Using such methods, researchers calculate a statistic referred to as p (for probability), or the level of significance. When two means are being compared, p is the probability that a difference as great as or greater than that observed would occur by chance if, in the larger population, there were no difference between the two means. (“Larger population” here means the entire set of scores that would be obtained if the experiment were repeated an infinite number of times with all possible subjects.) In other words, in the case of comparing two means in an experiment, p is the probability that a difference as large as or larger than that observed would occur if the independent variable had no real effect on the scores. In the case of a correlational study, p is the probability that a correlation coefficient as large as or larger than that observed (in absolute value) would occur by chance if, in the larger population, the two variables were truly uncorrelated. By convention, results are usually labeled as statistically significant if the value of p is less than .05 (5 percent). To say that results are statistically significant is to say that the probability is acceptably small (generally less than 5 percent) that they could be caused by chance alone. All of the results of experiments and correlational studies discussed in this textbook are statistically significant at the .05 level or better.
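As one illustration of how such a calculation might look in practice, the sketch below runs a conventional two-sample t test (one of many inferential procedures) on two sets of hypothetical depression scores. The choice of test and the numbers are assumptions made for illustration, not the procedure or data of the experiment in Figure 2.1.

```python
# A minimal sketch of checking statistical significance at the 5 percent level with
# a two-sample t test; the scores below are hypothetical.
from scipy import stats

treatment_group = [12, 9, 14, 8, 11, 10, 7, 13, 9, 12]    # hypothetical post-treatment depression scores
control_group   = [16, 14, 18, 15, 13, 17, 19, 14, 16, 15]

t_statistic, p_value = stats.ttest_ind(treatment_group, control_group)
print(f"p = {p_value:.4f}")
if p_value < 0.05:
    print("Statistically significant at the 5 percent level")
else:
    print("Not statistically significant at the 5 percent level")
```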
The Components of a Test of Statistical Significance
How is statistical significance affected by the size of the effect, the number of subjects or observations, and the variability of the scores within each group?
The precise formulas used to calculate p values for various kinds of research studies are beyond the scope of this discussion, but it is worthwhile to think a bit about the elements that go into such calculations. They are:
1. The size of the observed effect, such as the difference between the two means in an experiment or the absolute value of the correlation coefficient in a correlational study.
2. The number of subjects or observations in the study.
3. The variability of the scores within each group.
In short, a large observed effect, a large number of observations, and a small degree of variability in scores within groups all reduce the likelihood that the effect is due to chance and increase the likelihood that a difference between two means, or a correlation between two variables, will be statistically significant.
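The simulation sketched below illustrates how the three elements interact: p tends to shrink as the effect grows, as the number of observations grows, or as the within-group variability shrinks. The function name p_for, the group sizes, and all of the numbers are invented for illustration.

```python
# A minimal sketch (hypothetical numbers) of how effect size, sample size, and
# within-group variability each influence the p value from a two-sample t test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def p_for(effect, n, sd):
    """p from a t test on two simulated groups whose true means differ by `effect`."""
    group_1 = rng.normal(50, sd, n)
    group_2 = rng.normal(50 + effect, sd, n)
    return stats.ttest_ind(group_1, group_2).pvalue

print(p_for(effect=2, n=20, sd=10))   # small effect, few subjects, noisy scores: p tends to be large
print(p_for(effect=8, n=20, sd=10))   # larger effect: p tends to shrink
print(p_for(effect=2, n=500, sd=10))  # many more subjects: p tends to shrink
print(p_for(effect=2, n=20, sd=2))    # less within-group variability: p tends to shrink
```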
Statistical significance tells us that a result probably did not come about by chance, but it does not, by itself, tell us that the result has practical value. Don’t confuse statistical significance with practical significance. If we were to test a new weight-loss drug in an experiment that compared 10,000 people taking the drug with a similar number not taking it, we might find a high degree of statistical significance even if the drug produced an average weight loss of only a few ounces. In that case, most people would agree that, despite the high statistical significance, the drug has no practical significance in a weight-loss program.
[Summary chart: Researchers use statistics to analyze and interpret the results of their studies. The chart distinguishes the two categories, Descriptive Statistics and Inferential Statistics.]