10.1 The Paired-Samples t Test

As we learned in the chapter opening, researchers found that weight gain over the holidays is far less than what folk wisdom had suggested. Guess what? The dreaded “freshman 15” also appears to be a myth. One study found that male university students gained an average of 3.5 pounds between the beginning of the fall semester and November, and female students gained an average of 4.0 pounds (Holm-Denoma, Joiner, Vohs, & Heatherton, 2008). We use the paired-samples t test to make before-and-after comparisons.

The paired-samples t test is used to compare two means for a within-groups design, a situation in which every participant is in both samples; also called a dependent-samples t test.

The paired-samples t test (also called the dependent-samples t test) is used to compare two means for a within-groups design, a situation in which every participant is in both samples. The paired-samples t test can be used to analyze the data from many studies. For example, if a participant is in both conditions (such as a memory task after ingesting a caffeinated beverage and again after ingesting a non-caffeinated beverage), then her score in one depends on her score in the other.

The steps for the paired-samples t test are almost the same as those for the single-sample t test. The major difference in the paired-samples t test is that we must create difference scores for every participant. Because we’ll be working with difference scores, we need to learn about a new distribution—a distribution of the means of these difference scores, or a distribution of mean differences.

Before and After Many companies use before-and-after photos to encourage consumers to purchase their products. Statistics can help us overcome the persuasive powers of anecdotal evidence. We can use a paired-samples t test to compare weight before and after participation in an advertised program to determine if the mean difference is statistically significant.

Distributions of Mean Differences

We already learned about a distribution of scores and a distribution of means. Now we need to develop a distribution of mean differences for the pre- and post-holiday weight data. Our goal is to establish a distribution that specifies the null hypothesis for a within-groups design.

Imagine that many college students’ weights were measured before and after the winter holidays and written on individual cards. We begin by gathering data from a sample of three people from among this population of many college students. There are two cards for each person in the population, on which weights are listed—one before the holidays and one after the holidays. We have one pair of cards for each student in the population (which is why one name for this test is the paired-samples t test). Let’s walk through the steps to create a distribution of mean differences.

Step 1. Randomly choose three pairs of cards, replacing each pair of cards before randomly selecting the next.
Step 2. For each pair, calculate a difference score by subtracting the first weight from the second weight.
Step 3. Calculate the mean of the differences in weights for these three people. Then complete these three steps again. Randomly choose another three people from the population of many college students, calculate their difference scores, and calculate the mean of the three difference scores. And then complete these three steps again, and again, and again.

Let’s walk through these steps once again, using an example.

Step 1. We randomly select one pair of cards and find that the first student weighed 140 pounds before the holidays and 144 pounds after the holidays. We replace those cards and randomly select another pair; the second student had before and after scores of 126 and 124, respectively. We replace those cards and randomly select another pair; the third student had before and after scores of 168 and 168, respectively.
Step 2. For the first student, the difference between weights, subtracting the before score from the after score, is 144 − 140 = 4. For the second student, the difference between weights is 124 − 126 = −2. For the third student, the difference between weights is 168 − 168 = 0.
Step 3. The mean of these three difference scores (4, −2, 0) is 0.667.
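The three steps for this one sample can be sketched in Python as follows. This is only an illustrative sketch; the variable names are our own and are not part of the original example.

```python
# Before and after weights (in pounds) for the three students drawn in Step 1.
pairs = [(140, 144), (126, 124), (168, 168)]

# Step 2: difference score = after weight minus before weight.
differences = [after - before for before, after in pairs]

# Step 3: mean of the three difference scores.
mean_difference = sum(differences) / len(differences)

print(differences)                # [4, -2, 0]
print(round(mean_difference, 3))  # 0.667
```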

We would then choose three more students and calculate the mean of their difference scores. Eventually, we would have many mean differences to plot on a curve of mean differences—some positive, some negative, and some right at 0.

But this would only be the beginning of what this distribution of mean differences would look like. If we were to calculate the whole distribution of mean differences, then we would do this an uncountable number of times. When the authors of this book calculated 30 mean differences for pairs of weights, we got the distribution in Figure 10-1. If no mean difference is found when comparing weights from before and after the holidays, as with the data we used to create Figure 10-1, the distribution would center around 0. According to the null hypothesis, we would expect no mean difference in weight—or a mean difference of 0—from before the holidays to after the holidays.
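A rough simulation can make this repeated sampling concrete. The sketch below assumes a hypothetical population of weight pairs in which the true mean change is 0; the population size, the means, the standard deviations, and the random seed are all invented for illustration and do not come from the holiday weight data.

```python
import random

rng = random.Random(10)  # fixed seed so the sketch is repeatable

# Hypothetical population of (before, after) weight pairs in which the
# average change is about 0, as the null hypothesis states.
# All numeric settings here are assumptions made for illustration.
population = []
for _ in range(10_000):
    before = rng.gauss(150, 20)
    after = before + rng.gauss(0, 3)
    population.append((before, after))


def sample_mean_difference(n=3):
    """Draw n pairs with replacement and return the mean of their difference scores."""
    sample = [rng.choice(population) for _ in range(n)]
    differences = [after - before for before, after in sample]
    return sum(differences) / n


# Repeat the three steps 30 times to get 30 mean differences.
mean_differences = [sample_mean_difference() for _ in range(30)]

grand_mean = sum(mean_differences) / len(mean_differences)
print(len(mean_differences))  # 30
print(round(grand_mean, 2))   # close to 0; the exact value depends on the seed
```

Some of the 30 mean differences come out positive and some negative, and together they cluster around 0, which is the pattern the null hypothesis predicts.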

Figure 10-1

Creating a Distribution of Mean Differences This distribution is one of many that could be created by calculating 30 mean differences, each the average of three difference scores for pairs of weights drawn one pair at a time from a population of pairs of weights (one preholiday and one postholiday). The population used here is based on the null hypothesis that there is no average difference in weight from before the holidays to after the holidays.

The Six Steps of the Paired-Samples t Test

In a paired-samples t test, each participant has two scores—one in each condition. When we conduct a paired-samples t test, we write the pairs of scores in two columns, side by side next to the same participant. We then subtract each score in one column from its paired score in the other column to create difference scores. Ideally, a positive difference score indicates an increase, and a negative difference score indicates a decrease. Typically, we subtract the first score from the second so that the difference scores match this logic. We will now walk through the six steps of hypothesis testing for the paired-samples t test.

Large Monitors and Productivity Does a large monitor increase your productivity? Microsoft researchers and cognitive psychologists (Czerwinski et al., 2003) reported a 9% increase in productivity when research volunteers used a large 42-inch display versus a 15-inch display. Every participant used both displays and thus was in both samples. A paired-samples t test is the appropriate hypothesis test for this two-group design.

EXAMPLE 10.1

Let’s use an example from the software industry (which employs social scientists to improve the ways in which people interact with their products). Behavioral scientists at Microsoft studied how 15 volunteers performed on a set of tasks under two conditions: while using a 15-inch computer monitor and while using a 42-inch monitor (Czerwinski et al., 2003), the latter of which allows the user to have multiple programs in view at the same time.

Here are five participants’ fictional data, which reflect the actual means reported by researchers. Note that a smaller number is good—it indicates a faster time. The first person completed the tasks on the small monitor in 122 seconds and on the large monitor in 111 seconds; the second person in 131 and 116; the third in 127 and 113; the fourth in 123 and 119; and the fifth in 132 and 121.

STEP 1: Identify the populations, distribution, and assumptions.

The paired-samples t test is like the single-sample t test in that we analyze a single sample of scores. For the paired-samples t test, however, we analyze difference scores. For the paired-samples t test, one population is reflected by each condition, but the comparison distribution is a distribution of mean difference scores (rather than a distribution of means). The comparison distribution is based on the null hypothesis that posits no mean difference. So the mean of the comparison distribution is 0. For the paired-samples t test, the three assumptions are the same as for the single-sample t test.

MASTERING THE CONCEPT

10.2: The steps for the paired-samples t test are similar to those for the single-sample t test. The main difference is that for the paired-samples t test, we are comparing the sample mean difference between scores to the mean difference for the population according to the null hypothesis, rather than comparing the sample mean of individual scores to the population mean according to the null hypothesis, as we do when conducting a single-sample t test.

Summary: Population 1: People performing tasks using a 15-inch monitor. Population 2: People performing tasks using a 42-inch monitor.

The comparison distribution is a distribution of mean difference scores based on the null hypothesis. The hypothesis test is a paired-samples t test because we have two samples of scores and a within-groups design.

This study meets one of the three assumptions and may meet the other two: (1) The dependent variable is time, which is scale. (2) The participants were not randomly selected, however, so we must be cautious with respect to generalizing our findings. (3) We do not know whether the population is normally distributed, and there are not at least 30 participants. However, the data from this sample do not suggest a skewed distribution.

STEP 2: State the null and research hypotheses.

This step is identical to that for the single-sample t test.

Summary: Null hypothesis: People who use a 15-inch screen will complete a set of tasks in the same amount of time, on average, as people who use a 42-inch screen—H₀: μ₁ = μ₂. Research hypothesis: People who use a 15-inch screen will complete a set of tasks in a different amount of time, on average, than people who use a 42-inch screen—H₁: μ₁ ≠ μ₂.

STEP 3: Determine the characteristics of the comparison distribution.

This step is similar to that for the single-sample t test. We determine the appropriate mean and standard error of the comparison distribution—the distribution based on the null hypothesis. With the paired-samples t test, however, we have a sample of difference scores and a comparison distribution of mean differences (instead of a sample of individual scores and a comparison distribution of means). According to the null hypothesis, there is no difference. So the mean of the comparison distribution is always 0, as long as the null hypothesis posits no difference.

For the paired-samples t test, standard error is calculated exactly as it is calculated for the single-sample t test, only we use the difference scores rather than the scores in each condition. To get the difference scores in the current example, we want to know what happens when we go from the control condition (small screen) to the experimental condition (large screen), so we subtract the first score from the second score. This means that a negative difference indicates a decrease in time when the screen goes from small to large. (The test statistic will be the same if we reverse the order in which we subtract, but the sign will change.)
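As a small check on the direction of subtraction, the following Python sketch computes the difference scores for the five participants both ways. The list names are our own labels for the two conditions.

```python
small = [122, 131, 127, 123, 132]  # seconds on the 15-inch monitor (first score)
large = [111, 116, 113, 119, 121]  # seconds on the 42-inch monitor (second score)

# Second score minus first score: a negative value means a faster time
# on the large monitor.
differences = [y - x for x, y in zip(small, large)]

# Reversing the order of subtraction flips every sign but changes nothing else.
reversed_differences = [x - y for x, y in zip(small, large)]

print(differences)           # [-11, -15, -14, -4, -11]
print(reversed_differences)  # [11, 15, 14, 4, 11]
```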

Summary: μ_M = 0; s_M = 1.924

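Calculations: (Once we create the column of difference scores, we set the original scores aside. All remaining calculations involve the difference scores, not the original scores.)

X (15-inch)	Y (42-inch)	Difference	Difference − mean difference	Squared deviation
122	111	−11	0	0
131	116	−15	−4	16
127	113	−14	−3	9
123	119	−4	7	49
132	121	−11	0	0

The mean of the difference scores is:

M_difference = (−11 + −15 + −14 + −4 + −11) / 5 = −55 / 5 = −11

The numerator of the standard deviation formula is the sum of squares, SS (that we learned about in Chapter 4):

SS = 0 + 16 + 9 + 49 + 0 = 74

The standard deviation, s, is:

s = √(SS / (N − 1)) = √(74 / 4) = √18.5 = 4.301

The standard error, s_M, is:

s_M = s / √N = 4.301 / √5 = 1.924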
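The calculations in this step can be checked with a short Python sketch. It uses only the five difference scores from the example; the variable names are our own.

```python
import math

# Difference scores (42-inch time minus 15-inch time) for the five participants.
differences = [-11, -15, -14, -4, -11]
n = len(differences)

# Mean of the difference scores.
m_difference = sum(differences) / n

# Sum of squares: squared deviations of each difference score from the mean.
ss = sum((d - m_difference) ** 2 for d in differences)

# Standard deviation uses N - 1 in the denominator because this is a sample.
s = math.sqrt(ss / (n - 1))

# Standard error of the mean difference.
s_m = s / math.sqrt(n)

print(m_difference)   # -11.0
print(ss)             # 74.0
print(round(s, 3))    # 4.301
print(round(s_m, 3))  # 1.924
```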

STEP 4: Determine the critical values, or cutoffs.

This step is the same as that for the single-sample t test, except that the degrees of freedom is the number of participants (not the number of scores) minus 1.

Summary: df = N − 1 = 5 − 1 = 4

The critical values, based on a two-tailed test and a p level of 0.05, are −2.776 and 2.776, as seen in the curve in Figure 10-2.
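In practice these cutoffs are read from a t table. For readers curious about where the table values come from, the sketch below approximates the cutoff numerically from the density of the t distribution. It is an illustration only; the function names, the use of Simpson's rule, and the bisection search are our own choices and are not part of the hypothesis-testing procedure.

```python
import math


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    coefficient = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coefficient * (1 + t * t / df) ** (-(df + 1) / 2)


def area_zero_to(t, df, intervals=2000):
    """Area under the t curve between 0 and t, approximated by Simpson's rule."""
    h = t / intervals
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3


def two_tailed_critical_t(df, p_level=0.05):
    """Positive cutoff that leaves p_level / 2 in each tail, found by bisection."""
    target = 0.5 - p_level / 2
    low, high = 0.0, 100.0
    for _ in range(60):
        mid = (low + high) / 2
        if area_zero_to(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


critical_value = two_tailed_critical_t(df=4)
print(round(critical_value, 3))  # 2.776
```

The negative cutoff is the same value with the sign reversed, because the t distribution is symmetric around 0.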

Figure 10-2

Determining Cutoffs for a Paired-Samples t Test We typically determine critical values in terms of t statistics rather than means of raw scores so that we can easily determine whether the test statistic is beyond one of the cutoffs.

STEP 5: Calculate the test statistic.

This step is identical to that for the single-sample t test, except that we use means of difference scores instead of means of individual scores. We subtract the mean difference score according to the null hypothesis, 0, from the mean difference score calculated for the sample. We then divide by standard error.

Summary: t = (M_difference − μ_M) / s_M = (−11 − 0) / 1.924 = −5.72
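The whole calculation, from the two columns of scores to the test statistic, can be sketched as one small Python function. The function name and its parameters are hypothetical; they are our own and do not refer to any statistical package.

```python
import math


def paired_samples_t(first, second, null_mean_difference=0.0):
    """Return (t statistic, degrees of freedom) for paired scores.

    Difference scores are computed as second minus first.
    """
    differences = [b - a for a, b in zip(first, second)]
    n = len(differences)
    m_difference = sum(differences) / n
    ss = sum((d - m_difference) ** 2 for d in differences)
    s_m = math.sqrt(ss / (n - 1)) / math.sqrt(n)
    return (m_difference - null_mean_difference) / s_m, n - 1


small = [122, 131, 127, 123, 132]  # seconds on the 15-inch monitor
large = [111, 116, 113, 119, 121]  # seconds on the 42-inch monitor

t_statistic, df = paired_samples_t(small, large)
print(round(t_statistic, 2), df)  # -5.72 4
```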

STEP 6: Make a decision.

This step is identical to that for the single-sample t test.

Summary: Reject the null hypothesis. When we examine the means (M_X = 127; M_Y = 116), it appears that, on average, people perform faster when using a 42-inch monitor than when using a 15-inch monitor (as shown by the curve in Figure 10-3).
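The decision rule can be written out as a simple comparison. The sketch below takes the test statistic and the critical value from the earlier steps as given and recomputes the two condition means; the variable names are our own.

```python
t_statistic = -5.72     # from step 5
critical_value = 2.776  # two-tailed cutoff for df = 4, p level of 0.05

# Reject the null hypothesis if the test statistic is beyond either cutoff.
reject_null = abs(t_statistic) > critical_value

small = [122, 131, 127, 123, 132]  # seconds on the 15-inch monitor
large = [111, 116, 113, 119, 121]  # seconds on the 42-inch monitor
mean_small = sum(small) / len(small)
mean_large = sum(large) / len(large)

print(reject_null)             # True
print(mean_small, mean_large)  # 127.0 116.0
```

Because a smaller number indicates a faster time, the lower mean for the 42-inch monitor tells us the direction of the effect once the null hypothesis has been rejected.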

Figure 10-3

Making a Decision To decide whether to reject the null hypothesis, we compare the test statistic to the critical values. In this figure, the test statistic, −5.72, is beyond the cutoff of −2.776, so we can reject the null hypothesis.

The statistics, as reported in a journal article, follow the same APA format as for a single-sample t test. (Note: Unless we use software, we can only indicate whether the p value is less than or greater than the cutoff p level of 0.05.) In the current example, the statistics would read:

t(4) = −5.72, p < 0.05

We also include the means and the standard deviations for the two samples. We calculated the means in step 6 of hypothesis testing, but we would also have to calculate the standard deviations for the two samples to report them.

The researchers note that the faster time with the large display might not seem much faster but that, in their research, they have had great difficulty identifying any factors that lead to faster times (Czerwinski et al., 2003). Based on their previous research, therefore, this is an impressive difference.

CHECK YOUR LEARNING

Reviewing the Concepts

The paired-samples t test is used when we have data for all participants under two conditions—a within-groups design.
In the paired-samples t test, we calculate a difference score for every individual in the study. The statistic is calculated on those difference scores.
We use the same six steps of hypothesis testing that we used with the z test and with the single-sample t test.

Clarifying the Concepts

10-1 How do we conduct a paired-samples t test?
10-2 Explain what an individual difference score is, as it is used in a paired-samples t test.

Calculating the Statistics

10-3 Below are energy-level data (on a scale of 1 to 7, where 1 = feeling of no energy and 7 = feeling of high energy) for five students before and after lunch. Calculate the mean difference for these people so that loss of energy is a negative value. Assume you are testing the hypothesis that students go into what we call “food comas” after eating, versus lunch giving them added energy.

Before lunch	After lunch
6	3
5	2
4	6
5	4
7	5

Applying the Concepts

10-4 Using the energy-level data presented in Check Your Learning 10-3, test the hypothesis that students have different energy levels before and after lunch. Perform the six steps of hypothesis testing for a two-tailed paired-samples t test.

Solutions to these Check Your Learning questions can be found in Appendix D.