10.2 Beyond Hypothesis Testing

The APA encourages the use of confidence intervals and effect sizes (as with the z test and the single-sample t test) for paired-samples t tests. We’ll calculate both the confidence interval and the effect size for the example of productivity with small versus large computer monitors.

MASTERING THE CONCEPT

10.3: As we can with a z test and a single-sample t test, we can calculate a confidence interval and an effect size for a paired-samples t test.


Calculating a Confidence Interval for a Paired-Samples t Test

Let’s start by determining the confidence interval for the productivity example.

EXAMPLE 10.2

First, let’s recap the information we need. The population mean difference according to the null hypothesis was 0, and we used the sample to estimate the population standard deviation to be 4.301 and the standard error to be 1.924. The five participants in the study sample had a mean difference of −11. We will calculate the 95% confidence interval around the sample mean difference of −11.

STEP 1: Draw a picture of a t distribution that includes the confidence interval.

We draw a normal curve (Figure 10-4) that has the sample mean difference, −11, at its center instead of the population mean difference, 0.

Figure 10-4

A 95% Confidence Interval for a Paired-Samples t Test, Part I We start the confidence interval for a distribution of mean differences by drawing a curve with the sample mean difference, −11, in the center.

STEP 2: Indicate the bounds of the confidence interval on the drawing.

As before, 47.5% of the distribution falls between the mean and each end of the confidence interval, and 2.5% falls in each tail.

STEP 3: Add the critical t statistics to the curve.

For a two-tailed test with a p level of 0.05 and 4 df, the critical values are −2.776 and 2.776, as seen in Figure 10-5.
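The critical values in Figure 10-5 come from the t table, but they can also be confirmed in code. A minimal sketch, assuming SciPy is available, asks for the t value that cuts off the upper 2.5% of a t distribution with 4 degrees of freedom:

```python
from scipy import stats

# Critical t for a 95% confidence interval (two-tailed, p level of 0.05)
# with df = 4: the value that cuts off the upper 2.5% of the distribution.
df = 4
critical_t = stats.t.ppf(0.975, df)
print(round(critical_t, 3))  # 2.776; by symmetry, the lower cutoff is -2.776
```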

Figure 10-5

A 95% Confidence Interval for a Paired-Samples t Test, Part II The next step in calculating a confidence interval for mean differences is identifying the t statistics that indicate each end of the interval. Because the curve is symmetric, the t statistics have the same magnitude—one is negative, −2.776, and one is positive, 2.776.

STEP 4: Convert the critical t statistics back into raw mean differences.

As we do with other confidence intervals, we use the sample mean difference (−11) in the calculations and the standard error (1.924) as the measure of spread. We use the same formulas as for the single-sample t test, recalling that these means and standard errors are calculated from differences between two scores for each participant:

Mlower = −t(sM) + Msample = −2.776(1.924) + (−11) = −16.34
Mupper = t(sM) + Msample = 2.776(1.924) + (−11) = −5.66

We add these raw mean differences to the curve in Figure 10-6.


Figure 10-6

A 95% Confidence Interval for a Paired-Samples t Test, Part III The final step in calculating a confidence interval for mean differences is converting the t statistics that indicate each end of the interval to raw mean differences, −16.34 and −5.66.

MASTERING THE FORMULA

10-1: The formula for the lower bound of a confidence interval for a paired-samples t test is Mlower = −t(sM) + Msample. The formula for the upper bound of a confidence interval for a paired-samples t test is Mupper = t(sM) + Msample. These are the same as for a single-sample t test, but remember that the means and standard errors are calculated from differences between pairs of scores, not individual scores.

The 95% confidence interval, reported in brackets as is typical, is [−16.34, −5.66].

STEP 5: Verify that the confidence interval makes sense.

The sample mean difference should fall exactly in the middle of the two ends of the interval.

−11 − (−16.34) = 5.34 and −11 − (−5.66) = −5.34

We have a match. The confidence interval ranges from 5.34 below the sample mean difference to 5.34 above the sample mean difference. If we were to sample five people from the same population over and over, the 95% confidence interval would include the population mean 95% of the time. Note that the population mean difference according to the null hypothesis, 0, does not fall within this interval. This means it is not plausible that the difference between those using the 15-inch monitor and those using the 42-inch monitor is 0.
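Steps 4 and 5 can be reproduced in a few lines of Python. This sketch plugs the example's values (sample mean difference −11, standard error 1.924, and the critical t of 2.776 from the t table) into the formulas from Mastering the Formula 10-1, then checks that the sample mean difference sits exactly at the midpoint of the interval:

```python
# Values from the productivity example (paired-samples t test, df = 4)
m_sample = -11      # sample mean difference
s_m = 1.924         # standard error of the mean difference
t_crit = 2.776      # critical t for a 95% CI with df = 4 (from the t table)

m_lower = -t_crit * s_m + m_sample   # lower bound
m_upper = t_crit * s_m + m_sample    # upper bound
print(round(m_lower, 2), round(m_upper, 2))  # -16.34 -5.66

# Step 5 check: the sample mean difference falls exactly in the middle.
assert abs((m_lower + m_upper) / 2 - m_sample) < 1e-9
```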

As with other hypothesis tests, the conclusions from both the paired-samples t test and the confidence interval are the same, but the confidence interval gives us more information—an interval estimate, not just a point estimate.

Calculating Effect Size for a Paired-Samples t Test

As with a z test, we can calculate the effect size (Cohen’s d) for a paired-samples t test.

EXAMPLE 10.3

Let’s calculate the effect size for the computer monitor study. Again, we simply use the formula for the t statistic, substituting s for sM (and μ for μM, even though these means are always the same). This means we use 4.301 instead of 1.924 in the denominator. Cohen’s d is now based on the spread of the distribution of individual differences between scores, rather than the distribution of mean differences.

MASTERING THE FORMULA

10-2: The formula for Cohen’s d for a paired-samples t statistic is: Cohen’s d = (M − μ)/s. It is the same formula as for the single-sample t statistic, except that the mean and standard deviation are for difference scores rather than individual scores.

The effect size, d = −2.56, tells us that the sample mean difference and the population mean difference are 2.56 standard deviations apart. This is a large effect. Recall that the sign has no effect on the size of an effect: −2.56 and 2.56 are equivalent effect sizes. We can add the effect size when we report the statistics as follows: t(4) = −5.72, p < 0.05, d = −2.56.
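The same substitution can be written directly in code: divide the sample mean difference by the standard deviation of the difference scores, s, rather than by the standard error, sM. A short sketch using the example's values:

```python
# Values from the computer monitor example
m_sample = -11   # mean of the difference scores
mu = 0           # population mean difference according to the null hypothesis
s = 4.301        # standard deviation of the difference scores (not s_M)

cohens_d = (m_sample - mu) / s
print(round(cohens_d, 2))  # -2.56, a large effect
```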


Next Steps

Order Effects and Counterbalancing

Order effects (sometimes called practice effects) refer to how a participant’s behavior changes when the dependent variable is presented for a second time.

There are particular problems that can occur with a within-groups design (such as the paired-samples t test). Specifically, a within-groups design invites a particular kind of confounding variable into a study: order effects. Order effects refer to how a participant’s behavior changes when the dependent variable is presented for a second time. (They’re sometimes called practice effects.) Let’s consider the computer monitor study for which we conducted a paired-samples t test. Remember that the participants completed a series of tasks on a 15-inch computer monitor and also on a 42-inch computer monitor. The time it took them to complete the series of tasks was recorded under each condition. Can you spot the confound? Participants were likely to get faster the second time they completed the tasks. Their responses “the second time around” would be influenced by the practice of already having completed the tasks once.

Counterbalancing minimizes order effects by varying the order of presentation of different levels of the independent variable from one participant to the next.

Fortunately, we can limit the confounding influence of order effects. Counterbalancing minimizes order effects by varying the order of presentation of different levels of the independent variable from one participant to the next. For example, half of the participants could be randomly assigned to complete the tasks on the 15-inch monitor first, then again on the 42-inch monitor. The other half could be randomly assigned to complete the tasks on the 42-inch monitor first, then again on the 15-inch monitor. In this case, any practice effect would be washed out by varying the order of the monitors.

There are other ways to reduce order effects. In the computer monitor example, we might decide to use a different set of tasks in each testing condition. The order in which the two different sets of tasks are given could be counterbalanced along with the order in which participants are assigned to the two different-sized monitors. Measures such as this can reduce order effects in within-groups research designs.
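One simple way to implement counterbalancing is to shuffle the participant list and assign the first half to one order of conditions and the second half to the other. A minimal sketch (the participant labels are hypothetical placeholders):

```python
import random

participants = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]
random.shuffle(participants)  # random assignment to the two orders

half = len(participants) // 2
order_a = participants[:half]  # 15-inch monitor first, then 42-inch
order_b = participants[half:]  # 42-inch monitor first, then 15-inch
print(order_a, order_b)
```

With this assignment, any practice effect applies equally often to each monitor size, so it cannot be confounded with the independent variable.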

Order Effects You observe that your friends felt exhilarated after riding a roller coaster without loops (which turn riders upside-down), then felt nauseated after riding a roller coaster with loops. You conclude that loops lead to nausea. The problem is that there could be an order effect. Perhaps your friends would have felt nauseated after the second roller coaster ride whether or not it had loops. Counterbalancing would avoid this confound. Half of your friends would be randomly assigned to ride the one without loops first, then the one with loops; half of them would be randomly assigned to ride the one with loops first, then the one without loops. (Your second author gets queasy just thinking about roller coasters and would not participate in this experiment no matter what experimental controls were in place! But your first author would volunteer!)


CHECK YOUR LEARNING

Reviewing the Concepts

  • We can calculate a confidence interval for a paired-samples t test. This provides us with an interval estimate rather than simply a point estimate. If 0 is not in the confidence interval, then it is not plausible that there is no difference between the sample and population mean differences.
  • We also can calculate an effect size (Cohen’s d) for a paired-samples t test.
  • Order effects occur when participants’ behavior is affected when a dependent variable is presented a second time.
  • Order effects can be reduced through counterbalancing, a procedure in which the different levels of the independent variable are presented in different orders from one participant to the next.

Clarifying the Concepts

  • 10-5 How does creating a confidence interval for a paired-samples t test give us the same information as hypothesis testing with a paired-samples t test?
  • 10-6 How do we calculate Cohen’s d for a paired-samples t test?

Calculating the Statistics

  • 10-7 Assume that researchers asked five participants to rate their mood on a scale from 1 to 7 (1 being lowest, 7 being highest) before and after watching a funny video clip. The researchers reported that the average difference between the “before” mood score and the “after” mood score was M = 1.0, s = 1.225. They calculated a paired-samples t test, t(4) = 1.13, p > 0.05 and, using a two-tailed test with a p level of 0.05, failed to reject the null hypothesis.
    1. Calculate the 95% confidence interval for this t test and describe how it results in the same conclusion as the hypothesis test.
    2. Calculate and interpret Cohen’s d.

Applying the Concepts

  • 10-8 Using the energy-level data presented in Check Your Learning 10-3 and 10-4, let’s go beyond hypothesis testing.
    1. Calculate the 95% confidence interval and describe how it results in the same conclusion as the hypothesis test.
    2. Calculate and interpret Cohen’s d.

Solutions to these Check Your Learning questions can be found in Appendix D.