Exercises

Clarifying the Concepts

Question 9.1

When should we use a t distribution?

Question 9.2

Why do we modify the formula for calculating standard deviation when using t tests (and divide by N − 1)?

Question 9.3

How is the calculation of standard error different for a t test than for a z test?

Question 9.4

Explain why the standard error for the distribution of sample means is smaller than the standard deviation of sample scores.

Question 9.5

Define the symbols in the formula for the t statistic: $t = \frac{(M - \mu_M)}{s_M}$

Question 9.6

When is it appropriate to use a single-sample t test?

Question 9.7

What does the phrase “free to vary,” referring to a number of scores in a given sample, mean for statisticians?

Question 9.8

How are the critical t values affected by sample size and degrees of freedom?

Question 9.9

Why do the t distributions merge with the z distribution as sample size increases?

Question 9.10

Explain what each part of the following statistical phrase means, as it would be reported in APA format: t(4) = 2.87, p = 0.032.

Question 9.11

What do we mean when we say we have a distribution of mean differences?

Question 9.12

When do we use a paired-samples t test?

Question 9.13

Explain the distinction between the terms independent samples and paired samples as they relate to t tests.

Question 9.14

How is a paired-samples t test similar to a single-sample t test?

Question 9.15

How is a paired-samples t test different from a single-sample t test?

Question 9.16

Why is the population mean almost always equal to 0 for the null hypothesis in the two-tailed, paired-samples t test?

Question 9.17

If we calculate the confidence interval around the sample mean difference used for a paired-samples t test, and it includes the value of 0, what can we conclude?

Question 9.18

If we calculate the confidence interval around the sample mean difference used for a paired-samples t test, and it does not include the value of 0, what can we conclude?

Question 9.19

Why is a confidence interval more useful than a single-sample t test or a paired-samples t test?

Question 9.20

What is the appropriate effect size for a paired-samples t test? How is the calculation different from the effect size for a single-sample t test?

Question 9.21

For a paired-samples t test, how is a Cohen’s d of 0.5 interpreted, according to Cohen’s conventions?

Calculating the Statistics

Question 9.22

We use formulas to describe calculations. Find the error in each of the following formulas. Explain why each is incorrect and provide a correction.

  1. [formula image not available]

  2. [formula image not available]

Question 9.23

For the data 93, 97, 91, 88, 103, 94, 97, calculate the standard deviation under both of these conditions:

  1. For this sample

  2. As an estimate of the population

  3. Calculate the standard error for t using symbolic notation.

  4. Calculate the t statistic, assuming µ = 96.
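
One way to check hand calculations for this kind of exercise is a short script. The sketch below uses only Python's standard library and assumes the single-sample formulas from this chapter: divide by N for the sample standard deviation, divide by N − 1 for the population estimate, then s_M = s / √N and t = (M − μ_M) / s_M. The variable names are illustrative and not part of the exercise.

```python
import math
import statistics

scores = [93, 97, 91, 88, 103, 94, 97]  # data from Exercise 9.23
mu = 96                                  # comparison mean given in the last part

n = len(scores)
mean = statistics.mean(scores)

sd_sample = statistics.pstdev(scores)   # divides by N: describes this sample only
s_estimate = statistics.stdev(scores)   # divides by N - 1: estimates the population

s_m = s_estimate / math.sqrt(n)         # standard error for t
t = (mean - mu) / s_m                   # single-sample t statistic

print(f"SD = {sd_sample:.3f}, s = {s_estimate:.3f}, s_M = {s_m:.3f}, t = {t:.3f}")
```

Replacing `scores` and `mu` with the values from Exercise 9.24 gives the same comparison for that data set.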

Question 9.24

For the data 1.01, 0.99, 1.12, 1.27, 0.82, 1.04, calculate the standard deviation under both of the following conditions. (Note: You will have to carry some calculations out to the third decimal place to see the difference in calculations.)

  1. For the sample

  2. As an estimate of the population

  3. Calculate the standard error for t using symbolic notation.

  4. Calculate the t statistic, assuming μ = 0.96.

Question 9.25

Identify the critical t value in each of the following circumstances:

  1. One-tailed test, df = 73, p level of 0.10

  2. Two-tailed test, df = 108, p level of 0.05

  3. One-tailed test, df = 38, p level of 0.01

Question 9.26

Calculate degrees of freedom and identify the critical t value for a single-sample t test in each of the following circumstances:

  1. Two-tailed test, N = 8, p level of 0.10

  2. One-tailed test, N = 42, p level of 0.05

  3. Two-tailed test, N = 89, p level of 0.01

Question 9.27

Identify the critical t values for each of the following tests:

  1. A single-sample t test examining scores for 26 participants to see if there is any difference compared to the population, using a p level of 0.05

  2. A one-tailed, single-sample t test performed on scores on the Marital Satisfaction Inventory for 18 people who went through marriage counseling, as compared to the population of people who had not been through marital counseling, using a p level of 0.01

  3. A two-tailed, single-sample t test, using a p level of 0.05, with 34 degrees of freedom

Question 9.28

Assume we know the following for a two-tailed, single-sample t test, at a p level of 0.05: µ = 44.3, N = 114, M = 43, s = 5.9.

  1. Calculate the t statistic.

  2. Calculate a 95% confidence interval.

  3. Calculate the effect size using Cohen’s d.
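
When only summary statistics are given, the same three quantities can be computed directly. The following is a minimal sketch, assuming the chapter's formulas (s_M = s / √N, t = (M − μ) / s_M, interval M ± t_crit · s_M, and d = (M − μ) / s). The function name `single_sample_summary` is hypothetical, and the critical value must come from a t table. The value 1.984 used here is the two-tailed 0.05 cutoff for df = 100, chosen on the assumption that the table has no row for df = 113 and the next lower row is used.

```python
import math

def single_sample_summary(mu, n, m, s, t_crit):
    """Return (t, confidence interval, Cohen's d) from summary statistics."""
    s_m = s / math.sqrt(n)                       # standard error
    t = (m - mu) / s_m                           # t statistic
    ci = (m - t_crit * s_m, m + t_crit * s_m)    # interval centered on the sample mean
    d = (m - mu) / s                             # Cohen's d uses s, not s_M
    return t, ci, d

# Values from Exercise 9.28; t_crit is an assumed table lookup (df = 100 row).
t, ci, d = single_sample_summary(mu=44.3, n=114, m=43, s=5.9, t_crit=1.984)
print(f"t = {t:.2f}, CI = [{ci[0]:.2f}, {ci[1]:.2f}], d = {d:.2f}")
```

Note that the effect size divides by the standard deviation rather than the standard error, so it does not grow with sample size the way the t statistic does.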

Question 9.29

Assume we know the following for a two-tailed, single-sample t test: µ = 7, N = 41, M = 8.5, s = 2.1.

  1. Calculate the t statistic.

  2. Calculate a 99% confidence interval.

  3. Calculate the effect size using Cohen’s d.

Question 9.30

Using Cohen’s conventions, interpret the effect sizes that you calculated in:

  1. Exercise 9.28c

  2. Exercise 9.29c

Question 9.31

Identify critical t values for each of the following tests:

  1. A one-tailed, paired-samples t test performed on before-and-after scores on the Marital Satisfaction Inventory for 18 people who went through marriage counseling, using a p level of 0.01.

  2. A two-tailed, paired-samples t test performed on before-and-after scores on the Marital Satisfaction Inventory for 64 people who went through marriage counseling, using a p level of 0.05.

Question 9.32

Assume 8 participants completed a mood scale before and after watching a funny video clip.

  1. Identify the critical t value for a one-tailed, paired-samples t test with a p level of 0.01.

  2. Identify the critical t values for a two-tailed, paired-samples t test with a p level of 0.01.

Question 9.33

The following are scores for 8 students on two different exams.

Exam I   Exam II
92       84
67       75
95       97
82       87
73       68
59       63
90       88
72       78

  1. Calculate the paired-samples t statistic for these exam scores.

  2. Using a two-tailed test and a p level of 0.05, identify the critical t values and make a decision regarding the null hypothesis.

  3. Assume you instead collected exam scores from 1000 students whose mean difference score and standard deviation were exactly the same as for these 8 students. Using a two-tailed test and a p level of 0.05, identify the critical t values and make a decision regarding the null hypothesis.

  4. How did changing the sample size affect the decision regarding the null hypothesis?
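
The effect of sample size on a paired-samples t statistic can be seen by holding the mean difference and standard deviation fixed and changing only N. The sketch below assumes difference scores are computed as Exam II minus Exam I (the opposite order only flips the sign) and uses t = M_difference / (s / √N). The helper name `paired_t` is illustrative.

```python
import math
import statistics

exam_1 = [92, 67, 95, 82, 73, 59, 90, 72]
exam_2 = [84, 75, 97, 87, 68, 63, 88, 78]

# Difference scores: Exam II minus Exam I (an assumed direction).
diffs = [second - first for first, second in zip(exam_1, exam_2)]
m_diff = statistics.mean(diffs)
s = statistics.stdev(diffs)          # N - 1 in the denominator

def paired_t(m_diff, s, n):
    """t statistic for a mean difference tested against 0."""
    return m_diff / (s / math.sqrt(n))

t_8 = paired_t(m_diff, s, len(diffs))   # the 8 students in the table
t_1000 = paired_t(m_diff, s, 1000)      # same M_difference and s, N = 1000

print(f"t(7) = {t_8:.2f}, t(999) = {t_1000:.2f}")
```

Only the standard error changes between the two calls, which is why the second t statistic is so much larger.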

Question 9.34

The following are mood scores for 12 participants before and after watching a funny video clip (higher values indicate better mood).

Before  After    Before  After
7       2        4       2
5       4        7       3
5       3        4       1
7       5        4       1
6       5        5       3
7       4        4       3

  1. Calculate the paired-samples t statistic for these mood scores.

  2. Using a one-tailed hypothesis test that the video clip improves mood, and a p level of 0.05, identify the critical t values and make a decision regarding the null hypothesis.

  3. Using a two-tailed hypothesis test with a p level of 0.05, identify the critical t values and make a decision regarding the null hypothesis.

Question 9.35

Consider the following data:

Score 1  Score 2    Score 1  Score 2
45       62         15       26
34       56         51       56
22       40         28       33
45       48

  1. Calculate the paired-samples t statistic, assuming a two-tailed test.

  2. Calculate the 95% confidence interval, assuming a two-tailed test.

  3. Calculate the effect size for the mean difference.

Question 9.36

Consider the following data.

Score 1  Score 2
23       16
30       12
28       25
30       27
14       6

  1. Calculate the paired-samples t statistic, assuming a two-tailed test.

  2. Calculate the 95% confidence interval.

  3. Calculate the effect size.

Question 9.37

Assume we know the following for a paired-samples t test: N = 13, $M_{difference}$ = −0.77, s = 1.42.

  1. Calculate the t statistic.

  2. Calculate a 95% confidence interval for a two-tailed test.

  3. Calculate the effect size using Cohen’s d.

Question 9.38

Assume we know the following for a paired-samples t test: N = 32, $M_{difference}$ = 1.75, s = 4.0.

  1. Calculate the t statistic.

  2. Calculate a 95% confidence interval for a two-tailed test.

  3. Calculate the effect size using Cohen’s d.

Applying the Concepts

Question 9.39

The relation between the z distribution and the t distributions: For the hypothesis tests described below in parts (a) through (c), one of which is the same as that described in Exercise 9.31a, identify what the critical z value would have been if there had been just one sample and we knew the mean and standard deviation of the population:

  1. A single-sample t test examining scores for 26 participants to see if there is any difference compared to the population, using a p level of 0.05

  2. A one-tailed, single-sample t test performed on scores on the Marital Satisfaction Inventory for 18 people who went through marriage counseling, using a p level of 0.01

  3. A two-tailed, single-sample t test, using a p level of 0.05, with 34 degrees of freedom

  4. Comparing the critical t value from 9.31a with the critical z value from 9.39b, explain how and why these are different.

Question 9.40

t statistics and standardized tests: On its Web site, the Princeton Review claims that students who have taken its course improve their Graduate Record Examination (GRE) scores, on average, by 210 points (based on the old scoring system). (No other information is provided about this statistic.) Treating this average gain as a population mean, a researcher wonders whether the far cheaper technique of practicing for the GRE on one’s own would lead to a different average gain. She randomly selects five students from the pool of students at her university who plan to take the GRE. The students take a practice test before and after 2 months of self-study. They reported (fictional) gains of 160, 240, 340, 70, and 250 points. (Note that many experts suggest that the results from self-study are similar to those from a structured course for students who have the self-discipline to study on their own. Regardless of the format, preparation has been convincingly demonstrated to lead to increased scores, on average.)

  1. Using symbolic notation and formulas (where appropriate), determine the appropriate mean and standard error for the distribution to which we will compare this sample. Show all steps of your calculations.

  2. Using symbolic notation and the formula, calculate the t statistic for this sample.

  3. As an interested consumer, what critical questions would you want to ask about the statistic reported by the Princeton Review? List at least three questions.

Question 9.41

Single-sample t test, military training, and anger: Bardwell, Ensign, and Mills (2005) assessed the moods of 60 male U.S. Marines following a month-long training exercise conducted in cold temperatures and at high altitudes. Negative moods, including fatigue and anger, increased substantially during the training and lasted up to 3 months after the training ended. Mean mood scores were compared to population norms for three groups: college men, adult men, and male psychiatric outpatients. Let’s examine anger scores for six Marines at the end of training; these scores are fictional, but their mean and standard deviation are very close to the actual descriptive statistics for the sample: 14, 12, 13, 12, 14, 15.

  1. The population mean anger score for college men is 8.90. Conduct all six steps of a single-sample t test. Report the statistics as you would in a journal article.

  2. Now calculate the test statistic to compare this sample mean to the population mean anger score for adult men (M = 9.20). You do not have to repeat all the steps from part (a), but conduct step 6 of hypothesis testing and report the statistics as you would in a journal article.

  3. Now calculate the test statistic to compare this sample mean to the population mean anger score for male psychiatric outpatients (M = 13.5). Do not repeat all the steps from part (a), but conduct step 6 of hypothesis testing and report the statistics as you would in a journal article.

  4. What can we conclude overall about Marines’ moods following high-altitude, cold-weather training?

Question 9.42

t tests and the cost of Levi’s jeans and H&M dresses in Halifax: Numbeo is a crowd-sourced Web site that gathers data on cities and countries around the world (http://www.numbeo.com/cost-of-living/). Visitors to the site can add information on variables such as cost of living, traffic, and crime. The data are searchable by city or country. For example, when we looked up Halifax, Canada, we discovered that a pair of jeans (“Levis 501 or similar”) goes for an average of $62.50 (Numbeo, 2015). The range was $50 to $75. And a summer dress from somewhere like Zara or H&M goes for an average of $48, with a range of $40 to $60. Numbeo also tells us that their Halifax data are based on contributions from 80 different people.

  1. Let’s say that Levi Strauss & Co. agreed to tell you the mean price (in Canadian dollars) of Levi’s 501 jeans around the world. If you wanted to test whether Levi’s 501 jeans in Halifax cost a different amount from the world price, what hypothesis test would you use? Explain your answer.

  2. What additional information would you need to conduct the hypothesis test that you identified in part (a)?

  3. Thinking back to Chapter 5, what kind of sample is this? What concerns might you have about this kind of a sample?

  4. Numbeo doesn’t tell us how many of the 80 contributors answered each of these questions. Why might the sample be even smaller than 80 for the average cost of summer dresses?

Question 9.43

Brain exercises and a paired-samples t test: PowerBrainRx, a Hong Kong-based for-profit company, promises to improve cognition. Their Web site lists testimonials, including one from a parent whose children “seemed to have better working memories, improved problem solving ability like mathematics, more logical thinking and better academic performance” following mental exercise training. There are numerous ads for companies like PowerBrainRx on the Internet and on late-night television, but there does not seem to be a lot of research examining the specific programs these companies are selling. How could you design a study for PowerBrainRx that would use a paired-samples t test to analyze the data?

Question 9.44

t tests and retail: Many communities worldwide are lamenting the effects of so-called big box retailers (e.g., Walmart) on their local economies, particularly on small, independently owned shops. Do these large stores affect the bottom lines of locally owned retailers? Imagine that you decide to test this premise. You assess earnings at 20 local stores for the month of October, a few months before a big box store opens. You then assess earnings the following October, correcting for inflation.

  1. What are the two populations?

  2. What is the comparison distribution? Explain.

  3. Which hypothesis test would you use? Explain.

  4. Check the assumptions for this hypothesis test.

  5. What is one flaw in drawing conclusions from this comparison over time?

  6. State the null and research hypotheses in both words and symbols.

Question 9.45

Paired-samples t tests, confidence intervals, and hockey goals: Below are the numbers of goals scored by the lead scorers of the New Jersey Devils hockey team in the 2007–2008 and 2008–2009 seasons. On average, did the Devils play any differently in 2008–2009 than they did in 2007–2008?

Player         2007–2008  2008–2009
Elias          20         31
Zajac          14         20
Pandolfo       12         5
Langenbrunner  13         29
Gionta         22         20
Parise         32         45

  1. Conduct the six steps of hypothesis testing using a two-tailed test and a p level of 0.05.

  2. Report the test statistic in APA format.

  3. Calculate the confidence interval for the paired-samples t test you conducted in part (a). Compare the confidence interval to the results of the hypothesis test.

  4. Calculate the effect size for the mean difference between the 2007–2008 and 2008–2009 seasons.
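
The link between the hypothesis test and the confidence interval can be checked numerically. The sketch below is one possible way to do so, assuming difference scores are taken as 2008–2009 minus 2007–2008 and that the two-tailed critical value for df = 5 at a p level of 0.05 is read from a t table (2.571). The variable names are illustrative.

```python
import math
import statistics

goals_2007 = [20, 14, 12, 13, 22, 32]
goals_2008 = [31, 20, 5, 29, 20, 45]
T_CRIT = 2.571   # two-tailed, p level 0.05, df = 5 (t table lookup)

# Difference scores: later season minus earlier season (an assumed direction).
diffs = [later - earlier for earlier, later in zip(goals_2007, goals_2008)]
n = len(diffs)
m_diff = statistics.mean(diffs)
s = statistics.stdev(diffs)
s_m = s / math.sqrt(n)

t = m_diff / s_m                                          # tested against a mean difference of 0
ci = (m_diff - T_CRIT * s_m, m_diff + T_CRIT * s_m)       # 95% confidence interval
d = m_diff / s                                            # Cohen's d for the mean difference

reject_null = abs(t) > T_CRIT
ci_contains_zero = ci[0] <= 0 <= ci[1]

print(f"t({n - 1}) = {t:.2f}, CI = [{ci[0]:.2f}, {ci[1]:.2f}], d = {d:.2f}")
print(f"reject null: {reject_null}, CI contains 0: {ci_contains_zero}")
```

The two Boolean results always disagree for a two-tailed test at the matching p level: the interval contains 0 exactly when the test fails to reject the null hypothesis.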

Question 9.46

Paired-samples t test and graduate admissions: Is it harder to get into graduate programs in psychology or in history? We randomly selected five institutions from among all U.S. institutions with graduate programs. The first number for each is the minimum grade point average (GPA) for applicants to the psychology doctoral program, and the second is for applicants to the history doctoral program. These GPAs were posted on the Web site of the well-known college guide company Peterson’s.

Wayne State University: 3.0, 2.75
University of Iowa: 3.0, 3.0
University of Nevada, Reno: 3.0, 2.75
George Washington University: 3.0, 3.0
University of Wyoming: 3.0, 3.0

  1. The participants are not people; explain why it is appropriate to use a paired-samples t test for this situation.

  2. Conduct all six steps of a paired-samples t test. Be sure to label all six steps.

  3. Calculate the effect size and explain what this adds to your analysis.

  4. Report the statistics as you would in a journal article.

Question 9.47

Attitudes toward statistics and the paired-samples t test: A professor wanted to know if her students’ attitudes toward statistics changed by the end of the course, so she asked them to fill out an “Attitudes Toward Statistics” scale at the beginning of the term and at the end of the term.

  1. What kind of t test should she use to analyze the data?

  2. If the average (mean) at the end of the class was higher than it was at the beginning, is that necessarily a statistically significant improvement?

  3. Which situation makes it easier to declare that a certain mean difference is statistically significant: a class with 7 students or a class with 700 students? Explain your answer.

Question 9.48

Paired-samples t tests, confidence intervals, and wedding-day weight loss: It seems that 14% of engaged women buy a wedding dress at least one size smaller than their current size. Why? Cornell researchers reported an alarming tendency for women who are engaged to sometimes attempt to lose an unhealthy amount of weight prior to their wedding (Neighbors & Sobal, 2008). The researchers found that engaged women weighed, on average, 152.1 pounds. The average ideal wedding weight reported by 227 women was 136.0 pounds. The data below represent the fictional weights of 8 women on the day they bought their wedding dress and on the day they got married. Did women lose weight for their wedding day?

Dress Purchase  Wedding Day
163             158
144             139
151             150
120             118
136             132
158             152
155             150
145             146

  1. Conduct the six steps of hypothesis testing using a one-tailed test and a p level of 0.05.

  2. Report the test statistic in APA format.

  3. Calculate the confidence interval for the paired-samples t test that you conducted in part (a). Compare the confidence interval to the results of the hypothesis test.

Question 9.49

Paired-samples t test, decorations in kindergarten classrooms, and science learning: Psychology researcher Anna Fisher and her colleagues studied whether kindergarten students learned better in decorated classrooms or undecorated classrooms, referred to as “sparse classrooms” (Fisher et al., 2014). They wondered whether students would be less distracted and learn better without decorations such as posters, maps, and children’s artwork. The same group of children had science lessons in a classroom without decorations and in a classroom with decorations. The students took a test on the material after each condition. Each child received a percentage-correct score, out of 100%, for each condition. In the journal article in which they reported their findings, the researchers wrote that the children’s “learning scores were higher in the sparse-classroom condition (M = 55%) than in the decorated-classroom condition (M = 42%), paired-samples t(22) = 2.95, p = .007; this effect was of medium size, Cohen’s d = 0.65.”

  1. What is the independent variable in this study and what are its levels?

  2. What is the dependent variable in this study?

  3. Why did the researchers analyze their data with a paired-samples t test?

  4. How many children participated in this study? Explain how you determined your answer.

  5. How do you know this result is statistically significant?

  6. Using the means, explain the results to someone who has not taken statistics.

  7. Why did the researchers report an effect size?

Putting It All Together

Question 9.50

Paid days off and the single-sample t test: The number of paid days off (e.g., vacation, sick leave) taken by eight employees at a small local business is compared to the national average. You are hired as a consultant by the new business owner to help her determine how many paid days off she should provide. In general, she wants to set some standard for her employees and for herself. Let’s assume your search on the Internet for data on paid days off leaves you with the impression that the national average is 15 days. The data for the eight local employees during the last fiscal year are: 10, 11, 8, 14, 13, 12, 12, and 27 days.

  1. Write hypotheses for your research.

  2. Which type of test would be appropriate to analyze these data in order to answer your question?

  3. Before doing any computations, do you have any concerns about this research? Are there any questions you might like to ask about the data you have been given?

  4. Calculate the appropriate t statistic. Show all of your work in detail.

  5. Draw a statistical conclusion for this business owner.

  6. Calculate the confidence interval.

  7. Calculate and interpret the effect size.

  8. Consider all the results you have calculated. How would you summarize the situation for this business owner? Identify the limitations of your analyses, and discuss the difficulties of making comparisons between populations and samples. Make reference to the assumptions of the statistical test in your answer.

  9. After further investigation, you discover that one of the data points, 27 days, was actually the owner’s number of paid days off. Calculate the t statistic and draw a statistical conclusion, adapting for this new information by deleting that value. What changed in the re-analysis of the data?

  10. Calculate and interpret the effect size, adapting for this new information by deleting the outlier of 27 days. What changed in the re-analysis of the data?
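
The re-analysis in the last two parts can be sketched in a few lines. This is a rough check rather than a full solution: it assumes the single-sample formulas t = (M − μ) / (s / √N) and d = (M − μ) / s with the national average of 15 days as μ, and the helper name `t_and_d` is hypothetical.

```python
import math
import statistics

def t_and_d(scores, mu):
    """Return the single-sample t statistic and Cohen's d for a list of scores."""
    n = len(scores)
    m = statistics.mean(scores)
    s = statistics.stdev(scores)          # N - 1 in the denominator
    t = (m - mu) / (s / math.sqrt(n))
    d = (m - mu) / s
    return t, d

days_off = [10, 11, 8, 14, 13, 12, 12, 27]
mu = 15                                    # assumed national average from the exercise

t_all, d_all = t_and_d(days_off, mu)

# Drop the owner's value of 27 days and repeat the analysis.
without_owner = [x for x in days_off if x != 27]
t_trim, d_trim = t_and_d(without_owner, mu)

print(f"all 8 employees: t = {t_all:.2f}, d = {d_all:.2f}")
print(f"without 27:      t = {t_trim:.2f}, d = {d_trim:.2f}")
```

Removing one extreme score lowers the mean and shrinks the standard deviation sharply, and both changes push the t statistic and the effect size further from 0.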

Question 9.51

Death row and the single-sample t test: The Florida Department of Corrections publishes an online death row fact sheet. It reports the average time on death row prior to execution as 11.72 years but provides no standard deviation. This mean is a parameter because it is calculated from the entire population of executed prisoners in Florida. Did the time spent on death row change over time? According to the execution list linked to the same Web site, the six prisoners executed in Florida during the years 2003, 2004, and 2005 spent 25.62, 13.09, 8.74, 17.63, 2.80, and 4.42 years on death row, respectively. (All were men, although Aileen Wuornos, the serial killer portrayed by Charlize Theron in the 2003 film Monster, was among the three prisoners executed by the state of Florida in 2002; Wuornos spent 10.69 years on death row.)

  1. Using symbolic notation and formulas (where appropriate), determine the appropriate mean and standard error for the distribution of means. Show all steps of your calculations.

  2. Using symbolic notation and the formula, calculate the t statistic for time spent on death row for the sample of executed prisoners.

  3. The execution list provides data on all prisoners executed since the death penalty was reinstated in Florida in 1976. Included for each prisoner are the name, race, gender, date of birth, date of offense, date sentenced, date arrived on death row, date of execution, number of warrants, and years on death row. State at least one hypothesis, other than year of execution, which could be examined using a t distribution and the comparison mean of 11.72 years on death row. Be specific about your hypothesis (and if you are interested, you can search for the data online).

  4. What additional information would you need to calculate a z score for the length of time Aileen Wuornos spent on death row?

  5. Write hypotheses to address the question “Did the time spent on death row change over time?”

  6. Using these data as “over time” and the mean of 11.72 years as the comparison, answer the question based on the t statistic calculated in part (b), using alpha of 0.05.

  7. Calculate the confidence interval for this statistic based on the data presented.

  8. What conclusion would you make about your hypotheses based on this confidence interval? What can you say about the size of this confidence interval?

  9. Calculate the effect size using Cohen’s d.

  10. Evaluate the size of this effect.

Question 9.52

Political bias in academia and a paired-samples t test: The following is an excerpt from the abstract (brief opening summary) from a published research study that examined a reported bias against conservatives in American academia (Fosse, Gross, & Ma, 2011).

The American professoriate contains a disproportionate number of people with liberal political views. Is this because of political bias or discrimination?…We sent two emails to directors of graduate study in the leading American departments of sociology, political science, economics, history, and English. The emails came from fictitious students who expressed interest in doing graduate work in the department…. We analyze responses received in terms of frequency, timing, amount of information provided about the department, emotional warmth, and enthusiasm toward the student. (p. 1)

One of the fictional emails was from a fictional student who mentioned working on the presidential campaign of John McCain, a well-known conservative, and one was from a fictional student who mentioned working on the presidential campaign of Barack Obama, a well-known liberal. The researchers conducted a series of paired-samples t tests but did not find statistically significant differences on the various measures between ratings of the conservative and liberal students.

  1. Why is this a within-groups design?

  2. What is the independent variable and what are its levels?

  3. What are the dependent variables, as listed in the study description, and what kind of variables are they?

  4. Explain why it would have been possible to conduct a paired-samples t test.

  5. Was the p value likely to be lower than or higher than 0.05? Explain your answer.

  6. Given that the results were not statistically significant, what additional information would you want to know to determine whether there was sufficient statistical power?

Question 9.53

Hypnosis and the Stroop effect: In Chapter 1, you were given an opportunity to complete the Stroop test, in which color words are printed in the wrong color; for example, the word red might be printed in the color blue. The conflict that arises when we try to name the color of ink the words are printed in but are distracted when the color word does not match the ink color increases reaction time and decreases accuracy. Several researchers have suggested that the Stroop effect can be decreased by hypnosis. Raz, Fan, and Posner (2005) used brain-imaging techniques to demonstrate that posthypnotic suggestion led highly hypnotizable people to see Stroop words as nonsense words. Imagine that you are working with Raz and colleagues and your assignment is to determine whether reaction times decrease (remember, a decrease is a good thing; it indicates that participants are faster) when highly hypnotizable people receive a posthypnotic suggestion to view the words as nonsensical. You conduct the experiment on six participants, once in each condition, and receive the following data; the first number is reaction time in seconds without the posthypnotic suggestion, and the second number is reaction time with the posthypnotic suggestion:

Participant 1: 12.6, 8.5
Participant 2: 13.8, 9.6
Participant 3: 11.6, 10.0
Participant 4: 12.2, 9.2
Participant 5: 12.1, 8.9
Participant 6: 13.0, 10.8

  1. What is the independent variable and what are its levels? What is the dependent variable?

  2. Conduct all six steps of a paired-samples t test as a two-tailed test. Be sure to label all six steps.

  3. Report the statistics as you would in a journal article.

  4. Now let’s look at the effect of switching to a one-tailed test. Conduct steps 2, 4, and 6 of hypothesis testing for a one-tailed paired-samples t test. Under which circumstance—a one-tailed or a two-tailed test—is it easier to reject the null hypothesis? If it becomes easier to reject the null hypothesis under one type of test (one-tailed versus two-tailed), does this mean that there is a bigger mean difference between the samples? Explain.

  5. Now let’s look at the effect of p level. Conduct steps 4 and 6 of hypothesis testing for a p level of 0.01 and a two-tailed test. With which p level—0.05 or 0.01—is it easiest to reject the null hypothesis with a two-tailed test? If it is easier to reject the null hypothesis with certain p levels, does this mean that there is a bigger mean difference between the samples? Explain.

  6. Now let’s look at the effect of sample size. Calculate the test statistic using only participants 1–3 and determine the new critical values. Is this test statistic closer to or farther from the cutoff? Does reducing the sample size make it easier or more difficult to reject the null hypothesis? Explain.
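
The comparison in the final part, the full sample against participants 1–3 only, can be sketched as follows. The code assumes difference scores are computed as reaction time with the suggestion minus reaction time without it, and it takes the two-tailed critical values at a p level of 0.05 from a t table (2.571 for df = 5 and 4.303 for df = 2). The names `paired_t` and `T_CRIT` are illustrative.

```python
import math
import statistics

without = [12.6, 13.8, 11.6, 12.2, 12.1, 13.0]
with_suggestion = [8.5, 9.6, 10.0, 9.2, 8.9, 10.8]

# Two-tailed cutoffs at a p level of 0.05, keyed by degrees of freedom (t table lookup).
T_CRIT = {5: 2.571, 2: 4.303}

def paired_t(first, second):
    """Return (t, df) for difference scores second - first, tested against 0."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    s_m = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / s_m, n - 1

t_full, df_full = paired_t(without, with_suggestion)
t_three, df_three = paired_t(without[:3], with_suggestion[:3])

reject_full = abs(t_full) > T_CRIT[df_full]
reject_three = abs(t_three) > T_CRIT[df_three]

print(f"N = 6: t({df_full}) = {t_full:.2f}, reject null: {reject_full}")
print(f"N = 3: t({df_three}) = {t_three:.2f}, reject null: {reject_three}")
```

With fewer participants the standard error grows and the critical values move farther from 0, so both changes work against rejecting the null hypothesis.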