Exercises

Clarifying the Concepts

Question 4.1

Define the three measures of central tendency: mean, median, and mode.

Question 4.2

The mean can be assessed visually and arithmetically. Describe each method.

Question 4.3

Explain how the mean mathematically balances the distribution.

Question 4.4

Explain what is meant by unimodal, bimodal, and multimodal distributions.

Question 4.5

Explain why the mean might not be useful for a bimodal or multimodal distribution.

Question 4.6

What is an outlier?

Question 4.7

How do outliers affect the mean and median?

Question 4.8

In which situations is the mode typically used?

Question 4.9

Explain the concept of standard deviation in your own words.

Question 4.10

Define the symbols used in the equation for variance:

$$SD^2 = \frac{\Sigma(X - M)^2}{N}$$

Question 4.11

Why is the standard deviation typically reported, rather than the variance?
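
To make the symbols concrete, here is a minimal Python sketch that computes SD² = Σ(X − M)²/N one step at a time. It assumes the divide-by-N (descriptive) form of the formula, and the eight scores are hypothetical values chosen only so the arithmetic comes out evenly; they are not data from the text. Because every deviation is squared, SD² is expressed in squared units, and taking the square root puts SD back in the units of the original scores.

```python
import math

# Hypothetical scores chosen only to illustrate the arithmetic (not from the text).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

def variance_and_sd(xs):
    """Return (M, SD², SD) using the divide-by-N formula SD² = Σ(X − M)² / N."""
    n = len(xs)                              # N: number of scores
    m = sum(xs) / n                          # M: the mean
    deviations = [x - m for x in xs]         # X − M for each score
    squared = [d ** 2 for d in deviations]   # (X − M)², never negative
    ss = sum(squared)                        # Σ(X − M)²: the sum of squares
    var = ss / n                             # SD²: average squared deviation (squared units)
    return m, var, math.sqrt(var)            # SD: back in the original units

m, var, sd = variance_and_sd(scores)
print(f"M = {m}, SD² = {var}, SD = {sd}")
```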

Question 4.12

Find the incorrectly used symbol or symbols in each of the following statements or formulas. For each statement or formula, (1) state which symbol(s) is/are used incorrectly, (2) explain why the symbol(s) in the original statement is/are incorrect, and (3) state which symbol(s) should be used.

  1. The mean and standard deviation of the sample of reaction times were calculated (m = 54.2, SD² = 9.87).

  2. The mean of the sample of high school student GPAs was μ = 3.08.

  3. $\text{range} = X_{\text{highest}} - X_{\text{lowest}}$

Calculating the Statistics

Question 4.13

Use the following data for this exercise:

15 34 32 46 22 36 34 28 52 28

  1. Calculate the mean, the median, and the mode.

  2. Add another data point, 112. Calculate the mean, median, and mode again. How does this new data point affect the calculations?

  3. Calculate the range, variance, and standard deviation for the original data.
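
For checking hand calculations in Exercise 4.13, a short sketch using Python's standard `statistics` module might look like the following. `multimode` returns every most-frequent value, which matters here because two values tie. `pvariance` and `pstdev` divide by N; that is an assumption about the formula your course uses (`variance` and `stdev` divide by N − 1 instead).

```python
import statistics

data = [15, 34, 32, 46, 22, 36, 34, 28, 52, 28]

def summarize(xs):
    """Mean, median, and all modes (multimode lists every most-frequent value)."""
    return statistics.mean(xs), statistics.median(xs), sorted(statistics.multimode(xs))

print(summarize(data))          # original ten scores
print(summarize(data + [112]))  # after adding the extreme score 112

# Variability for the original data; pvariance/pstdev divide by N.
print("range =", max(data) - min(data),
      "SD² =", statistics.pvariance(data),
      "SD =", statistics.pstdev(data))
```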

Question 4.14

Use the following salary data for this exercise:

$44,751

$52,000

$41,500

$38,862

$51,380

$61,774

  1. Calculate the mean, the median, and the mode.

  2. Add another salary, $97,582. Calculate the mean, median, and mode again. How does this new salary affect the calculations?

  3. Calculate the range, variance, and standard deviation for the original salary data.

  4. How does the range change when you include the outlier salary, $97,582?
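
A similar sketch for the salary data in Exercise 4.14 follows. The `modes_or_none` helper is a hypothetical convenience function, not part of any library: it reports no mode when every salary occurs exactly once, whereas `statistics.multimode` would simply return all six salaries in that case. Variance and standard deviation again divide by N.

```python
import statistics

salaries = [44751, 52000, 41500, 38862, 51380, 61774]
outlier = 97582

def modes_or_none(xs):
    """Return the modal values, or an empty list when every value occurs once (no mode)."""
    counts = {x: xs.count(x) for x in xs}
    top = max(counts.values())
    return [] if top == 1 else sorted(x for x, c in counts.items() if c == top)

for label, xs in [("original", salaries), ("with outlier", salaries + [outlier])]:
    print(label,
          "mean =", round(statistics.mean(xs), 2),
          "median =", statistics.median(xs),
          "mode =", modes_or_none(xs),
          "range =", max(xs) - min(xs))  # range = X_highest − X_lowest

print("SD² =", statistics.pvariance(salaries), "SD =", statistics.pstdev(salaries))
```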

Question 4.15

The Mount Washington Observatory (MWO) in New Hampshire claims to have the world’s worst weather. Below are some data on the weather extremes recorded at the MWO.


Month | Normal Daily Maximum (°F) | Normal Daily Minimum (°F) | Record Low in °F (year) | Peak Wind Gust in Miles per Hour (year)
January | 14.0 | –3.7 | –47 (1934) | 173 (1985)
February | 14.8 | –1.7 | –46 (1943) | 166 (1972)
March | 21.3 | 5.9 | –38 (1950) | 180 (1942)
April | 29.4 | 16.4 | –20 (1995) | 231 (1934)
May | 41.6 | 29.5 | –2 (1966) | 164 (1945)
June | 50.3 | 38.5 | 8 (1945) | 136 (1949)
July | 54.1 | 43.3 | 24 (2001) | 154 (1996)
August | 53.0 | 42.1 | 20 (1986) | 142 (1954)
September | 46.1 | 34.6 | 9 (1992) | 174 (1979)
October | 36.4 | 24.0 | –5 (1939) | 161 (1943)
November | 27.6 | 13.6 | –20 (1958) | 163 (1983)
December | 18.5 | 1.7 | –46 (1933) | 178 (1980)

  1. Calculate the mean and median normal daily minimum temperature across the year.

  2. Calculate the mean, median, and mode for the record low temperatures.

  3. Calculate the mean, median, and mode for the peak wind-gust data.

  4. When no mode appears in the raw data, we can compute a mode by breaking the data into intervals. How might you do this for the peak wind-gust data?

  5. Calculate the range, variance, and standard deviation for the normal daily minimum temperature across the year.

  6. Calculate the range, variance, and standard deviation for the record low temperatures.

  7. Calculate the range, variance, and standard deviation for the peak wind-gust data.
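
For Exercise 4.15, the sketch below computes the same summaries for each column and then illustrates one way to handle part 4: group the gusts into intervals and report the interval with the highest frequency. The 10-mph interval width (`WIDTH`) is an arbitrary assumption; a different width can shift which interval is modal. As before, `pvariance` and `pstdev` divide by N.

```python
import statistics
from collections import Counter

normal_min = [-3.7, -1.7, 5.9, 16.4, 29.5, 38.5, 43.3, 42.1, 34.6, 24.0, 13.6, 1.7]
record_low = [-47, -46, -38, -20, -2, 8, 24, 20, 9, -5, -20, -46]
gusts = [173, 166, 180, 231, 164, 136, 154, 142, 174, 161, 163, 178]

for name, xs in [("normal min", normal_min), ("record low", record_low), ("gusts", gusts)]:
    print(name,
          "mean =", round(statistics.mean(xs), 2),
          "median =", statistics.median(xs),
          "range =", round(max(xs) - min(xs), 1),
          "SD² =", round(statistics.pvariance(xs), 2),
          "SD =", round(statistics.pstdev(xs), 2))

print("record-low modes:", sorted(statistics.multimode(record_low)))

# Part 4: no gust value repeats, so group into 10-mph intervals (width is an assumption).
WIDTH = 10
bins = Counter((g // WIDTH) * WIDTH for g in gusts)
modal_start, freq = bins.most_common(1)[0]
print(f"modal interval: {modal_start}-{modal_start + WIDTH - 1} mph ({freq} months)")
```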

Question 4.16

Here are recent U.S. News & World Report data on acceptance rates at the top 70 national universities. These are the percentages of accepted students out of all students who applied.

6.3 14.0 8.9 21.6 40.6 51.2 50.5 69.4 42.4 68.3
8.5 12.4 18.0 30.4 31.4 51.3 47.5 49.4 54.6 63.5
7.7 12.8 18.8 25.5 28.0 33.4 38.3 25.0 49.4 56.7
7.0 10.1 24.3 23.0 32.7 46.0 52.4 31.6 44.7 62.8
16.3 18.0 16.4 33.3 40.0 35.5 67.6 43.2 57.9 63.3
9.7 18.4 26.7 39.9 34.6 39.6 46.6 34.5 47.3 61.1
7.1 16.5 18.1 21.9 34.1 45.7 58.4 63.4 63.0 46.6

  1. Calculate the mean of these data, showing that you know how to use the symbols and formula.

  2. Determine the median of these data.

  3. Describe the variability in these data by computing the range.
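
For Exercise 4.16, the sketch below writes out the mean formula, M = ΣX/N, explicitly rather than calling a library function, and uses `statistics.median`, which averages the two middle scores when N is even (here, the 35th and 36th of 70 ordered scores). The `assert` is only a guard against a mistyped or missing value.

```python
import statistics

rates = [
    6.3, 14.0, 8.9, 21.6, 40.6, 51.2, 50.5, 69.4, 42.4, 68.3,
    8.5, 12.4, 18.0, 30.4, 31.4, 51.3, 47.5, 49.4, 54.6, 63.5,
    7.7, 12.8, 18.8, 25.5, 28.0, 33.4, 38.3, 25.0, 49.4, 56.7,
    7.0, 10.1, 24.3, 23.0, 32.7, 46.0, 52.4, 31.6, 44.7, 62.8,
    16.3, 18.0, 16.4, 33.3, 40.0, 35.5, 67.6, 43.2, 57.9, 63.3,
    9.7, 18.4, 26.7, 39.9, 34.6, 39.6, 46.6, 34.5, 47.3, 61.1,
    7.1, 16.5, 18.1, 21.9, 34.1, 45.7, 58.4, 63.4, 63.0, 46.6,
]
assert len(rates) == 70  # guard against a dropped or duplicated value

n = len(rates)                        # N
total = sum(rates)                    # ΣX
mean = total / n                      # M = ΣX / N
median = statistics.median(rates)     # average of the two middle ordered scores
value_range = max(rates) - min(rates) # X_highest − X_lowest
print(f"ΣX = {total:.1f}, N = {n}, M = {mean:.2f}, median = {median:.2f}, range = {value_range:.1f}")
```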

Applying the Concepts

Question 4.17

Mean versus median for salary data: In Exercises 4.13 and 4.14, we saw how the mean and median changed when an outlier was included in the computations. If you were reporting the “average” salary at a company, how might the mean and median give different impressions to potential applicants?

Question 4.18

Mean versus median for temperature data: For the data in Exercise 4.15, the “normal” daily maximum and minimum temperatures recorded at the Mount Washington Observatory are presented for each month. These are likely to be measures of central tendency for each month over time. Explain why these “normal” temperatures might be calculated as means or medians. What would be the reasoning for using one type of statistic over the other?

Question 4.19

Mean versus median for depression scores: A depression research unit recently assessed seven participants chosen at random from the university population. Is the mean or the median a better indicator of the central tendency of these seven participants? Explain your answer.

Question 4.20

Measures of central tendency for weather data: The “normal” weather data from the Mount Washington Observatory are broken down by month. Why might you not want to average across all months in a year? How else could you summarize the year?

Question 4.21

Outliers, central tendency, and data on wind gusts: There appears to be an outlier in the data for peak wind gust recorded on top of Mount Washington (see the data in Exercise 4.15). Where do you see an outlier and how does excluding this data point affect the different calculations of central tendency?
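
One way to check the effect in Exercise 4.21 is to compute the mean and median with and without the suspect value. The sketch assumes the 231-mph gust recorded in April is the outlier in question; if you identify a different point, change the filter accordingly.

```python
import statistics

gusts = [173, 166, 180, 231, 164, 136, 154, 142, 174, 161, 163, 178]
# Assumption: the 231-mph April gust is the suspected outlier.
without_231 = [g for g in gusts if g != 231]

for label, xs in [("all 12 months", gusts), ("without 231 mph", without_231)]:
    # No gust value repeats, so only the mean and median are compared here.
    print(f"{label}: mean = {statistics.mean(xs):.2f}, median = {statistics.median(xs)}")
```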

Question 4.22

Measures of central tendency for measures of baseball performance: Here are winning percentages for 11 baseball players for their best 4-year pitching performances:

0.755 0.721 0.708 0.773 0.782 0.747

0.477 0.817 0.617 0.650 0.651

  1. What is the mean of these scores?

  2. What is the median of these scores?

  3. Compare the mean and the median. Does the difference between them suggest that the data are skewed very much?
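
A quick check for Exercise 4.22 follows. The sign of mean − median is printed because a mean noticeably below the median is consistent with a tail toward low scores (negative skew), while a small difference suggests little skew.

```python
import statistics

win_pct = [0.755, 0.721, 0.708, 0.773, 0.782, 0.747,
           0.477, 0.817, 0.617, 0.650, 0.651]

m = statistics.mean(win_pct)
med = statistics.median(win_pct)  # 11 scores, so the 6th ordered score
print(f"mean = {m:.3f}, median = {med:.3f}, mean − median = {m - med:+.3f}")
```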


Question 4.23

Mean versus median in “real life”: Briefly describe a real-life situation in which the median is preferable to the mean. Give hypothetical numbers for the mean and median in your explanation. Be original! (Don’t use home prices or another example from the chapter.)

Question 4.24

Descriptive statistics in the media: Find an advertisement for an anti-aging product either online or in the print media—the more unbelievable the claims, the better!

  1. What does the ad promise that this product will do for the consumer?

  2. What data does it offer for its promised benefits? Does it offer any descriptive statistics or merely testimonials? If it offers descriptive statistics, what are the limitations of what they report?

  3. If you were considering this product, what measures of central tendency would you most like to see? Explain your answer, noting why not all measures of central tendency would be helpful.

  4. If a friend with no statistical background were considering this product, what would you tell him or her?

Question 4.25

Descriptive statistics in the media: When there is an ad on TV for a body-shaping product (e.g., an abdominal muscle machine), often a person with a wonderful success story is featured in the ad. The statement “Individual results may vary” hints at what kind of data the advertisement may be presenting.

  1. What kind of data is being presented in these ads?

  2. What statistics could be presented to help inform the public about how much “individual results might vary”?

Question 4.26

Range of data for Canadian TV ratings: Numeris (formerly BBM Canada) collects Canadian television ratings data (http://en.numeris.ca/). The following are the average numbers of viewers per minute (in thousands) for the top 30 English-language shows for 1 week. The NHL playoffs are listed at 1198, which indicates that an average of 1,198,000 viewers watched per minute. The Big Bang Theory is in the number 1 position, with 2 Broke Girls at number 30. What is the range of these data?

3117 2935 2216 2128 1785 1735 1616 1602 1548 1519
1513 1476 1462 1263 1201 1198 1193 1189 1186 1155
1117 1102 1079 1057 1036 1034 1008 925 902 887
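
Because the range depends only on the two most extreme scores, the computation for Exercise 4.26 is a single subtraction; this sketch simply makes X_highest and X_lowest explicit. Values stay in thousands of viewers per minute, as in the table.

```python
viewers_thousands = [
    3117, 2935, 2216, 2128, 1785, 1735, 1616, 1602, 1548, 1519,
    1513, 1476, 1462, 1263, 1201, 1198, 1193, 1189, 1186, 1155,
    1117, 1102, 1079, 1057, 1036, 1034, 1008, 925, 902, 887,
]

highest, lowest = max(viewers_thousands), min(viewers_thousands)
# range = X_highest − X_lowest, still in thousands of viewers per minute
print(f"range = {highest} − {lowest} = {highest - lowest} (thousands of viewers per minute)")
```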

Question 4.27

Descriptive statistics for data from the National Survey of Student Engagement: Every year, the National Survey of Student Engagement (NSSE) asks university students how many 20-page papers they had been assigned. Here are the percentages, for 1 year, of students who said they had been assigned between 5 and 10 twenty-page papers for a sample of 19 national universities.

0 5 3 3 1 10 2
2 3 1 2 4 2 1
1 1 4 3 5

  1. Calculate the mean of these data using the symbols and formula.

  2. Calculate the variance of these data using the symbols and formula; also use columns to show all calculations.

  3. Calculate the standard deviation using the symbols and formula.

  4. In your own words, describe what the mean and standard deviation of these data tell us about these scores.
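
For parts 1 to 3 of Exercise 4.27, the sketch below prints the X, X − M, and (X − M)² columns that a hand calculation would use, then combines them. It assumes the divide-by-N variance formula, SD² = Σ(X − M)²/N; deviations are printed to two decimal places, while the totals use the unrounded values.

```python
import math

pct = [0, 5, 3, 3, 1, 10, 2,
       2, 3, 1, 2, 4, 2, 1,
       1, 1, 4, 3, 5]

n = len(pct)                          # N
m = sum(pct) / n                      # M = ΣX / N
print(f"{'X':>4} {'X − M':>8} {'(X − M)²':>10}")
ss = 0.0                              # running Σ(X − M)²
for x in pct:
    d = x - m                         # deviation from the mean
    ss += d ** 2
    print(f"{x:>4} {d:>8.2f} {d ** 2:>10.2f}")
variance = ss / n                     # SD² = Σ(X − M)² / N
sd = math.sqrt(variance)              # SD = √SD²
print(f"ΣX = {sum(pct)}, N = {n}, M = {m:.2f}, SS = {ss:.2f}, SD² = {variance:.2f}, SD = {sd:.2f}")
```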

Question 4.28

Statistics versus parameters: For each of the following situations, state whether the mean would be a statistic or a parameter. Explain your answer.

  1. According to Canadian census data, the median family income in British Columbia was $66,970, lower than the national average of $69,860.

  2. The stadiums of teams in the English Premier League had a mean capacity of 38,391 fans.

  3. The General Social Survey (GSS) includes a vocabulary test in which participants are asked to choose the appropriate synonym from a multiple-choice list of five words (e.g., beast with the choices afraid, words, large, animal, and separate). The mean vocabulary test score was 5.98.

  4. The National Survey of Student Engagement (NSSE) asks students at participating institutions how often they discuss ideas or readings with professors outside of class. Among the 19 national universities that made their data public, the mean percentage of students who responded “Very often” was 8%.

Question 4.29

Central tendency and the shapes of distributions: Consider the many possible distributions of grades on a quiz in a statistics class; imagine that the grades could range from 0 to 100. For each of the following situations, give a hypothetical mean and median (that is, make up a mean and a median that might occur with a distribution that has this shape). Explain your answer.

  1. Normal distribution

  2. Positively skewed distribution

  3. Negatively skewed distribution

Question 4.30

Shapes of distributions: For each of the following, state whether the distribution is more likely to be unimodal or bimodal. Explain your answer.

  1. Age of patients in a hospital maternity ward

  2. University students’ depression scores on a Beck Depression Inventory


  3. GRE scores of applicants to sociology graduate programs

  4. The cost of an AIDS drug that is sold in developed countries in Europe as well as in developing countries in Africa

Question 4.31

Outliers, Hurricane Sandy, and a rat infestation: In a New York Times article, reporter Cara Buckley described the influx of rats inland from the New York City shoreline following the flooding caused by Hurricane Sandy (2012). Buckley interviewed pest-control expert Timothy Wong, who noted that rat infestations could lead to citations for buildings that do not address the problem; yet, she reported, violations had decreased across the city in the wake of the hurricane: just 1996 violations versus 2750 for the same time period a year before. Why? Buckley explained: “After Hurricane Sandy, as of Nov. 1, the Health Department said it stopped issuing violations for rodents in Zone A,” the parts of New York City most vulnerable to flooding.

  1. If you were to create a monthly average of rat violations over the course of the year before and after Hurricane Sandy, why would you not be able to make comparisons?

  2. Explain how the removal of Zone A violations led both to the removal of an outlier and to inaccurate data.

Question 4.32

Outliers, H&M, and designer collaborations: The relatively low-cost Swedish fashion retailer H&M occasionally partners with high-end designers. For example, they collaborated with the designer Martin Margiela, and his line quickly sold out. If H&M were to report the average number of sales per item of clothing, why would the designer partnerships, like that with Martin Margiela, inflate the mean number of sales but not the median?

Question 4.33

Central tendency and outliers from growth-chart data: When the average height or average weight of children is plotted to create growth charts, do you think it would be appropriate to use the mean for these data? There are often outliers for height, but why might we not have to be concerned with their effect on these data?

Question 4.34

Measures of central tendency for percentages of advanced degrees: The U.S. Census Bureau collects and analyzes data on numerous aspects of American life by state, including the percentage of people with high school degrees, bachelor’s degrees, and advanced degrees. If you wanted to calculate the “average” percentage of people with advanced degrees across all states, would you report a mean, a median, or a mode? Explain your answer clearly.

Question 4.35

Mean versus median for age at first marriage: The mean age at first marriage was 31.1 for men and 29.1 for women in Canada in 2008 (http://well-being.esdc.gc.ca/misme-iowb/.3ndic.1t.4r@-eng.jsp?iid=78&_ga=1.54357281.804383131.1461715758). The median age at first marriage was 28.9 for men and 26.9 for women in the United States in 2011 (http://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ACS_11_1YR_B12007&prodType=table). Beyond the fact that these data are from slightly different years, explain why we cannot directly compare these measures of central tendency to make cross-national comparisons.

Question 4.36

Median ages and technology companies: In an article titled “Technology Workers Are Young (Really Young),” the New York Times reported median ages for a number of companies (Hardy, 2013). The reporter wrote: “The seven companies with the youngest workers, ranked from youngest to highest in median age, were Epic Games (26); Facebook (28); Zynga (28); Google (29); and AOL, Blizzard Entertainment, InfoSys, and Monster.com (all 30). According to the Bureau of Labor Statistics, only shoe stores and restaurants have workers with a median age less than 30.”

  1. Explain why the reporter provided medians rather than means for employee ages.

  2. Why might it be easier to use medians rather than means to compare ages across companies?

Question 4.37

Standard deviation and a texting intervention for parents of preschoolers: Researchers investigated READY4K, a program in which parents received text messages over an 8-month period (York & Loeb, 2014). The goal of the text messages was to help parents prepare their preschool-aged children for reading. The children of parents who received the text messages were compared to a second group of children whose parents did not receive text messages. The researchers reported that the text messages led to “student learning gains in some areas of early literacy, ranging from approximately 0.21 to 0.34 standard deviations.”

  1. Based on your knowledge of mean and standard deviation, explain what this finding means.

  2. Did the researchers use a between-groups design or a within-groups design? Explain your answer.

Question 4.38

Range, world records, and a long chain of friendship bracelets: Guinness World Records reported that, as part of an anti-bullying campaign, elementary school students in Pennsylvania created a chain of friendship bracelets that was a world-record 2678 feet long (http://www.guinnessworldrecords.com/news/2013/5/fan-choice-record-may-17-48702/). Guinness relies on what kind of data for amazing claims like this one? How does this relate to the calculation of ranges?


Putting It All Together

Question 4.39

Descriptive statistics and basketball wins: Here are the numbers of wins for the 30 National Basketball Association teams in the 2012–2013 season.

60 44 39 29 23 57 50 43 37 27
49 42 37 29 19 56 51 40 33 26
48 42 31 25 18 53 44 40 29 23

  1. Create a grouped frequency table for these data.

  2. Create a histogram based on the grouped frequency table.

  3. Determine the mean, median, and mode of these data. Use symbols and the formula when showing your calculation of the mean.

  4. Using software, calculate the range and standard deviation of these data.

  5. Write a one- to two-paragraph summary describing the distribution of these data. Mention center, variability, and shape. Be sure to discuss the number of modes (i.e., unimodal, bimodal, multimodal), any possible outliers, and the presence and direction of any skew.

  6. State one research question that might arise from this data set.
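
For parts 1, 3, and 4 of Exercise 4.39, a standard-library sketch might look like this. The interval width of 5 wins is an assumption (any reasonable width works), the row of asterisks stands in for a histogram, and `pstdev` divides by N; `statistics.stdev` would divide by N − 1 and give a slightly larger value. `Counter` only lists intervals that contain at least one team, which happens to cover every interval from 15 to 64 here.

```python
import statistics
from collections import Counter

wins = [60, 44, 39, 29, 23, 57, 50, 43, 37, 27,
        49, 42, 37, 29, 19, 56, 51, 40, 33, 26,
        48, 42, 31, 25, 18, 53, 44, 40, 29, 23]

# Grouped frequency table; the interval width of 5 wins is an assumption.
WIDTH = 5
groups = Counter((w // WIDTH) * WIDTH for w in wins)
for start in sorted(groups, reverse=True):  # highest interval first
    # The row of asterisks doubles as a rough text histogram.
    print(f"{start}-{start + WIDTH - 1}: {groups[start]:>2} {'*' * groups[start]}")

print("M = ΣX / N =", sum(wins), "/", len(wins), "=", round(statistics.mean(wins), 2))
print("median =", statistics.median(wins), "| mode =", statistics.mode(wins))
# pstdev divides by N; statistics.stdev would divide by N − 1 instead.
print("range =", max(wins) - min(wins), "| SD =", round(statistics.pstdev(wins), 2))
```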

Question 4.40

Central tendency and outliers for data on traffic deaths: Below are estimated numbers of annual road traffic deaths for 12 countries based on data from the World Health Organization (http://apps.who.int/gho/data/view.main.51310):

Country | Number of Deaths
United States | 35,490
Australia | 1363
Canada | 2296
Denmark | 258
Finland | 272
Germany | 3830
Italy | 4371
Japan | 6625
Malaysia | 7085
Portugal | 1257
Spain | 2478
Turkey | 8758

  1. Compute the mean and the median across these 12 data points.

  2. Compute the range for these 12 data points.

  3. Recalculate the statistics in parts 1 and 2 without the data point for the United States. How are these statistics affected by including or excluding the United States?

  4. How might these numbers be affected by using traffic deaths per 100,000 people instead of using the number of traffic deaths overall?

  5. Do you think that traffic deaths might vary by other personal or national characteristics? Could these represent confounds (as discussed in Chapter 1)?
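
For parts 1 to 3 of Exercise 4.40, the sketch below computes the mean, median, and range with and without the United States. Part 4 would also require population figures for each country, which are not given here, so it is not computed. `describe` is a hypothetical helper defined only for this example.

```python
import statistics

deaths = {
    "United States": 35490, "Australia": 1363, "Canada": 2296, "Denmark": 258,
    "Finland": 272, "Germany": 3830, "Italy": 4371, "Japan": 6625,
    "Malaysia": 7085, "Portugal": 1257, "Spain": 2478, "Turkey": 8758,
}

def describe(values):
    """Hypothetical helper: (mean, median, range) for a list of counts."""
    return statistics.mean(values), statistics.median(values), max(values) - min(values)

all_12 = list(deaths.values())
without_us = [v for country, v in deaths.items() if country != "United States"]
for label, vals in [("all 12", all_12), ("without U.S.", without_us)]:
    m, med, rng = describe(vals)
    print(f"{label}: mean = {m:.1f}, median = {med}, range = {rng}")
```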