3.1 Sources of Data

When you complete this section, you will be able to:

  • Identify anecdotal data and, using specific examples, explain why they have limited value.

  • Identify available data and explain how they can be used in specific examples.

  • Identify data collected from sample surveys and explain how they can be used in specific examples.

  • Identify data collected from experiments and explain how they can be used in specific examples.

  • Distinguish data that are from experiments, from observational studies that are sample surveys, and from observational studies that are not sample surveys.

  • Identify the treatment in an experiment.

There are many sources of data. Some data are very easy to collect, but they may not be very useful. Other data require careful planning and need professional staff to gather. These can be much more useful. Whatever the source, a good statistical analysis will start with a careful study of the source of the data. Here is one type of source.

Anecdotal data

It is tempting to simply draw conclusions from our own experience, making no use of more broadly representative data. A magazine article about Pilates says that men need this form of exercise even more than women do. The article describes the benefits that two men received from taking Pilates classes. A newspaper ad states that a particular brand of window is “considered to be the best” and says that “now is the best time to replace your windows and doors.” These types of stories, or anecdotes, sometimes provide data. However, this type of data does not give us a sound basis for drawing conclusions.

ANECDOTAL DATA

Anecdotal data represent individual cases, which often come to our attention because they are striking in some way. These cases are not necessarily representative of any larger group of cases.

USE YOUR KNOWLEDGE

Question 3.1

3.1 The best instructor? A friend tells you that the instructor in her statistics class is the best teacher in the college. Can you conclude that this teacher is better than all of the other instructors in the college? Explain your answer.

Question 3.2

3.2 Describe an anecdote. Find an example from some recent experience where anecdotal evidence was used to draw a conclusion that is not justified. Describe the example and explain why the anecdote should not be used in this way.

Question 3.3

3.3 Opposition to a new requirement. Your student newspaper ran a story describing interviews with three students who were strongly opposed to a proposed new requirement that all students take a course on ethics. Can you conclude that most students are opposed to this requirement? Explain your answer.

Question 3.4

3.4 Are all vehicles this good? A friend has driven a Toyota Camry for more than 200,000 miles with only the usual service and maintenance expenses. Explain why not all Camry owners can expect this kind of performance.

Not all anecdotal data are bad. The experiences of an individual or a small group of individuals might suggest an interesting study that could be performed using more carefully collected data.

Available data

Occasionally, data are collected for a particular purpose but can also serve as the basis for drawing sound conclusions about other research questions. We use the term available data for this type of data.

AVAILABLE DATA

Available data are data that were produced for some other purpose but that may help answer a question of interest.

The library and the Internet can be good sources of available data. Because producing new data is expensive, we all use available data whenever possible. Here are two examples.

EXAMPLE 3.1

How Americans use their time. If you visit the U.S. Bureau of Labor Statistics website, bls.gov, you will find many interesting sets of data and statistical summaries. The American Time Use Survey¹ recently reported that men spend an average of 5.71 hours per day on leisure and sports activities, while women spend an average of 4.93 hours on these activities.

EXAMPLE 3.2

Math skills. At the website of the National Center for Education Statistics, nces.ed.gov, you will find full details about the math skills of schoolchildren as determined by the latest National Assessment of Educational Progress (Figure 3.1). Mathematics scores have slowly but steadily increased since 1990. Across all racial/ethnic groups, both boys and girls in most states are getting better in math.

Figure 3.1 Websites of government statistical offices are prime sources of data. Here is a page from the website of the National Center for Education Statistics. (Source: U.S. Department of Education Institute of Education Sciences National Center for Education Statistics)

Many nations have a single national statistical office, such as Statistics Canada (statcan.gc.ca) and Mexico’s INEGI (www.inegi.org.mx). More than 70 different U.S. agencies collect data. You can reach most of them through the U.S. government’s FedStats site (fedstats.sites.usa.gov).

USE YOUR KNOWLEDGE

Question 3.5

3.5 What more do you need? A website claims that Millennial generation consumers are very loyal to the brands that they prefer. What additional information do you need to evaluate this claim?

A survey of college athletes is designed to estimate the percent who gamble. Do restaurant patrons give higher tips when their server repeats their order carefully? The validity of our conclusions from the analysis of data collected to address these issues rests on a foundation of carefully collected data.

In this chapter, we will develop the skills needed to produce trustworthy data and to judge the quality of data produced by others. The techniques for producing data that we will study require no formulas, but they are among the most important ideas in statistics. Statistical designs for producing data rely on either sampling or experiments.

Sample surveys and experiments

How have the attitudes of Americans, on issues ranging from abortion to work, changed over time? Sample surveys are the usual tool for answering questions like these.

EXAMPLE 3.3

The General Social Survey. One of the most important sample surveys is the General Social Survey (GSS) conducted by the National Opinion Research Center (NORC), an organization affiliated with the University of Chicago.² The GSS interviews about 3000 adult residents of the United States every other year.

The GSS selects a sample of adults to represent the larger population of all English-speaking adults living in the United States. The idea of sampling is to study a part in order to gain information about the whole. Data are often produced by sampling a population of people or things. Opinion polls, for example, may report the views of the entire country based on interviews with a sample of about 1000 people. Government reports on employment and unemployment are produced from a monthly sample of about 60,000 households. The quality of manufactured items is monitored by inspecting small samples each hour or each shift.
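The idea of studying a part to learn about the whole can be sketched in a few lines of Python. The population below is entirely hypothetical (100,000 made-up adults, 62% of whom hold some opinion). The function names and the sample size of 1000 are illustrative assumptions, not details of the GSS or of any real poll.

```python
import random

def make_population(size, true_share, rng):
    """Build a hypothetical population: True means the person holds the opinion."""
    n_yes = round(size * true_share)
    people = [True] * n_yes + [False] * (size - n_yes)
    rng.shuffle(people)
    return people

def draw_sample(population, n, rng):
    """Select n individuals at random, without replacement."""
    return rng.sample(population, n)

def share(values):
    """Proportion of True values in a list."""
    return sum(values) / len(values)

rng = random.Random(2024)  # fixed seed so the sketch is repeatable
population = make_population(100_000, 0.62, rng)  # assumed, not real data
sample = draw_sample(population, 1000, rng)

print(f"Population share: {share(population):.3f}")
print(f"Sample share:     {share(sample):.3f}")
```

The sample share will usually land within a few percentage points of the population share, even though only 1% of the population was examined.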

USE YOUR KNOWLEDGE

Question 3.6

3.6 Check out the General Social Survey. Visit the General Social Survey website at gss.norc.org. Write a short summary of one of their reports, paying particular attention to the methods used to collect the data.

In all our examples, the expense of examining every item in the population makes sampling a practical necessity. Timeliness is another reason for preferring a sample to a census, which is an attempt to contact every individual in the population. We want information on current unemployment and public opinion next week, not next year. Moreover, a carefully conducted sample is often more accurate than a census. Accountants, for example, sample a firm’s inventory to verify the accuracy of the records. Attempting to count every last item in the warehouse would be not only expensive, but also inaccurate. Bored people do not count carefully.

If conclusions based on a sample are to be valid for the population, a sound design for selecting the sample is required. Sampling designs are the topic of Section 3.3.

A sample survey collects information about a population by selecting and measuring a sample from the population. The goal is a picture of the population, disturbed as little as possible by the act of gathering information. Sample surveys are one kind of observational study.

OBSERVATION VERSUS EXPERIMENT

In an observational study, we observe individuals and measure variables of interest but do not attempt to influence the responses.

In an experiment, we deliberately impose some condition on individuals and we observe their responses.

EXAMPLE 3.4

Baseball players have strong bones in their throwing arms. A study of young baseball players measured the strength of the bones in their throwing arms. A control group of subjects who were matched with the baseball players based on age was also measured. This is an example of an observational study that is not a sample survey. The study reported that bone strength was 30% higher in the baseball players.³

What can we conclude from this study? If you start to play baseball, will you have stronger bones in your throwing arm?

EXAMPLE 3.5

Is there a cause-and-effect relationship? Example 3.4 describes an observational study. People choose to participate in baseball or not. Is it possible that those who choose to play baseball have stronger arms than those who do not? The study does not address this question.

We can imagine an experiment that would remove these difficulties. From a large group of subjects, require some to play baseball and forbid the rest from playing. This is an experiment because the condition (playing baseball or not) is imposed on the subjects. Of course, this particular experiment is neither practical nor ethical.

EXAMPLE 3.6

Baseball and bones. Example 3.4 compared the arm bone strengths of baseball players with those of age-matched controls. Although the study tells us something about baseball players, the results are particularly interesting because they suggest that certain kinds of exercise can help us to build strong bones.

USE YOUR KNOWLEDGE

Question 3.7

3.7 Available data. Can available data be from an observational study? Can available data be from an experiment? Explain your answers.

Question 3.8

3.8 Picky eaters. A study of 2049 children in grades 4 to 6 in 33 schools recorded their behaviors in the lunchroom. One of the conclusions of the study was that girls discarded more food than boys.⁴ Is this an observational study or an experiment? Is it a sample survey? If it is an experiment, what is the treatment? Explain your answers.

Question 3.9

3.9 Automatic soap dispensers. A study compared several brands of automatic soap dispensers. For one test, the dispensers were run until their AA batteries failed. The times to failure were compared for the different brands.⁵ Is this an observational study or an experiment? Is it a sample survey? If it is an experiment, what is the condition? Explain your answers.

An observational study, even one based on a carefully chosen sample, is a poor way to determine what will happen if we change something. The best way to see the effects of a change is to do an intervention, where we actually impose the change. When our goal is to understand cause and effect, experiments are the only source of fully convincing data.

Confounding occurs when an explanatory variable is related to one or more other variables that have an influence on the response variable. When this happens, we sometimes attribute a relationship to an explanatory variable when the effect is fully or partly due to the confounding variables.

confounding, p. 150

explanatory variable, p. 82

In Example 3.4, the effect of baseball playing on arm bone strength is confounded with (mixed up with) other characteristics of the subjects in the study. Observational studies that examine the effect of a single variable on an outcome can be misleading when the effects of the explanatory variable are confounded with those of other variables.
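A small simulation can make confounding concrete. Everything in it is invented for illustration: a hypothetical trait called `build` makes people both more likely to choose an activity and stronger on the response, while the activity itself is given no effect at all. The numbers (means, spreads, group sizes) are assumptions chosen only to show the pattern; this is a sketch, not a model of the study in Example 3.4.

```python
import random
from statistics import mean

def simulate(n, randomized, rng):
    """Return (mean response of participants) - (mean response of others).

    The activity has NO true effect on the response here; only the
    hypothetical trait `build` affects it.
    """
    active, inactive = [], []
    for _ in range(n):
        build = rng.gauss(0, 1)  # lurking variable (assumed)
        if randomized:
            plays = rng.random() < 0.5  # experiment: chance decides
        else:
            plays = build + rng.gauss(0, 1) > 0  # observation: self-selection
        response = 100 + 10 * build + rng.gauss(0, 5)  # no term for `plays`
        (active if plays else inactive).append(response)
    return mean(active) - mean(inactive)

rng = random.Random(7)  # fixed seed so the sketch is repeatable
observed_gap = simulate(20_000, randomized=False, rng=rng)
experimental_gap = simulate(20_000, randomized=True, rng=rng)

print(f"Observational study gap:   {observed_gap:.2f}")
print(f"Randomized experiment gap: {experimental_gap:.2f}")
```

In the observational version, participants score noticeably higher even though the activity does nothing, because `build` is mixed up with the choice to participate. When chance decides who participates, the two groups have similar `build` on average and the gap shrinks toward zero.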

Because experiments allow us to isolate the effects of specific variables, we generally prefer them. Here is an example.

EXAMPLE 3.7

Which web page design sells more? A company that sells products on the Internet wants to decide which of two possible web page designs to use. During a two-week period, they will use both designs and collect data on sales. They randomly select one of the designs to be used on the first day and then alternate the two designs on each of the following days. At the end of this period, they compare the sales for the two designs.

Experiments usually require some sort of randomization, as in this example. We begin the discussion of statistical designs for data collection in Section 3.2 with the principles underlying the design of experiments.
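The assignment scheme in Example 3.7 can be written out directly. The sketch below assumes a 14-day period and labels the designs "A" and "B"; the labels and the function name `alternating_schedule` are hypothetical. Only the first day's design is chosen at random; every later day is determined by alternation.

```python
import random

def alternating_schedule(days, rng, designs=("A", "B")):
    """Randomly pick the design for day 1, then alternate on following days."""
    first = rng.randrange(2)  # 0 or 1, each with probability 1/2
    return [designs[(first + day) % 2] for day in range(days)]

rng = random.Random()  # unseeded: a real assignment should not be predictable
schedule = alternating_schedule(14, rng)

for day, design in enumerate(schedule, start=1):
    print(f"Day {day:2d}: design {design}")
```

With 14 days, each design is used 7 times. Because days one week apart fall on opposite turns of the alternation, each design also appears exactly once on every day of the week, so a weekday effect on sales cannot favor one design over the other.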

USE YOUR KNOWLEDGE

Question 3.10

3.10 Software for teaching creative writing. An educational software company wants to compare the effectiveness of its computer animation for teaching creative writing with that of a textbook presentation. The company tests the creative-writing skills of a number of second-year college students and then randomly divides them into two groups. One group uses the animation, and the other studies the text. The company retests all the students and compares the increase in creative-writing skills in the two groups. Is this an experiment? Why or why not? What are the explanatory and response variables?

response variable, p. 82

Question 3.11

3.11 Apples or apple juice? Food rheologists study different forms of foods and how the form of a food affects how full we feel when we eat it. One study prepared samples of apple juice and samples of apples with the same number of calories. Half of the subjects were fed apples on one day followed by apple juice on a later day; the other half received the apple juice followed by the apples. After eating, the subjects were asked about how full they felt. Is this an experiment? Why or why not? What are the explanatory and response variables?