3.1 Sources of Data

There are many sources of data. Some data are very easy to collect, but they may not be very useful. Other data require careful planning and need professional staff to gather. These can be much more useful. Whatever the source, a good statistical analysis will start with a careful study of the source of the data. Here is one type of source.

Anecdotal data

It is tempting to simply draw conclusions from our own experience, making no use of more broadly representative data. An advertisement for a Pilates class says that men need this form of exercise even more than women. The ad describes the benefits that two men received from taking Pilates classes. A newspaper ad states that a particular brand of windows is “considered to be the best” and says that “now is the best time to replace your windows and doors.” These types of stories, or anecdotes, sometimes provide quantitative data. However, this type of data does not give us a sound basis for drawing conclusions.

Anecdotal Evidence

Anecdotal evidence is based on haphazardly selected cases, which often come to our attention because they are striking in some way. These cases need not be representative of any larger group of cases.
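
As a rough illustration of why striking cases can mislead, the following Python sketch compares the two most impressive outcomes in a made-up population with a random selection of cases from the same population. The population, its size, and the outcome scale are assumptions invented for this illustration; none of them come from the advertisements described above.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: change in some outcome score for 10,000 people,
# centered at 0 (no benefit on average). These numbers are invented.
outcomes = [rng.gauss(0, 10) for _ in range(10_000)]

# Anecdotes: the two most striking success stories in the population.
anecdotes = sorted(outcomes, reverse=True)[:2]

# A more broadly representative alternative: 200 cases chosen at random.
random_cases = rng.sample(outcomes, 200)

anecdote_mean = statistics.mean(anecdotes)
random_mean = statistics.mean(random_cases)
population_mean = statistics.mean(outcomes)

print(f"mean of the 2 striking cases: {anecdote_mean:.1f}")
print(f"mean of 200 random cases    : {random_mean:.1f}")
print(f"mean of the whole population: {population_mean:.1f}")
```

In this sketch the two anecdotes suggest a large benefit even though the population average is close to zero, while the randomly chosen cases land near the population mean.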

Apply Your Knowledge

Question 3.1

3.1 Is this good market research?

You and your friends are big fans of True Detective, an HBO police drama. To what extent do you think you can generalize your preference for this show to all students at your college?

3.1

This is anecdotal evidence; the preference of a group of friends likely would not generalize to the entire college.

Question 3.2

3.2 Should you invest in stocks?

You have just accepted a new job and are offered several options for your retirement account. One of these invests about 75% of your employer’s contribution in stocks. You talk to a friend who joined the company several years ago who said that after he chose that option, the value of the stocks decreased substantially. He strongly recommended that you choose a different option. Comment on the value of your friend’s advice.

Question 3.3

3.3 Preference for a brand.

Samantha is a serious runner. She and all her friends prefer drinking Gatorade Endurance to Heed prior to their long runs. Explain why Samantha’s experience is not good evidence that most young people prefer Gatorade Endurance to Heed.

3.3

This is anecdotal evidence; the preferences of Samantha and her friends, who are serious runners, likely would not generalize to most young people.

Question 3.4

3.4 Reliability of a product.

A friend has driven a Toyota Camry for more than 200,000 miles with only the usual service maintenance expenses. Explain why not all Camry owners can expect this kind of performance.

Available data

Occasionally, data are collected for a particular purpose but can also serve as the basis for drawing sound conclusions about other research questions. We use the term available data for this type of data.

Available Data

Available data are data that were produced in the past for some other purpose but that may help answer a present question.

The library and the Internet can be good sources of available data. Because producing new data is expensive, we all use available data whenever possible. Here are two examples.

EXAMPLE 3.1 International Manufacturing Productivity

If you visit the U.S. Bureau of Labor Statistics website, bls.gov, you can find many interesting sets of data and statistical summaries. One recent study compared the average hourly manufacturing compensation costs of 34 countries. The study showed that Norway and Switzerland had the top two costs.1

EXAMPLE 3.2 Can Our Workforce Compete in a Global Economy?

In preparation to compete in the global economy, students need to improve their mathematics.2 At the website of the National Center for Education Statistics, nces.ed.gov/nationsreportcard, you will find full details about the math skills of schoolchildren in the latest National Assessment of Educational Progress. Figure 3.1 shows one of the pages that reports on the increases in mathematics and reading scores.3

FIGURE 3.1 The websites of government statistical offices are prime sources of data. Here is a page from the National Assessment of Educational Progress, Example 3.2.

Many nations have a single national statistical office, such as Statistics Canada (statcan.gc.ca) and Mexico’s INEGI (inegi.org.mx/default.aspx). More than 70 different U.S. agencies collect data. You can reach most of them through the government’s FedStats site (fedstats.gov).

Apply Your Knowledge

Question 3.5

3.5 Check out the Bureau of Labor Statistics website.

Visit the Bureau of Labor Statistics website, bls.gov. Find a set of data that interests you. Explain how the data were collected and what questions the study was designed to answer.

Although available data can be very useful in many situations, clear answers to important questions often require data produced to answer those specific questions. Are your customers likely to buy a product from a competitor if you raise your price? Is the expected return from a proposed advertising campaign sufficient to justify the cost? The validity of conclusions about questions like these rests on a foundation of carefully collected data. In this chapter, we learn how to produce trustworthy data and to judge the quality of data produced by others. The techniques for producing data that we study require no formulas, but they are among the most important ideas in statistics. Statistical designs for producing data rely on either sampling or experiments.

Sample surveys and experiments

How have the attitudes of Americans, on issues ranging from shopping online to satisfaction with work, changed over time? Sample surveys are the usual tool for answering questions like these. A sample survey collects data from a sample of cases that represent some larger population of cases.

EXAMPLE 3.3 Confidence in Banks and Companies

One of the most important sample surveys is the General Social Survey (GSS) conducted by the NORC, a national organization for research and computing affiliated with the University of Chicago.4 The GSS interviews about 3000 adult residents of the United States every second year. The survey includes questions about how much confidence people have in banks and companies.

The GSS selects a sample of adults to represent the larger population of all English-speaking adults living in the United States. The idea of sampling is to study a part in order to gain information about the whole. Data are often produced by sampling a population of people or things. Opinion polls, for example, report the views of the entire country based on interviews with a sample of about 1000 people. Government reports on employment and unemployment are produced from a monthly sample of about 60,000 households. The quality of manufactured items is monitored by inspecting small samples each hour or each shift.
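
The idea of studying a part to learn about the whole can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how the GSS or any polling organization actually draws its sample: the population of 200,000 adults and the 30% share are invented, and `sample_proportion` is a hypothetical helper name.

```python
import random


def sample_proportion(population, n, seed=None):
    """Draw a simple random sample of size n and return the share of 1s."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # sampling without replacement
    return sum(sample) / n


# Hypothetical population: 200,000 adults, coded 1 if they report a great
# deal of confidence in banks and 0 otherwise. The 30% share is made up.
POP_SIZE = 200_000
population = [1] * 60_000 + [0] * 140_000

true_share = sum(population) / POP_SIZE
estimate = sample_proportion(population, n=1000, seed=1)

print(f"share in the whole population : {true_share:.3f}")
print(f"estimate from 1000 interviews : {estimate:.3f}")
```

Interviewing 1000 of the 200,000 people typically gives an estimate close to the population share, which is why a well-chosen sample can stand in for the whole. How the sample should be chosen is a design question in its own right.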

Apply Your Knowledge

Question 3.6

3.6 Are Millennials loyal customers?

A website claims that Millennial generation consumers are very loyal to the brands that they prefer. What additional information do you need to evaluate this claim?

In all our examples, the expense of examining every item in the population makes sampling a practical necessity. Timeliness is another reason for preferring a sample to a census, which is an attempt to contact every case in the entire population. We want information on current unemployment and public opinion next week, not next year. Moreover, a carefully conducted sample is often more accurate than a census. Accountants, for example, sample a firm’s inventory to verify the accuracy of the records. Counting every item in a warehouse can be expensive and also inaccurate. Bored people might not count carefully.

If conclusions based on a sample are to be valid for the entire population, a sound design for selecting the sample is required. Sampling designs are the topic of Section 3.2.

A sample survey collects information about a population by selecting and measuring a sample from the population. The goal is a picture of the population, disturbed as little as possible by the act of gathering information. Sample surveys are one kind of observational study.

Observation versus Experiment

In an observational study, we observe cases and measure variables of interest but do not attempt to influence the responses.

In an experiment, we deliberately impose some treatment on cases and observe their responses.

Apply Your Knowledge

Question 3.7

3.7 Market share for energy drinks.

A website reports that Red Bull is the top energy drink brand with sales of $2.9 billion in 2014.5 Do you think that this report is based on an observational study or an experiment? Explain your answer.

3.7

This is an observational study: the sales of each brand are simply observed, and no treatment is imposed.

Question 3.8

3.8 An advertising agency chooses an ESPN television ad.

An advertising agency developed two versions of an ad that will be shown during a major sporting event on ESPN but must choose only one to air. The agency recruited 100 college students and divided them into two groups of 50. Each group viewed one of the versions of the ad and then answered a collection of questions about their reactions to the ad. Is the advertising agency using an observational study or an experiment to help make its decision? Give reasons for your answer.

An observational study, even one based on a statistical sample, is a poor way to determine what will happen if we change something. The best way to see the effects of a change is to do an intervention—where we actually impose the change. The change imposed is called a treatment. When our goal is to understand cause and effect, experiments are the only source of fully convincing data. In an experiment, a treatment is imposed and the responses are recorded. Experiments usually require some sort of randomization.
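
The randomization mentioned above can be sketched as follows. This is one simple way to assign subjects to two treatments by chance, offered as an illustration only; the subject labels, the group sizes, and the helper name `randomize_two_groups` are assumptions made for the sketch.

```python
import random


def randomize_two_groups(subjects, seed=None):
    """Shuffle the subjects and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical subject labels for 100 participants.
subjects = [f"subject_{i:03d}" for i in range(1, 101)]

group_a, group_b = randomize_two_groups(subjects, seed=2024)

# Group A would receive one treatment and group B the other.
print(len(group_a), len(group_b))
print(group_a[:5])
```

Because chance alone decides who lands in each group, the groups should be similar apart from the treatment they receive, which is what makes a comparison of their responses informative.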

We begin the discussion of statistical designs for data collection in Section 3.2 with the principles underlying the design of samples. We then move to the design of experiments in Section 3.3.