1.2 Collecting Data

In Section 1.1 we looked at data collected in various ways from populations and samples. For a researcher interested in a particular question, where do data come from?

The easiest way to obtain data is to use someone else’s. There is a wealth of existing data collected by government agencies, private organizations, and individuals, and much of it is readily available to the public. The United States federal government is arguably the nation’s largest collector and distributor of data; both raw and summarized data are available on the web and in printed form.

1.2.1 Sample Data vs. Population Data

The accompanying table shows summarized data that appear in the report America’s Children: Key National Indicators of Well-Being, 2012 from the Federal Interagency Forum on Child and Family Statistics. The table gives the percentage of children ages 3 to 5 who were read to every day in the week prior to the survey, categorized by the family’s poverty status (percent of the federal poverty threshold). This table would allow a researcher to investigate questions such as whether a family’s poverty status is related to whether children are read to every day, or what percentage of children in families below 100% of the poverty threshold are likely to be read to every day in 2015.

Poverty Status 1993 1995 1996 1999 2001 2005 2007
Below 100% poverty 43.6 46.6 46.8 38.7 48.3 50.0 39.7
100-199% poverty 49.1 55.7 52.0 51.4 51.8 59.5 49.6
200% poverty and above 60.9 65.2 65.5 61.8 64.1 65.0 63.9
Table 1.2: Percentage of children ages 3 to 5 who were read to every day in the week prior to the survey

Source: Federal Interagency Forum on Child and Family Statistics

These data come from the National Household Education Survey, which uses telephone interviews with a very large sample of households to obtain detailed information about education issues. Because these are numerical summaries of sample data, they are statistics.
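To see how summary statistics like these can be put to work, the short Python sketch below stores the values from Table 1.2 and computes, for each survey year, the gap in percentage points between the highest and lowest poverty-status groups. The variable and function names are our own choices for illustration; they do not come from the report.

```python
# Percentages from Table 1.2 (children ages 3 to 5 read to every day).
years = [1993, 1995, 1996, 1999, 2001, 2005, 2007]
read_to_daily = {
    "Below 100% poverty": [43.6, 46.6, 46.8, 38.7, 48.3, 50.0, 39.7],
    "100-199% poverty": [49.1, 55.7, 52.0, 51.4, 51.8, 59.5, 49.6],
    "200% poverty and above": [60.9, 65.2, 65.5, 61.8, 64.1, 65.0, 63.9],
}


def gap_by_year(table, low_key, high_key):
    """Return {year: high - low} in percentage points, rounded to 1 decimal."""
    return {
        year: round(high - low, 1)
        for year, low, high in zip(years, table[low_key], table[high_key])
    }


gaps = gap_by_year(read_to_daily, "Below 100% poverty", "200% poverty and above")
for year, gap in gaps.items():
    print(year, gap)
```

In every year shown the gap is positive, which is consistent with a relationship between poverty status and daily reading. Because the values are statistics computed from a sample, the gaps are estimates rather than exact population values.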

The sinking of the Titanic has been a subject of great interest to moviegoers, treasure hunters, and social scientists. The stories of the survivors have intrigued many people, prompting questions about who survived and why. The list below shows a portion of a data set giving information about the passengers on the Titanic.

Class Survival Name Sex Age
1 1 Burns, Miss. Elizabeth Margaret female 41
3 0 Burns, Miss. Mary Delia female 18
2 1 Buss, Miss. Kate female 36
2 0 Butler, Mr. Reginald Fenton male 25
1 0 Butt, Major. Archibald Willingham male 45
2 0 Byles, Rev. Thomas Roussel Davids male 42
2 1 Bystrom, Mrs. (Karolina) female 42
3 0 Cacic, Miss. Manda female 21
3 0 Cacic, Miss Marija female 30
3 0 Cacic, Mr. Jego Grga male 18
3 0 Cacic, Mr. Luka male 38
1 0 Cairns, Mr. Alexander male
1 1 Calderhead, Mr. Edward Pennington male 42
2 1 Caldwell, Master. Alden Gates male 0.8333
Table 1.3: Passengers on the Titanic

The variables shown here are passenger class (1 = first, 2 = second, 3 = third), survival (0 = no, 1 = yes), name, sex, and age in years. The data set from which this table comes consists of 14 different variables measured on all 1313 passengers on the Titanic, including how much each person’s ticket cost, and whether he or she traveled with a parent, a spouse, or children. Researchers have used this data set to investigate many theories about the passengers, among them whether there was a relationship between passenger class and survival or between sex and survival.
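As a rough illustration of the kind of question researchers ask of this data set, the Python sketch below computes the proportion of survivors by sex and by passenger class using only the 14 rows shown in Table 1.3. The function name and the tuple layout are our own assumptions. Because these rows are a small alphabetical slice of the full data set, the proportions describe only these 14 passengers, not the ship as a whole.

```python
from collections import defaultdict

# The 14 rows of Table 1.3 as (class, survived, sex, age); names omitted.
# age is None where the table shows no value.
rows = [
    (1, 1, "female", 41),
    (3, 0, "female", 18),
    (2, 1, "female", 36),
    (2, 0, "male", 25),
    (1, 0, "male", 45),
    (2, 0, "male", 42),
    (2, 1, "female", 42),
    (3, 0, "female", 21),
    (3, 0, "female", 30),
    (3, 0, "male", 18),
    (3, 0, "male", 38),
    (1, 0, "male", None),
    (1, 1, "male", 42),
    (2, 1, "male", 0.8333),
]


def survival_rate_by(rows, column):
    """Proportion of survivors within each value of the chosen column index."""
    counts = defaultdict(lambda: [0, 0])  # value -> [survivors, total]
    for row in rows:
        counts[row[column]][0] += row[1]  # survival is coded 0 or 1
        counts[row[column]][1] += 1
    return {value: s / n for value, (s, n) in counts.items()}


by_sex = survival_rate_by(rows, 2)
by_class = survival_rate_by(rows, 0)
print(by_sex)
print(by_class)
```

Coding survival as 0 or 1 is what makes this work: adding up the survival column within a group counts the survivors in that group.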

We call a data set like this a census, because it lists all the individuals in the population and records the value of one or more variables on each individual. The U.S. government is required by the Constitution to conduct a census of the country every ten years. The Census Bureau sends survey questionnaires to each household, and then follows up on those not returned. This is a monumental task: in one week of the 2010 census, the Census Bureau supplemented its workforce with 585,729 temporary workers. The population data that result from the census are used to apportion the House of Representatives, to allocate federal funds, and for many other purposes.

Despite the care that the Census Bureau takes, certain groups tend to be undercounted or overcounted. The undercounts and overcounts that occurred in the 2010 census are discussed in a May 2012 press release. Even for populations smaller than that of the United States (over 300 million and growing), conducting a census is a difficult job. Unless the population of interest is quite small, most researchers who collect their own data do so using samples.

1.2.2 Experiments and Observational Studies

Let’s return to two of the examples we examined previously, and see how they differ. Consider first the study examining the effect of caffeine on birth outcomes. In this study, the researchers randomly assigned pregnant women to either the regular group or the decaf group, and then provided them with caffeinated or decaffeinated coffee during the pregnancy. Once the babies were born, their weights and lengths of gestation were recorded.

This study was an experiment. The researchers did something to the individuals involved in the study; the individuals were treated with either regular or decaffeinated coffee. Once the treatment concluded, the variables being considered were measured, and the researchers looked for evidence for or against their theory. In this case, the theory was that caffeine intake would cause lower birth weight or shorter gestation, and the experiment did not support that theory.
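The random assignment step in an experiment like this one can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the participant IDs, the group size of 20, and the function name are hypothetical, and the study's actual assignment procedure is not described here.

```python
import random


def randomly_assign(participants, groups, seed=None):
    """Shuffle participants, then deal them into the groups in turn."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for i, person in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(person)
    return assignment


# Hypothetical participant IDs, for illustration only.
participants = [f"P{n:02d}" for n in range(1, 21)]
assignment = randomly_assign(participants, ["regular", "decaf"], seed=1)
for group, members in assignment.items():
    print(group, members)
```

Because chance alone decides who lands in each group, neither the researchers nor the participants can steer particular individuals toward a particular treatment.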

Now consider the Gallup Poll investigating the relationship between marital status and party affiliation. The researchers in this study did nothing to the individuals surveyed; they merely collected information about marital status and party affiliation. We call a study in which no treatment is imposed on the individuals an observational study; the researchers simply observe and record certain variables of interest about the individuals in the study.

Nearly everyone considers taking a test an anxiety-filled situation, so it is little wonder that college students report experiencing stress from time to time. Additional demands on students’ time and energy (such as jobs, families and friends) can cause them to feel stressed more often than not.

In March 2008 many media outlets reported that 4 in 10 college students experience stress often. While this particular finding got a great deal of attention, it is probably not much of a surprise to you. Many of the other results of the survey did not receive as much attention. Edison Media Research, which conducted the survey for mtvU and the Associated Press, asked the students 44 different questions on topics ranging from whether they knew anyone who served in Iraq or Afghanistan to their spring break plans. A total of 2,253 college students were interviewed between February 28 and March 6.

The goal of this study was to describe the opinions, feelings and behavior of American college students, not to influence them in any way. That makes this study an observational one. It was not a census, because it did not record information about all college students in the country, only 2,253 of them at 40 randomly selected 4-year schools. Based on the sample data, conclusions were drawn about the characteristics of all American college students (the population).

In general, polls and surveys are observational studies. Their purpose is to obtain information, not to change opinions or behavior. But the situation is not always as simple as it seems. During election cycles, voters sometimes receive “push polls,” telephone calls that purport to be objective opinion polls, but actually present a statement about a political candidate’s actions or beliefs in an effort to discredit the candidate. In the 2008 primary campaign for the presidency, voters in New Hampshire and Iowa received calls raising questions about Mitt Romney’s Mormon faith. Such tactics are considered unethical by legitimate polling organizations. Similarly, political parties sometimes mail out surveys that are thinly veiled efforts to entice donations from supporters.

The video Snapshots: Sampling discusses issues faced in developing surveys for the Times/Bloomberg Poll.

1.2.3 What Do Experiments Accomplish?

As the consumption of bottled water has increased worldwide, many people have become concerned about the environmental effects of the single-serve bottles typically used for such water. The bottles are produced using petroleum products, and, although they can be recycled, many end up in landfills. Among environmentalists and those concerned about cost, refillable hard plastic bottles have seemed like an ideal solution. However, controversy has arisen about these bottles as well.

Bisphenol A (BPA) is a chemical used to make polycarbonate plastic and certain types of epoxy resins. Polycarbonate plastic is used in impact-resistant plastics; epoxy resins using BPA are used in dental sealants and liners for food and beverage cans. Some people have become concerned about the effect of BPA on humans after a researcher’s discovery of chromosomal irregularities in lab rats exposed to BPA from polycarbonate plastic cages and water bottles. These findings led to further studies on humans; in 2007 the Centers for Disease Control reported that BPA was found in 93% of urine samples from 2,517 people aged 6 years and older.

Eighty student volunteers at Harvard University participated in a study investigating whether drinking from polycarbonate plastic bottles appreciably raised their BPA levels. Students drank from stainless steel bottles for one week, and then provided researchers with two urine samples. In the second week, they drank from Nalgene plastic bottles (which contain BPA), and then submitted two more urine samples. Samples were frozen and shipped to the CDC for analysis.

The goal of this study was to determine how much drinking from plastic bottles for a week raised BPA levels. This study was an experiment. The first week was designed to clear BPA due to plastic bottles from the individuals’ systems and establish a baseline level of BPA in their urine. In the second week, the treatment of drinking from the plastic bottles was imposed on the volunteers. BPA levels in urine after the first week and after the second week were compared. Researchers found that drinking cold liquids from polycarbonate bottles for one week increased urinary BPA levels by more than two-thirds.
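The phrase “increased by more than two-thirds” describes a relative change: the difference between the second-week level and the baseline level, divided by the baseline. The Python sketch below shows that arithmetic. The two levels used are hypothetical numbers chosen only to illustrate the calculation; they are not the study's measurements, and the function name is our own.

```python
def relative_change(baseline, after):
    """Change from baseline, expressed as a fraction of the baseline value."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (after - baseline) / baseline


# Hypothetical values for illustration only; NOT the study's measurements.
baseline_level = 1.2   # level after the stainless steel week
treatment_level = 2.1  # level after the polycarbonate bottle week

change = relative_change(baseline_level, treatment_level)
print(f"Relative increase: {change:.0%}")
print("More than two-thirds?", change > 2 / 3)
```

Comparing each volunteer's second-week level to his or her own baseline is what lets the first week serve as the point of reference for the treatment.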

It is important to notice what researchers were able to conclude. The study showed that drinking from the polycarbonate bottles raised the level of BPA in urine. It did not determine whether increased levels of BPA cause hormone-like effects in humans similar to those demonstrated in tests on laboratory animals, nor whether the increased levels of BPA are dangerous for humans. Experiments are generally designed to answer a very specific question. The results obtained frequently suggest additional experiments and new research questions, as they did in the case of BPA. In 2012, the US Food and Drug Administration denied a petition from the Natural Resources Defense Council to ban BPA from any use where it comes in contact with food. You can learn more about the studies that led to this decision by listening to the NPR story linked here.

Watch the video StatClips: Types of Studies to see an example which illustrates the differences between observational studies and experiments.

Question 1.24

Determine whether each of the following studies is an experiment or an observational study.

The article “Adverse Events Attributable to Cough and Cold Medications in Children” (published in the journal Pediatrics) examined adverse drug events (ADEs) from cough and cold medications in children using data from a sample of 63 American hospitals. The table below illustrates part of the analysis performed, representing the classification of ADEs by age and type of exposure.

Patient Age          Unsupervised    Supervised Ingestion,    Supervised Ingestion,
                     Ingestion       No Medication Error      Medication Error
                     (Percent)       Documented (Percent)     Documented (Percent)
Less than 2 years    50.0            31.8                     18.2
2-5 years            77.9            18.1                     4.0
6-11 years           33.3            55.6                     11.0
Table 1.4: Number of Cases of ADEs from Cough and Cold Medications

Source: Pediatrics

Answer: The study was an observational study. The individuals studied were adverse drug events from cough and cold medications occurring in children seen at 63 hospitals during 2004 and 2005.

Question 1.25

A study at the University of Northumbria in England investigated the relationship between gum chewing and memory. Seventy-five participants were divided into three groups to take a 20-minute memory test. One group chewed gum, one group made chewing motions but without any gum, and the third group did no chewing. Researchers found that recall improved 35% for those who chewed gum.

Answer: The study was an experiment. The individuals studied were 75 participants in a memory test. There were three treatments: gum chewing, chewing without gum, and no chewing.

Question 1.26

A study published in the journal Animal Cognition investigated the duration of cats’ working memory for hidden objects. Twenty-four cats were divided into groups of 6, with each group given different visual cues on or around boxes behind which a desirable object was hidden. The cats were trained to locate the object, and then tested after 0, 10, 30 and 60 seconds. There were no significant differences among the groups during the training or testing. The cats’ ability to locate the object declined rapidly between 0 and 30 seconds, but still remained higher than chance at intervals up to 60 seconds.

Answer: The study was an experiment. The individuals were 24 cats, divided into 4 groups. There were four treatments, the different visual cues given to each group.

Question 1.27

In 2007 Consumer Reports conducted a study on the amount of caffeine in decaffeinated coffee. “Secret shoppers” purchased 36 small (10 to 12 ounce) cups of decaffeinated coffee from a variety of retail chain locations. The coffee was then analyzed in Consumer Reports’ laboratories, and the amount of caffeine in each cup was recorded. Most of the cups of coffee had less than 5 mg of caffeine, but amounts of 21 mg, 29 mg and 32 mg were also found.

Answer: The study was an observational study. The individuals studied were 36 cups of coffee.

The best advice for interpreting data from experiments and observational studies is to use caution. Consider who conducted the study, who sponsored or paid for it, and what the researchers were attempting to determine. When a sample was involved, how large was the sample and how was it selected? What methods were used in an experiment? How were the survey questions worded? These questions highlight important aspects of a study that may influence the results. We will look further at these topics in Chapter 2.

The video StatClips: Statistics Introduction gives you a bird’s-eye view of where we are headed in our study of statistics. Don’t worry about understanding all the terminology at this point; just focus on the big picture. You may want to return to this video as you progress through the course to better understand the basic concepts of statistics.