Talking about data: Individuals and variables


Statistics is the science of data. We could almost say “the art of data” because good judgment and even good taste, along with good math, make good statistics. A big part of good judgment lies in deciding what you must measure in order to produce data that will shed light on your concerns. We begin with some vocabulary to describe the raw materials that go into data.

Individuals and variables

Individuals are the objects described by a set of data. Individuals may be people, but they may also be animals or things.

A variable is any characteristic of an individual. A variable can take different values for different individuals.


Table 1.1: IQ Test Scores for 60 Randomly Chosen Fifth-Grade Students

145 139 126 122 125 130  96 110 118 118
101 142 134 124 112 109 134 113  81 113
123  94 100 136 109 131 117 110 127 124
106 124 115 133 116 102 127 117 109 137
117  90 103 114 139 101 122 105  97  89
102 108 110 128 114 112 114 102  82 101

For example, here are the first lines of a professor’s data set at the end of a statistics course:

NAME             MAJOR  POINTS  GRADE
ADVANI, SURA     COMM   397     B
BARTON, DAVID    HIST   323     C
BROWN, ANNETTE   LIT    446     A
CHIU, SUN        PSYC   405     B
CORTEZ, MARIA    PSYC   461     A

The individuals are students enrolled in the course. In addition to each student’s name, there are three variables. The first says what major a student has chosen. The second variable gives the student’s total points out of 500 for the course, and the third records the grade received.


Statistics deals with numbers, but not all variables are numerical. Some are “categorical” and simply place an individual into one of several groups or categories. Of the three variables in the professor’s data set, only total points has numbers as its values. Major and grade are categorical, and to do statistics with these variables, we use counts or percentages. We might give the percentage of students who got an A, for example, or the percentage who are psychology majors.
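As a minimal sketch, the Python snippet below computes the kinds of summaries just described, using only the five rows of the professor's data set shown above (the full class list is not given). It finds percentages for the categorical variables major and grade, and an average for the numerical variable total points. The names `ROWS` and `percent` are hypothetical, chosen for illustration.

```python
from statistics import mean

# Only the five rows shown above; the professor's full data set is not given.
# Each tuple is one individual (a student); the fields after the name are the variables.
ROWS = [
    ("ADVANI, SURA", "COMM", 397, "B"),
    ("BARTON, DAVID", "HIST", 323, "C"),
    ("BROWN, ANNETTE", "LIT", 446, "A"),
    ("CHIU, SUN", "PSYC", 405, "B"),
    ("CORTEZ, MARIA", "PSYC", 461, "A"),
]

def percent(values, category):
    """Percentage of individuals whose value falls in the given category."""
    values = list(values)
    return 100 * sum(1 for v in values if v == category) / len(values)

majors = [major for _, major, _, _ in ROWS]   # categorical variable
points = [pts for _, _, pts, _ in ROWS]       # numerical variable
grades = [grade for _, _, _, grade in ROWS]   # categorical variable

pct_a = percent(grades, "A")        # 40.0
pct_psyc = percent(majors, "PSYC")  # 40.0
avg_points = mean(points)           # 406.4
```

For major and grade the sketch only counts how many individuals fall in a category; it never adds or averages the values themselves, while total points can be averaged directly.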


No Evidence

“No evidence” that magnetic fields are connected with childhood leukemia doesn’t prove that there is no risk. It says only that a very careful study could not find any risk that stands out from the play of chance that distributes leukemia cases across the landscape. In other words, the study could not rule out chance as a plausible explanation for what was observed. Critics continue to argue that the study failed to measure some important variables or that the children studied don’t fairly represent all children. Nonetheless, a carefully designed observational study is a great advance over haphazard and sometimes emotional counting of cancer cases.

Categorical and numerical variables

A categorical variable simply places an individual into one of several groups or categories.


A numerical variable takes numerical values for which arithmetic operations such as adding and averaging make sense. A numerical variable is sometimes referred to as a quantitative variable.
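The IQ scores in Table 1.1 are one numerical variable already in this section. A minimal Python sketch, assuming the 60 values are copied exactly from the table, shows that adding and averaging them produce meaningful results. The name `IQ_SCORES` is hypothetical.

```python
from statistics import mean

# The 60 IQ scores from Table 1.1, one line per printed row.
IQ_SCORES = [
    145, 139, 126, 122, 125, 130, 96, 110, 118, 118,
    101, 142, 134, 124, 112, 109, 134, 113, 81, 113,
    123, 94, 100, 136, 109, 131, 117, 110, 127, 124,
    106, 124, 115, 133, 116, 102, 127, 117, 109, 137,
    117, 90, 103, 114, 139, 101, 122, 105, 97, 89,
    102, 108, 110, 128, 114, 112, 114, 102, 82, 101,
]

n = len(IQ_SCORES)          # 60 individuals (students)
total = sum(IQ_SCORES)      # 6899; adding makes sense for a numerical variable
average = mean(IQ_SCORES)   # about 114.98
```

A categorical variable such as major offers no comparable operation: there is no sensible "average major," only counts or percentages in each category.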


Bad judgment in choosing variables can lead to data that cost lots of time and money but don’t shed light on the world. What constitutes good judgment can be controversial. Here are examples of the challenges in deciding what data to collect.

Example 1 Who recycles?

Who takes the trouble to recycle? Researchers spent lots of time and money weighing the stuff put out for recycling in two neighborhoods in a California city; call them Upper Crust and Lower Mid. The individuals here are households because trash and recycling pickup are done for residences, not for people one at a time. The variable measured was the weight in pounds of the curbside recycling basket each week.

The Upper Crust households contributed more pounds per week on the average than did the folk in Lower Mid. Can we say that the rich are more serious about recycling? No. Someone noticed that Upper Crust recycling baskets contained lots of heavy glass wine bottles. In Lower Mid, they put out lots of light plastic soda bottles and light metal beer and soda cans. The conclusion: weight tells us little about commitment to recycling.

Example 2 What’s your race?

The U.S. Census asks, “What is this person’s race?” for every person in every household. “Race” is a variable, and the Census Bureau must say exactly how to measure it. The census form does this by giving a list of races. Years of political squabbling lie behind this list.

How many races shall we list, and what names shall we use for them? Shall we have a category for people of mixed race? Asians wanted more national categories, such as Filipino and Vietnamese, for the growing Asian population. Pacific Islanders wanted to be separated from the larger Asian group. Black leaders did not want a mixed-race category, fearing that many blacks would choose it and so reduce the official count of the black population.

Figure 1.1: Spreadsheet of food discount coupons, Example 1.1.

The 2010 census form (see Figure 1.1) ended up with six Asian groups (plus “Other Asian”) and three Pacific Island groups (plus “Other Pacific Islander”). There is no “mixed-race” group, but you can mark more than one race. A person of mixed race is therefore counted in every group he or she marks, so the racial group counts in 2010 add up to more than the population count. Unable to decide what the proper term for blacks should be, the Census Bureau settled on “Black, African American, or Negro.” What about Hispanics? That’s a separate question because Hispanics can be of any race. Again unable to choose a short name that would satisfy everyone, the Census Bureau decided to ask if you are of “Hispanic, Latino, or Spanish origin.”
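The arithmetic behind the multiple-race rule can be sketched with a tiny, entirely hypothetical set of responses. Because a person may mark more than one race, tallying every marked box gives group counts that sum to more than the number of people. The labels `Group A`, `Group B`, and `Group C` are placeholders, not actual census categories.

```python
from collections import Counter

# Hypothetical responses: each person lists every race box he or she marked.
responses = [
    ["Group A"],
    ["Group B"],
    ["Group A", "Group C"],   # this person marked two boxes
]

# Each marked box adds one to that group's count.
group_counts = Counter(box for marked in responses for box in marked)

people = len(responses)                        # 3
total_of_groups = sum(group_counts.values())   # 4, larger than the number of people
```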

Figure 1.1: The first page of the 2010 census form, mailed to all households in the country. The 2010 census form can be found online at 2010.census.gov/2010census/about/interactive-form.php.
(Source: Census.gov.)


The fight over “race” reminds us that data reflect society. Race is a social idea, not a biological fact. In the census, you say what race you consider yourself to be. Race is a sensitive issue in the United States, so the fight is no surprise, and the Census Bureau’s diplomatic solution seems a good compromise.