EXAMPLE 1 Data from a Student Questionnaire

Table 5.1 shows a partial dataset that describes the students in a statistics class. The data come from anonymous responses to a class questionnaire. Most data tables follow this format: Each row records data on one individual (in this case, one student) and each column contains values of one variable for all the individuals. This dataset appears in a spreadsheet program that has rows and columns ready for your use. Spreadsheets are commonly used to enter and transmit data, and spreadsheet programs also have functions for basic statistics.

Table 5.1 Excerpt of a Dataset Displayed in the Microsoft Excel Spreadsheet Program

The partial spreadsheet shows information on seven individuals in rows 2-8. The questionnaire consisted of five questions, which are represented by the five columns in the spreadsheet. Sex (female or male) and handedness (left-handed or right-handed) are variables that are usually described as qualitative or categorical because they categorize individuals by traits and do not take numerical values. The remaining three variables are quantitative or measurement variables because they do take numerical values. They are height (inches), time spent studying (in minutes) on a typical weeknight, and the amount of money in coins (cents) students are carrying. Our main focus in this chapter will be on variables involving quantitative or numerical data, because you probably have already had much experience with the usual ways to summarize categorical data (proportions, pie charts, and bar charts).
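The row-and-column layout and the two kinds of variables can be sketched in a few lines of Python. This is only an illustration: the three rows are made-up values, not the responses shown in Table 5.1, and the column names (`sex`, `hand`, `height_in`, `study_min`, `coins_cents`) and the helper `variable_kind` are hypothetical names chosen for this sketch.

```python
# Each dictionary is one row (one student); each key is one column (one variable).
# ASSUMPTION: these values are invented for illustration and are NOT the data in Table 5.1.
students = [
    {"sex": "female", "hand": "right", "height_in": 64.0, "study_min": 120, "coins_cents": 35},
    {"sex": "male", "hand": "left", "height_in": 70.5, "study_min": 90, "coins_cents": 0},
    {"sex": "female", "hand": "right", "height_in": 66.0, "study_min": 180, "coins_cents": 112},
]


def variable_kind(rows, name):
    """Label a column 'quantitative' if every value is a number, else 'categorical'."""
    values = [row[name] for row in rows]
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if is_numeric else "categorical"


# Classify every column, using the keys of the first row as the variable names.
kinds = {name: variable_kind(students, name) for name in students[0]}

for name, kind in kinds.items():
    print(f"{name}: {kind}")
```

A type check like this is only a rough guide. A categorical variable stored as numeric codes (for example, 1 for left-handed and 2 for right-handed) would be mislabeled as quantitative, so the final decision still depends on what the values mean.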

Knowing the context of the data (that these are student responses to a class questionnaire) helps us make sense of them. For example, one student claimed to study 1500 minutes on a typical night. We know that this is impossible, because 1500 minutes is 25 hours and an entire day contains only 24 × 60 = 1440 minutes. (Perhaps the student miscalculated when converting from hours to minutes, or it was a typographical error.)
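A plausibility check of this kind is easy to automate. The sketch below flags any reported study time that is negative or exceeds the number of minutes in a day. The function name `implausible_study_times` and the list of responses are hypothetical; only the value 1500 comes from the example, and the other values are invented for illustration.

```python
MINUTES_PER_DAY = 24 * 60  # 1440 minutes in a full day


def implausible_study_times(minutes_list, upper=MINUTES_PER_DAY):
    """Return (position, value) pairs for responses below 0 or above `upper`."""
    return [
        (i, m) for i, m in enumerate(minutes_list) if m < 0 or m > upper
    ]


# ASSUMPTION: invented responses, except 1500, which is the value mentioned in the text.
responses = [120, 90, 1500, 180]

flagged = implausible_study_times(responses)
print(flagged)  # [(2, 1500)]
```

The cutoff of 1440 minutes is deliberately generous, since it catches only values that are physically impossible. A check like this flags a response for a closer look; it cannot say what the student meant to report.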