1.1 What is Statistics?

Statistics is the process of collecting, describing and drawing conclusions about data. But what, you may ask, are data? Data are observations or measurements made about the people or objects we are investigating.

In any statistical situation, questioning plays an important role. It aids a researcher in selecting groups to study, techniques to use, and methods to evaluate results. It helps a consumer of statistics judge the reliability of the many reports found in the media.

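In this section we look briefly at the process of statistics, considering examples that illustrate the ideas we will study throughout the course. In each case, we will consider four questions: why are we conducting the study, who are the individuals we will study, what characteristics of the individuals are we observing or measuring, and how were the data analyzed and reported?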

Let’s look at each of these questions in turn.

First, why are we conducting the study? Before we begin to use statistics, we should identify exactly what we are interested in finding out. Do we want to investigate a relationship between two or more characteristics? Are we trying to gauge public opinion or behavior on a certain topic? Do we want to summarize the results of some event or activity? Are we trying to verify the safety of a product or the effectiveness of some treatment? All of these are appropriate reasons to use statistical techniques. As we shall see throughout this course, determining exactly what technique to use depends in large part on our question of interest.

Second, who are the individuals we will study? We always need to be clear about from what individuals data were (or will be) collected. When we use the word individuals in everyday English, we are generally referring to people. In statistics, individuals are the people, animals, or inanimate objects about which we want to learn. Each datum (the singular of the word data) is a little piece of information about an individual. When a set of data is collected, we have one or more pieces of information about each individual.

Question 1.1

Researchers at Baylor University conducted a study to determine whether relationships existed between perceived stress, self-esteem, and physical activity in both male and female college students. Ninety students were enrolled in 5 sections of a health and human behavior class; seventy-four consented to participate in the study, completing three surveys: the Rosenberg Self-Esteem Scale, the Perceived Stress Scale, and the International Physical Activity Questionnaire. For both men and women, there was a relationship between perceived stress and self-esteem, but there was no relationship between physical activity and either perceived stress or self-esteem.

Our third question is what characteristics of the individuals are we observing or measuring? The data we collect tell us about one or more characteristics of the individuals. We call these characteristics variables, because they vary from individual to individual.

Some variables are quantitative; they are numerical in nature, and represent characteristics that we can count or measure. For an automobile, the number of cupholders it has and its gas mileage are both quantitative variables. We count the number of cupholders (the value must be a whole number); we measure the gas mileage (the value is some number greater than zero). If we measure a variable, then we usually attach a unit of measure to the result. In the case of gas mileage, we would likely measure in miles per gallon or kilometers per liter, and perhaps round to the nearest tenth of a unit.

Other variables are categorical (sometimes called qualitative); these are non-numerical characteristics of an individual that are generally described with words rather than numbers. For an automobile, color and make would be categorical variables.

Here we say generally described with words rather than numbers, because some variables that are really categorical are described with numbers. For example, postal codes actually indicate geographic areas; they neither count nor measure anything. The zip code 60622 has meaning not as a numerical value, but rather as a particular section of the city of Chicago. An easy way to decide whether a variable described using a number is really quantitative is to ask whether it makes sense to do arithmetic on the observations. If the answer is no, then the variable is categorical rather than quantitative. While it is reasonable to talk about the average weight of a songbird, or the average number of Facebook friends certain sorority members have, calculating an average zip code would be meaningless.
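
The "does arithmetic make sense?" check can be illustrated with a short Python sketch. The observations below are invented for illustration only; storing zip codes as text is one way to signal that they are labels rather than quantities.

```python
from collections import Counter
from statistics import mean

# Hypothetical observations, invented for illustration only.
songbird_weights_g = [18.2, 21.5, 19.8, 20.1]      # measured, in grams
zip_codes = ["60622", "60614", "60647", "60622"]   # labels for geographic areas

# Arithmetic on a quantitative variable gives a meaningful summary.
average_weight = mean(songbird_weights_g)
print("Average weight (g):", round(average_weight, 2))

# For a categorical variable, an average is meaningless; counting how
# often each category occurs is the natural summary instead.
zip_counts = Counter(zip_codes)
print("Observations per zip code:", dict(zip_counts))
```

Because the zip codes are stored as strings, an accidental attempt to average them raises an error instead of silently producing a meaningless number.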

Sometimes we are interested in only one piece of information for each individual; sometimes we are interested in two or more. In the study concerning the relationship between perceived stress, self-esteem, and physical activity, these three variables were observed. Perceived stress was measured using the Perceived Stress Scale, which gives a value between 0 and 40 for each individual. Self-esteem was measured using the Rosenberg Self-Esteem Scale, which generates a value between 0 and 30 for each individual. The International Physical Activity Questionnaire was used to report each individual’s physical activity level, with scores in this case between 0 and 10,000. All three of these variables are quantitative.
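
One way to picture "several variables per individual" is as a record with one field per variable. The sketch below is only an illustration: the class name, field names, and the two example students are hypothetical, and only the score ranges come from the description above.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    """One individual in the study, with three quantitative variables."""
    perceived_stress: int       # Perceived Stress Scale, 0 to 40
    self_esteem: int            # Rosenberg Self-Esteem Scale, 0 to 30
    physical_activity: float    # activity score, 0 to 10,000 in this study

    def is_valid(self) -> bool:
        # Check each value against the range stated for its scale.
        return (0 <= self.perceived_stress <= 40
                and 0 <= self.self_esteem <= 30
                and 0 <= self.physical_activity <= 10_000)


# Hypothetical individuals, invented for illustration only.
students = [
    StudentRecord(perceived_stress=17, self_esteem=22, physical_activity=2400.0),
    StudentRecord(perceived_stress=25, self_esteem=15, physical_activity=800.0),
]

for student in students:
    print(student, "valid:", student.is_valid())
```

Each row of a data table corresponds to one such record: one individual, with one value for each variable.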

To consider how data analysis applies to the risk of asteroids impacting the earth, you can view Snapshots: Data and Distributions

Question 1.3

The travel time between a person’s hometown and Washington, D.C. is a numerical characteristic of an individual, so it is a quantitative variable.

Question 1.4

Determine if each of the following is a categorical variable or a quantitative variable.

The cell phone carriers available in a given city is a non-numerical characteristic of an individual, so it is a categorical variable.

Question 1.5

Whether or not one is a registered voter is a non-numerical characteristic of an individual, so it is a categorical variable.

Question 1.6

The weight of various pieces of fruit is a numerical characteristic of an individual, so it is a quantitative variable.

Question 1.7

The tomato varieties available at the farmers’ market is a non-numerical characteristic of an individual, so it is a categorical variable.

Question 1.8

The length of trout in a certain river is a numerical characteristic of an individual, so it is a quantitative variable.

Question 1.9

The number of days per week that a person exercises is a numerical characteristic of an individual, so it is a quantitative variable.

Finally, we ask how were the data analyzed and reported? Raw data consist of lists or tables of the observations. Unless we are collecting a very small set of data, raw data are difficult to use. Thus, we frequently summarize the data, using a graphical, verbal, or numerical description to explain any patterns or trends.

Our original question of interest tells us why we collected the data. Once we have looked at the data through the eyes of statistics, we can report what we learned. Were we able to answer the question, or does it need further study? How sure are we that our results are reliable? Once again, our question guides us in proceeding. If we want to determine whether a certain medication is safe for infants, we want to be much more confident in our results than if we want to know whether the majority of people prefer chocolate ice cream to vanilla.

These are questions that we will be returning to throughout our study of statistics. The manner in which we analyze and report data can get a bit complicated, but we will start slowly, and add more detail as we go along. As you learn more about statistics, you will (hopefully) begin to appreciate the subtleties involved in making decisions using statistics.

The StatTutor Lesson – Individuals and Variables gives you practice with identifying individuals and variables, and provides you with a preview of the techniques we will use to summarize data.

1.1.1 More About College Students' Self-Esteem

Is there such a thing as too much self-esteem? Much has been made of the supposed high self-esteem of today’s young people. But, in fact, are 21st-century college students more self-centered than those in the past? Researchers at several universities set out to answer that question, and conducted a comprehensive study. The participants were 16,475 college students nationwide who completed an evaluation called the Narcissistic Personality Inventory (NPI) between 1982 and 2006.

The NPI has 40 questions, each containing a pair of statements (such as “I am a born leader” and “Leadership is a quality that takes a long time to develop”). The participant selects the statement that best describes him or her. Based on these responses, a score between 0 and 40 is assigned; the higher the score, the more narcissistic the person. The researchers reported that there was a steady increase in the NPI scores between 1982 and 2006, and that 67% of students had above average scores in 2006 compared to 30% in 1982.

Let’s apply our four questions to this study.

Why? The researchers were interested in whether college students exhibit more narcissistic tendencies today as compared to students in the past.

Who? In this study, data were collected from 16,475 United States college students between 1982 and 2006.

What? For each individual, the year in which the NPI was taken and the score on the inventory were observed. The score on the inventory is quantitative. In this case, the year is a categorical variable because it serves to identify when the particular student took the inventory. Like other variables that appear numerical (but might not be), year always requires a second look.

How? Researchers compared NPI scores for students in 1982 to those for students in 2006. 67% of scores were above average in 2006 while 30% were above average in 1982.

Such a study may be of considerable interest to you as a current college student, and to your professors as they select educational strategies for their courses. Future employers may wonder how such a psychological profile will impact their workplace policies.

Question 1.10

As part of its study of cavity-nesting birds, the Cornell Lab of Ornithology obtained data to explore whether the orientation of nesting boxes has an effect on breeding success. Participants in The Birdhouse Network (TBN) reported details concerning the nesting of Eastern bluebirds in nesting boxes. Orientation refers to whether the box entrance faced north, northeast, southwest, west and so forth. Also recorded for each box were the clutch size (number of eggs laid), the number of nestlings (baby birds hatched), and the number of fledglings (nestlings that left the nest).

The study investigated the relation between box orientation and fledging.

Question 1.12

The individuals studied are particular nesting boxes. The number of boxes is not a characteristic of an individual box.

Question 1.13

Once the data were collected, numerical summaries were calculated. The table below shows these summaries for each box orientation.

Orientation of Nesting Box   Average Clutch Size   Average Number of Nestlings   Average Number of Fledglings   Success Rate
North                        4.47                  3.92                          3.42                           0.7692
Northeast                    4.51                  4.07                          3.8                            0.8415
East                         4.48                  4.04                          3.5                            0.7842
Southeast                    4.46                  4.05                          3.49                           0.7844
South                        4.46                  3.98                          3.39                           0.7626
Southwest                    4.46                  3.91                          3.41                           0.7638
West                         4.46                  3.83                          3.28                           0.7435
Northwest                    4.41                  3.9                           3.5                            0.7974

Table 1.1: Numerical Summary for Box Orientation
Success rate is the number of fledglings divided by the number of eggs laid.

The results shown in the table summarize the quantitative properties of each box orientation.
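
The summaries in Table 1.1 can be explored with a few lines of Python. The sketch below copies the table values and picks out the orientations with the highest and lowest reported success rates. It also divides the average number of fledglings by the average clutch size for comparison; the table does not say how the reported rate was computed, so small differences between that ratio and the reported rate should be expected.

```python
# Rows copied from Table 1.1:
# (orientation, avg clutch size, avg nestlings, avg fledglings, success rate)
table = [
    ("North",     4.47, 3.92, 3.42, 0.7692),
    ("Northeast", 4.51, 4.07, 3.80, 0.8415),
    ("East",      4.48, 4.04, 3.50, 0.7842),
    ("Southeast", 4.46, 4.05, 3.49, 0.7844),
    ("South",     4.46, 3.98, 3.39, 0.7626),
    ("Southwest", 4.46, 3.91, 3.41, 0.7638),
    ("West",      4.46, 3.83, 3.28, 0.7435),
    ("Northwest", 4.41, 3.90, 3.50, 0.7974),
]

# Orientations with the highest and lowest reported success rate.
best = max(table, key=lambda row: row[4])
worst = min(table, key=lambda row: row[4])
print("Highest success rate:", best[0], best[4])
print("Lowest success rate:", worst[0], worst[4])

# Ratio of the two averages, shown next to the reported rate.
for orientation, clutch, _nestlings, fledglings, reported in table:
    ratio = fledglings / clutch
    print(f"{orientation:<10} ratio of averages = {ratio:.4f}, reported = {reported:.4f}")
```

Orientation is the categorical variable here; the remaining columns are numerical summaries of quantitative variables, which is why sorting and comparing them makes sense.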

1.1.2 Shakespeare by the Numbers

Let’s consider another example to which we can apply our question strategy. Open Source Shakespeare is a database containing a wealth of information about the works of William Shakespeare. We can use the website to investigate whether Shakespeare’s tragedies are more complex than his comedies by considering the number of lines, the number of scenes, and the number of characters in each play. For the tragedies, the average number of lines was 1074, the average number of scenes was 24, and the average number of characters was 40. For the comedies, the average number of lines was 906, the average number of scenes was 16, and the average number of characters was 24. In each case, the average for the tragedies is significantly larger than that for the comedies.

Why? The question of interest in this study was whether Shakespeare’s tragedies are more complex than his comedies.

Who? Data were collected from the 11 tragedies and 14 comedies written by William Shakespeare.

What? For each play, the type of play, the number of lines, the number of scenes, and the number of characters were recorded. The type of play is categorical; each of the other variables is quantitative.

How? For each of the genres (tragedy and comedy), the average value of each variable (number of lines, number of scenes, and number of characters) was reported.

When we compare this study to the previous one about the orientation of bluebird nesting boxes, we notice a difference between the groups of individuals studied. Because the number of Shakespeare’s plays was small, and the data were readily available, we were able to analyze characteristics of all the tragedies and comedies.

In the study involving bluebird boxes, researchers wanted to determine whether there is a relationship between the orientation of the nesting box and breeding success. However, because it was impractical (if not impossible) to study all such boxes, the researchers selected a smaller group of individuals to study.

Here we see an important distinction between two groups—a distinction that is critical to the statistical process, but sometimes confusing to beginning statistics students. The population is the set of all individuals that we want to describe or draw a conclusion about. When it is not practical or not possible to collect data from the entire population, we collect data from a subset of the population; we call this subset the sample.

You are probably familiar with the concepts of population and sample from reading opinion polls, such as the Gallup Poll, which draw conclusions about the beliefs or behaviors of a large population based on relatively small samples. (If you wonder about how this can be done reliably, stay tuned for a later chapter!) We note here that population, like individuals, refers to whomever or whatever we are interested in studying (not just people).

1.1.3 Caffeine and Birth Weight

Regular, decaf, or just no coffee at all? Pregnant women are usually quite concerned about the effect of what they eat and drink on their unborn children. Trying to minimize or eliminate caffeine may cause women to rethink that wake-up cup of coffee.

For a number of years doctors have encouraged women to limit their consumption of alcohol and caffeine during pregnancy. Some studies have shown a link between a mother’s high caffeine intake and her baby’s low birth weight. A group of Danish researchers conducted an experiment to investigate the effect of caffeine reduction on both birth weight and length of gestation.

Pregnant women were randomized into either a caffeinated coffee group (568 women) or a decaffeinated coffee group (629 women). Once the babies were born, birth weights and lengths of gestation for 1153 single births were analyzed. The average birth weight for babies born to women in the caffeinated coffee group was 3539 grams; the average birth weight for babies born to women in the decaffeinated group was 3519 grams. The average length of gestation for babies born to women in the caffeinated coffee group was 280.2 days; the average length of gestation for babies born to women in the decaffeinated group was 279.3 days. The differences in average birth weight and average length of gestation were not significant; researchers concluded that “Providing decaffeinated coffee to women who drank three cups of coffee or more a day in early pregnancy had no effect on birth weight or length of gestation.”
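
The figures reported for this study can be checked with simple arithmetic. The sketch below uses only the numbers quoted above; the variable names are ours, and the text does not say why some randomized women do not appear among the analyzed births.

```python
# Group sizes and number of analyzed births, as reported above.
caffeinated_group = 568
decaf_group = 629
analyzed_births = 1153

randomized_total = caffeinated_group + decaf_group
not_analyzed = randomized_total - analyzed_births
print("Women randomized:", randomized_total)
print("Randomized but not among the analyzed single births:", not_analyzed)

# Differences between the group averages (caffeinated minus decaffeinated).
weight_diff_g = 3539 - 3519
gestation_diff_days = round(280.2 - 279.3, 1)
print("Difference in average birth weight (g):", weight_diff_g)
print("Difference in average gestation (days):", gestation_diff_days)

# The weight difference as a share of the decaffeinated group's average.
relative_weight_diff = weight_diff_g / 3519
print(f"Relative difference in birth weight: {relative_weight_diff:.2%}")
```

Seeing how small these differences are relative to the averages helps explain why the researchers did not regard them as significant, although judging significance formally requires techniques covered later in the course.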

Why? The researchers were interested in determining whether there is a connection between high caffeine intake and low birth weight or shorter gestation. The researchers want to draw a conclusion about all pregnant women (the population) based on information from the individuals studied (the sample).

Who? The data describe 1153 women who had singleton babies and their babies.

What? The researchers recorded whether each mother was assigned to the decaffeinated or the caffeinated group and her baby’s weight and length of gestation. The group to which mothers were assigned is a categorical variable; the baby’s birth weight and length of gestation are quantitative variables. Birth weight was measured in grams; length of gestation was measured in days.

How? Average birth weight and average length of gestation were given for the caffeinated and decaffeinated groups. Researchers concluded that the differences between the groups were not significant.

The values 3539 grams, 3519 grams, 280.2 days and 279.3 days are numerical summaries of the data collected. We call each of these numbers a statistic, because it describes a sample. When we have a number that describes a population (the average length of service of all U.S. senators, for example), we call that number a parameter.

Notice that we now have the two different meanings of the word statistics. When we started this section, we defined statistics as the process of collecting, describing and drawing conclusions about data. You might want to think of this as the “big S” definition—the big picture description of an area of study. (This use of the word is always plural.) Our new definition of statistics as numbers that describe a sample is then a “little s” definition. We talk about a statistic (singular) if we refer to a single number; we talk about statistics (plural) if we refer to two or more such numbers.

It is sometimes a bit tricky to decide whether you have a statistic or a parameter; the issue is whether the number you are considering describes a sample (then it’s a statistic) or a population (then it’s a parameter). If you calculate the average GPA of your English composition class to verify that your class is an unusually bright group of students, you are finding a parameter. If you are using the average GPA of your English composition class to estimate the average GPA of all English composition students at your college, then you are finding a statistic. Same number, different purposes—it depends on the set of individuals you are trying to describe. In the first case, the population is the students in your English composition class; in the second, the population is all English composition students at your college, and your particular class is serving as a sample.

You should not really use your class to estimate the GPA of all English composition students, since it is not a random sample. Students who choose your particular time period or your particular professor may have better—or worse—GPAs than what is typical at your college.
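
The difference between a parameter and a statistic can be sketched in Python. Everything in this example is hypothetical: the GPAs are generated at random to stand in for a population of 200 students, and the sample size of 25 is an arbitrary choice. Unlike a single class section, the sample here is drawn at random from the whole population.

```python
import random
from statistics import mean

rng = random.Random(2024)  # fixed seed so the same sample is drawn on every run

# Hypothetical population: invented GPAs for all 200 English composition students.
population_gpas = [round(rng.uniform(2.0, 4.0), 2) for _ in range(200)]

# A parameter describes the entire population.
parameter = mean(population_gpas)

# A statistic describes a sample; here, 25 students chosen at random
# without replacement from the population.
sample_gpas = rng.sample(population_gpas, 25)
statistic = mean(sample_gpas)

print(f"Population average GPA (parameter): {parameter:.3f}")
print(f"Sample average GPA (statistic):     {statistic:.3f}")
```

The two averages are usually close but not identical. Changing the seed draws a different sample and gives a different statistic, while the parameter stays the same for a fixed population.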

Question 1.20

Determine if each of the following numbers is a parameter or a statistic.

The Gallup organization contacted a random sample of Americans to obtain this value. Not all Americans were asked about gun ownership.

Question 1.21

This value was calculated using the penalty box time of each member of the Blackhawks during the (strike-shortened) 2012 – 2013 season.

Question 1.22

This average weight is based on observations of many honeybees, but it is not possible to weigh every honeybee that has lived or will live.

Question 1.23

The Westminster Kennel Club reports the data for all winners of Best in Show since the show’s inception.

The examples presented in this section involve various statistical techniques and give a preview of some of the topics that you will study in this course. As you watch TV, browse the Internet, or read your favorite magazine, be on the lookout for graphs and discussions that present statistical results. You may be surprised at how often you encounter them! And when you do, try to apply our four questions to the situation.

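Once again, but in a somewhat abbreviated form, let’s remind ourselves what information we should obtain from the answers to these questions.

Why? The question of interest that the study is meant to answer.

Who? The individuals about which data were collected, and whether they make up a sample or the entire population.

What? The variables observed or measured for each individual, and whether each is categorical or quantitative.

How? The way the data were summarized and reported, and whether the reported numbers are statistics or parameters.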

Our discussion in this section illustrated these concepts. In the next section we’ll take another look at a few of these examples when we explore the question of where the data come from.

The video StatClips: Introduction to Statistics – Populations, Parameters, Samples and Sample Statistics looks ahead to the roles that samples and statistics will play in helping us draw conclusions about populations and parameters.