7.1 Sampling

In all these cases, we want to gather information about a large group of individuals. Time, cost, and inconvenience preclude contacting every person, so we gather information about only part of the group in order to draw conclusions about the whole. In some cases when gathering data, the subject is destroyed. For example, when Frito Lay checks the salt level of the potato chips it produces, it tests only a small sample of chips so that the company still has product to sell. And if your doctor’s appointment includes a blood test, you want only some of your blood removed! In these situations, it is obviously necessary to use only a sample.

Population DEFINITION

The population in a statistical study is the entire group of individuals about which we want information.

Sample DEFINITION

A sample is a part of the population from which we actually collect information that is used to draw conclusions about the whole. Sampling refers to the process of choosing a sample from the population.

For an example of a population and sample, let’s refer back to the first bullet listed above: a political scientist wants to know what percentage of college students consider themselves conservatives. Here, the population would be all college students (or perhaps just the millions of college students in the United States), and the sample would be the small subset of students (typically between 500 and 1000) actually selected to participate in the survey.
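To make the population/sample distinction concrete, here is a minimal Python sketch. The population of 200,000 students and its 30% share of conservatives are invented values used only for illustration. In a real study the population percentage is unknown, which is exactly why we sample.

```python
import random

def sample_percentage(population, sample_size, seed=None):
    """Draw a sample without replacement; return the percent of True entries."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    return 100 * sum(sample) / sample_size

# Hypothetical population of 200,000 students; True marks a self-described
# conservative. The 30% figure is invented for illustration only.
population = [True] * 60_000 + [False] * 140_000

population_pct = 100 * sum(population) / len(population)
estimate = sample_percentage(population, 1000, seed=1)
print(f"population: {population_pct:.1f}%, sample of 1000: {estimate:.1f}%")
```

Rerunning with different seeds gives estimates that vary around 30%, even though each one is based on only 1000 of the 200,000 students.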


We often draw conclusions about a whole on the basis of a sample. Everyone has sipped a spoonful of soup and judged the entire bowl on the basis of that taste. But a bowl of soup is homogeneous, so the taste of a single spoonful does represent the whole. On the other hand, a spoonful of salad dressing may be misleading because its elements may separate if the bottle has not been shaken recently. A spoonful taken from the top might be mostly oil.
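The salad-dressing contrast can be mimicked in a few lines of Python. This is a toy model, not a physical simulation: the bottle is a list of hypothetical "drops" ordered from top to bottom, and a spoonful is the first 50 items. A spoonful from the unshaken bottle is all oil, while shaking (modeled here as a random shuffle) makes the top of the bottle resemble the whole.

```python
import random

def oil_fraction(spoonful):
    """Fraction of the drops in a spoonful that are oil."""
    return sum(1 for drop in spoonful if drop == "oil") / len(spoonful)

# Hypothetical bottle of 1,000 drops listed from top to bottom, 30% oil.
settled = ["oil"] * 300 + ["vinegar"] * 700  # unshaken: the oil has risen to the top

shaken = settled[:]
random.Random(7).shuffle(shaken)  # shaking mixes the drops into random order

SPOON = 50  # a spoonful is the top 50 drops
print("whole bottle:      ", oil_fraction(settled))
print("spoonful, unshaken:", oil_fraction(settled[:SPOON]))
print("spoonful, shaken:  ", oil_fraction(shaken[:SPOON]))
```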


Choosing a representative sample from a large and varied population can be difficult. The first step in a proper sample survey is to state carefully just what population we want to describe. The second step is to define exactly what we want to measure. These preliminary steps can be complicated, as the following example illustrates.

EXAMPLE 1 How Can a Survey Measure Unemployment?

The monthly unemployment rate comes from the government’s Current Population Survey (CPS; www.census.gov/cps/), which involves a sample of about 50,000 households each month. To measure unemployment, we must first specify the population that we want to describe:

  • Which age groups will we include?
  • Will we include illegal immigrants or people in prisons?
  • Should we include military personnel?

The CPS defines its population as all U.S. residents (whether citizens or not), 16 years of age and over, who are civilians and not in an institution such as a prison. The civilian unemployment rate announced in the news refers to this specific population.
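The population definition works like a filter that each resident either passes or fails. The Python sketch below encodes a simplified version of that rule. The `Resident` fields are hypothetical names chosen for illustration, not fields from the actual CPS questionnaire.

```python
from dataclasses import dataclass

@dataclass
class Resident:
    age: int
    is_civilian: bool           # False for active-duty military personnel
    is_institutionalized: bool  # True for, e.g., someone in prison

def in_cps_population(person: Resident) -> bool:
    """Simplified CPS population rule: civilian, not institutionalized, age 16+.

    Citizenship is deliberately not checked, since the CPS includes noncitizens.
    """
    return person.age >= 16 and person.is_civilian and not person.is_institutionalized

residents = [
    Resident(age=34, is_civilian=True, is_institutionalized=False),
    Resident(age=15, is_civilian=True, is_institutionalized=False),
    Resident(age=22, is_civilian=False, is_institutionalized=False),
    Resident(age=40, is_civilian=True, is_institutionalized=True),
]
print([in_cps_population(r) for r in residents])  # [True, False, False, False]
```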

The second question is more difficult: What does the term unemployed mean? Someone who is not looking for work (for example, a full-time student) should not be called unemployed just because he or she is not working for pay. If you are chosen for the CPS sample, the interviewer first asks whether you are available to work and whether you actually looked for work in the past four weeks. If not, you are neither employed nor unemployed; you are not in the labor force.


If you are in the labor force, the interviewer goes on to ask about employment. Any work for pay that you performed the week of the survey, whether for someone else or in your own business, qualifies you to be counted as employed. So does at least 15 hours of unpaid work in a family business. In addition, you are considered employed if you have a job but didn’t work for reasons such as being on vacation or on strike.
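The classification rules described above can be sketched as a small Python function. This is a simplified illustration, assuming hypothetical field names. The real CPS questionnaire is far more detailed. The employment tests are checked first, so that someone who already has a job (and therefore is not looking for work) is counted as employed rather than as outside the labor force.

```python
from dataclasses import dataclass

@dataclass
class Respondent:
    worked_for_pay: bool                 # any paid work in the survey week, incl. own business
    unpaid_family_business_hours: float  # unpaid hours in a family business that week
    has_job_but_absent: bool             # e.g. on vacation or on strike
    available_for_work: bool
    looked_for_work_past_4_weeks: bool

def labor_force_status(r: Respondent) -> str:
    """Simplified CPS-style classification based on the rules in the text."""
    # Employment tests come first: anyone with a job counts as employed.
    if (r.worked_for_pay
            or r.unpaid_family_business_hours >= 15
            or r.has_job_but_absent):
        return "employed"
    # Jobless people count as unemployed only if available and actively searching.
    if r.available_for_work and r.looked_for_work_past_4_weeks:
        return "unemployed"
    return "not in labor force"

# A full-time student who is not working and not looking for work.
student = Respondent(False, 0, False, available_for_work=False, looked_for_work_past_4_weeks=False)
print(labor_force_status(student))  # not in labor force
```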

So, an unemployment rate of 6.7% means that 6.7% of the labor force in the sample was unemployed, using the exact CPS definitions of both labor force and unemployed.
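Because the rate relies on the CPS definitions of labor force and unemployed, its denominator is the labor force (employed plus unemployed), not everyone sampled. The Python sketch below uses invented counts to show the difference. The real CPS also applies sampling weights, which this sketch ignores.

```python
def unemployment_rate(num_employed: int, num_unemployed: int) -> float:
    """Percent of the labor force (employed + unemployed) that is unemployed.

    People who are not in the labor force are left out of the denominator.
    """
    labor_force = num_employed + num_unemployed
    if labor_force == 0:
        raise ValueError("no one in the sample is in the labor force")
    return 100 * num_unemployed / labor_force

# Hypothetical sample counts, invented for illustration (not CPS data).
employed, unemployed, not_in_labor_force = 933, 67, 500

rate = unemployment_rate(employed, unemployed)
share_of_everyone = 100 * unemployed / (employed + unemployed + not_in_labor_force)
print(f"unemployment rate: {rate:.1f}%")                                  # 6.7%
print(f"unemployed as share of whole sample: {share_of_everyone:.1f}%")  # 4.5%
```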

Figure 7.1: Line graph showing the monthly unemployment rates calculated from CPS data for the first seven months of 2014.

Self Check 1

A local television station conducts nightly polls of public opinion by announcing a question on the 6 o’clock news and asking viewers to call in or text their response of “yes” or “no” to the station. The results are announced on the 11 o’clock news later the same night. One such poll finds that 76% of those who called in or texted are opposed to a proposed local gun control ordinance.

  1. What do you think the population is in this situation?

  2. Do you believe this sample is representative of the population? Explain.

    • (a) Reasonable populations are all residents in the station’s viewing area or all viewers of the 6 o’clock news. However, certain viewers who feel strongly about this question could call their friends and encourage them to vote even if they are outside of the viewing area. So, it could be all residents in the station’s viewing area and their friends. (There is no clear answer to this question.)

      (b) Sample answer: If the population is all residents in the station’s viewing area, the sample is not representative of the population. For example, people who do not watch the 6 o’clock news will not be represented in this sample. The views of the 6 o’clock news watchers could be quite different from those who do not watch the 6 o’clock news. Also, people who feel strongly about the question are more apt to reply and will thus be overrepresented in the sample.

    • (a) The sample consists of the people who completed the survey and mailed it back to the hospital. This is an example of a voluntary response survey.

      (b) The population is all people having routine medical tests done at this hospital.

      (c) The data will not accurately portray patient satisfaction. Many patients will not bother to fill out the survey, particularly if they did not encounter any problems. People who had a bad experience of some type are probably more likely to complete and mail in the survey.
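Both the call-in poll and the mailed hospital survey rely on voluntary response. The Python sketch below shows why that matters, under invented assumptions: half of a hypothetical viewing area opposes the ordinance, and opponents (who feel more strongly) respond with probability 0.20 while everyone else responds with probability 0.05. These probabilities are made up; the point is only the mechanism.

```python
import random

def voluntary_poll(pop_size, share_opposed, p_respond_opposed, p_respond_other, seed=None):
    """Return the percent opposed among the people who choose to respond."""
    rng = random.Random(seed)
    opposed_responses = other_responses = 0
    for _ in range(pop_size):
        opposed = rng.random() < share_opposed
        p_respond = p_respond_opposed if opposed else p_respond_other
        if rng.random() < p_respond:  # this person decides to call in or text
            if opposed:
                opposed_responses += 1
            else:
                other_responses += 1
    total = opposed_responses + other_responses
    if total == 0:
        raise ValueError("nobody responded")
    return 100 * opposed_responses / total

pct = voluntary_poll(100_000, 0.5, 0.20, 0.05, seed=3)
print(f"opposed among respondents: {pct:.1f}% (population: 50%)")
```

With these numbers the expected share of opponents among respondents is 0.5 × 0.20 / (0.5 × 0.20 + 0.5 × 0.05) = 80%, far above the 50% in the population. Setting both response probabilities equal removes the bias.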