Chapter 2 Review

In this chapter we looked at how to obtain data from either an observational study or an experiment. Our focus was on study design, randomization and sample selection, and drawing conclusions. Recall that in an observational study, members of the population are merely observed, whereas in an experiment, some type of treatment is imposed on the subjects.

Watch the whiteboard example StatClips: Data and Sampling Example C to see how to select a random sample from a table.

With an observational study we are interested in the survey design. How was the data collected: via a survey, an interview, or direct observation? We also would like to know how the sample was selected. Was randomization part of the process? Recall that randomization is the use of impersonal chance to select individuals to study or to assign individuals to treatment groups. If the subjects are not randomly selected, bias may be introduced. Bias is the predisposition of a study toward certain results; it is caused by problems in the study design or sample selection. If we want to obtain a sample that best represents the population, randomization gives us the best chance of accomplishing this goal. Selecting a sample that best represents the population begins with identifying the sampling frame, which is the list of individuals in the population from which the sample is drawn.

Even if randomization is part of the study, other forms of bias are possible. Response bias arises from any aspect of the design that influences the participants’ responses. Another form of bias is undercoverage, which is the underrepresentation in the sample of a group or groups of individuals in the population. Nonresponse bias occurs when members selected for the sample do not actually participate. Finally, voluntary response bias occurs when individuals decide for themselves whether to take part, for example by choosing to answer a particular question.

In reporting the results of a poll or survey, it’s important to specify the summary statistics, sample sizes, and margin of error. The margin of error is a number that estimates how far, in all likelihood, the desired parameter could be from the reported statistic.
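As a rough illustration of how a margin of error might be computed, the sketch below uses the common large-sample approximation for a sample proportion at about 95% confidence. The formula, the function name margin_of_error, and the poll numbers are assumptions chosen for illustration; they are not taken from this chapter.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample that is large enough for the normal
    approximation; z=1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0 <= p_hat <= 1:
        raise ValueError("p_hat must be between 0 and 1")
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll: 520 of 1,000 respondents answer "yes".
n = 1000
p_hat = 520 / n
moe = margin_of_error(p_hat, n)
print(f"statistic: {p_hat:.3f}, margin of error: {moe:.3f}")
```

Under these assumptions, a larger sample size gives a smaller margin of error, which is one reason the sample size should be reported alongside the statistic.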

In many studies there will be at least one explanatory variable and one response variable. An explanatory variable is the variable that explains or predicts the values of the response variable, whereas the response variable is the outcome that we are investigating. Although an observational study can provide useful information about the association between the explanatory and response variables, it cannot establish cause and effect between them because there may be other variables (lurking variables) at play. A lurking variable is a characteristic of the sample that is not investigated as part of the study, but which may influence the results. Confounding is the inability to distinguish the effects of the explanatory and lurking variables on the response variable.

If you are interested in establishing cause and effect, then you need to perform an experiment. Often experiments involve a placebo, which is an inactive treatment that has no medical effect. The placebo serves as the control treatment, a baseline against which the effect of the treatment being studied is compared. A well-designed experiment should have three critical properties: control, randomization, and replication. In addition, an experiment is often double-blind. A double-blind study is one in which neither the individuals receiving the treatments nor those administering the treatments (or interpreting the results) know which individuals are in the treatment and control groups.

The simplest experimental design involves randomly assigning the subjects to treatment groups and then comparing the values of the response variables. More sophisticated designs, including block design and matched pairs, are often utilized. A block is a group of individuals that share one or more characteristics. With a block design, individuals are first separated into blocks according to some characteristic, and then each block is randomly divided into treatment groups. A matched-pairs design is used to control for variables not studied. Two individuals matched according to important characteristics are paired, with one individual assigned to the first treatment and the other to the second treatment.
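The difference between the simplest design and a block design can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subjects, the age-group blocking characteristic, and the helper names randomize and block_randomize are all hypothetical.

```python
import random

def randomize(subjects, treatments, rng):
    """Shuffle the subjects, then deal them out to the treatments in turn."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

def block_randomize(subjects, block_of, treatments, rng):
    """Separate subjects into blocks, then randomize within each block."""
    blocks = {}
    for s in subjects:
        blocks.setdefault(block_of(s), []).append(s)
    return {b: randomize(members, treatments, rng)
            for b, members in blocks.items()}

rng = random.Random(2)  # fixed seed so the sketch is reproducible

# Hypothetical subjects: (id, age group)
subjects = [(i, "under 40" if i % 2 == 0 else "40 and over")
            for i in range(1, 13)]
treatments = ["treatment", "placebo"]

# Simplest design: ignore age group entirely.
simple = randomize(subjects, treatments, rng)

# Block design: each age group is split evenly between the treatments.
blocked = block_randomize(subjects, lambda s: s[1], treatments, rng)
for block, groups in blocked.items():
    for t, members in groups.items():
        print(block, t, sorted(m[0] for m in members))
```

With the simplest design, chance alone could place most of one age group in the same treatment group. Blocking rules that out for the chosen characteristic, while randomization within each block still guards against other sources of bias.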

If, after the experiment is conducted, there is a difference among the treatment groups so large that it is unlikely to have occurred merely by chance, we say that the results are statistically significant.
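One way to get a feel for "unlikely to have occurred by chance" is to reshuffle the group labels many times and count how often a difference at least as large as the observed one appears. The sketch below does this with a randomization (permutation) test. That technique is not introduced in this chapter, and the response values and the function name permutation_p_value are hypothetical, so treat it as an illustration only.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often chance alone yields a difference in means at
    least as large (in absolute value) as the one actually observed."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign subjects to groups at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            count += 1
    return count / n_shuffles

# Hypothetical response values for two treatment groups
treatment = [12.1, 13.4, 11.8, 14.0, 13.2, 12.9]
placebo = [10.2, 11.0, 10.8, 9.9, 11.3, 10.5]

p = permutation_p_value(treatment, placebo)
print(f"estimated proportion of shuffles at least as extreme: {p:.4f}")
```

A very small proportion suggests that the observed difference would rarely arise from the random assignment alone, which is the idea behind calling a result statistically significant.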

Finally, there are ethical considerations that need to be addressed in a clinical trial, a medical study on human subjects. One important component of a clinical trial is informed consent, which is the process by which details of the study are explained to a potential participant.

In both observational studies and experiments, we have seen that randomization is a key component. So how does one go about generating a random sample? A sample selected so that every individual has an equal chance of being chosen, and every possible sample of a given fixed size has an equal chance of being chosen, is called a simple random sample (SRS). More complicated sampling designs also exist; two particular examples are a stratified random sample and a cluster sample. A stratified random sample is obtained by first dividing the population into groups (strata) defined by one or more variables, and then selecting a simple random sample from each group. Unlike a stratified random sample, in which the groups differ from one another, a cluster sample first divides the population into groups (clusters) that are similar to one another. A set of clusters is then randomly chosen, and a census of the individuals in each selected cluster is conducted.
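The three sampling designs can be compared side by side in a short Python sketch. The sampling frame of 40 students, the variables used to define strata and clusters, and the function names are all hypothetical; the sketch relies only on the standard library's random module.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every set of n individuals is equally likely to be chosen."""
    return rng.sample(frame, n)

def stratified_random_sample(frame, stratum_of, n_per_stratum, rng):
    """Split the frame into strata, then take an SRS from each stratum."""
    strata = {}
    for person in frame:
        strata.setdefault(stratum_of(person), []).append(person)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(frame, cluster_of, n_clusters, rng):
    """Randomly choose whole clusters, then include everyone in them."""
    clusters = {}
    for person in frame:
        clusters.setdefault(cluster_of(person), []).append(person)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 40 students with a class year and a homeroom
years = ["first-year", "sophomore", "junior", "senior"]
frame = [{"id": i, "year": years[i % 4], "homeroom": f"room {i % 5}"}
         for i in range(40)]

srs = simple_random_sample(frame, 8, rng)
strat = stratified_random_sample(frame, lambda p: p["year"], 2, rng)
clus = cluster_sample(frame, lambda p: p["homeroom"], 2, rng)

print("SRS ids:", sorted(p["id"] for p in srs))
print("Stratified ids:", {y: sorted(p["id"] for p in ps) for y, ps in strat.items()})
print("Cluster sample homerooms:", sorted({p["homeroom"] for p in clus}))
```

Note the structural difference: the stratified sample draws a few individuals from every group, while the cluster sample takes every individual from a few randomly chosen groups.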