7.2 The Assumptions and Steps of Hypothesis Testing

The story of the doctor tasting tea was an informal experiment; however, the formal process of hypothesis testing is based on particular assumptions about the data. At times it might be safe to violate those assumptions and proceed through the six steps of formal hypothesis testing, but it is essential to understand the assumptions before making such a decision.

The Three Assumptions for Conducting Analyses

An assumption is a characteristic that we ideally require the population from which we are sampling to have so that we can make accurate inferences.

Think of “statistical assumptions” as the ideal conditions for hypothesis testing. More formally, assumptions are the characteristics that we ideally require the population from which we are sampling to have so that we can make accurate inferences. Why go through all the effort to understand and calculate statistics if you can’t believe the story they tell?

A parametric test is an inferential statistical analysis based on a set of assumptions about the population.

The assumptions for the z test apply to several other hypothesis tests, especially parametric tests, inferential statistical analyses based on a set of assumptions about the population. By contrast, nonparametric tests are inferential statistical analyses that are not based on a set of assumptions about the population. Learning the three main assumptions for parametric tests will help you to select the appropriate statistical test for your particular data set.

A nonparametric test is an inferential statistical analysis that is not based on a set of assumptions about the population.

Assumption 1: The dependent variable is assessed using a scale measure. If it's clear that the dependent variable is nominal or ordinal, we cannot make this first assumption and should not use a parametric hypothesis test.
Assumption 2: The participants are randomly selected. Every member of the population of interest must have had an equal chance of being selected for the study. This assumption is often violated; it is more likely that participants are a convenience sample. If we violate this second assumption, we must be cautious when generalizing from a sample to the population.
Assumption 3: The distribution of the population of interest must be approximately normal. Many distributions are approximately normal, but it is important to remember that there are exceptions to this guideline (Micceri, 1989). Because hypothesis tests deal with sample means rather than individual scores, as long as the sample size is at least 30 (recall the discussion about the central limit theorem), the distribution of means is likely to be approximately normal even if the population is not, so a violation of this third assumption is unlikely to cause problems.
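The central limit theorem argument behind Assumption 3 can be checked with a short simulation. This is a sketch under assumed conditions: the population is a hypothetical, strongly right-skewed exponential distribution with mean 1, and skewness (the average cubed z score) is used only as a rough symmetry check, not as a full test of normality. The code compares individual scores with the means of samples of 30.

```python
import random
import statistics

def sample_means(draw, n, reps, seed=0):
    """Return the means of `reps` random samples, each containing `n` scores."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

def skewness(values):
    """Average cubed z score: about 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def skewed_population(rng):
    """Hypothetical, strongly right-skewed population: exponential with mean 1."""
    return rng.expovariate(1.0)

# Samples of size 1 are just individual scores.
individual_scores = sample_means(skewed_population, n=1, reps=10_000)
means_of_30 = sample_means(skewed_population, n=30, reps=10_000)

print(f"skewness of individual scores: {skewness(individual_scores):.2f}")
print(f"skewness of means (N = 30):    {skewness(means_of_30):.2f}")
```

An exponential population has a skewness of 2; theory predicts that means of 30 scores have a skewness of only about 2/√30 ≈ 0.37, so the distribution of means is far closer to symmetric than the population it came from.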


MASTERING THE CONCEPT

7.2: When we calculate a parametric statistic, ideally we have met assumptions regarding the population distribution. For a z test, there are three assumptions: The dependent variable should be on a scale measure, the sample should be randomly selected, and the underlying population should have an approximately normal distribution.

Many parametric hypothesis tests can be conducted even if some of the assumptions are not met (Table 7-2); such tests are robust against violations of some of these assumptions. Robust hypothesis tests are those that produce fairly accurate results even when the data suggest that the population might not meet some of the assumptions.

TABLE 7-2. The Three Assumptions for Hypothesis Testing. We must be aware of the assumptions for the hypothesis test that we choose, and we must be cautious in choosing to proceed with a hypothesis test when the data may not meet all of the assumptions. Note that in addition to these three assumptions, for many hypothesis tests, including the z test, the independent variable must be nominal.

The Three Assumptions | Breaking the Assumptions
1. Dependent variable is on a scale measure. | Usually OK if the data are not clearly nominal or ordinal.
2. Participants are randomly selected. | OK if we are cautious about generalizing.
3. Population distribution is approximately normal. | OK if the sample includes at least 30 scores.
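The right-hand column of Table 7-2 amounts to a small checklist. The sketch below encodes it; the `StudyDesign` class, its fields, and the `assumption_notes` function are hypothetical names invented for this illustration, and the messages paraphrase the table rather than replace judgment about a real study.

```python
from dataclasses import dataclass

@dataclass
class StudyDesign:
    """Hypothetical summary of a study, used only to illustrate Table 7-2."""
    dv_level: str            # "nominal", "ordinal", or "scale"
    random_sample: bool      # did every population member have an equal chance of selection?
    population_normal: bool  # is the population believed to be approximately normal?
    n: int                   # number of scores in the sample

def assumption_notes(design: StudyDesign) -> list[str]:
    """Return a caution for each assumption that appears to be broken."""
    notes = []
    if design.dv_level in ("nominal", "ordinal"):
        notes.append("(1) DV is not a scale measure: do not use a parametric test.")
    if not design.random_sample:
        notes.append("(2) Not randomly selected: be cautious about generalizing.")
    if not design.population_normal and design.n < 30:
        notes.append("(3) Population may not be normal and N < 30: results may be inaccurate.")
    return notes

# A convenience sample of 20 scores from a possibly non-normal population.
for note in assumption_notes(StudyDesign("scale", random_sample=False, population_normal=False, n=20)):
    print(note)
```

The example design triggers cautions (2) and (3); with 30 or more scores, caution (3) would disappear, matching the table's "OK if the sample includes at least 30 scores."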

A robust hypothesis test is one that produces fairly accurate results even when the data suggest that the population might not meet some of the assumptions.

Data that meet these three statistical assumptions represent the ideal conditions and are more likely to produce valid research. Meeting the assumptions improves the quality of research, but not meeting them does not necessarily invalidate the research.

The Six Steps of Hypothesis Testing

Hypothesis testing can be broken down into six standard steps.

Step 1: Identify the populations, comparison distribution, and assumptions.

When we first approach hypothesis testing, we consider the characteristics of the data in order to determine the distribution to which we will compare the sample. First, we state the populations represented by the groups to be compared. Then we identify the comparison distribution (e.g., a distribution of means). Finally, we review the assumptions of hypothesis testing. The information we gather in this step helps us to choose the appropriate hypothesis test (Appendix Figure E-1 provides a quick guide for choosing the appropriate test).

Step 2: State the null and research hypotheses.

Hypotheses are about populations, not about samples. The null hypothesis is usually the “boring” one that posits no change or no difference between groups. The research hypothesis is usually the “exciting” one that posits, for example, that a given intervention will lead to a change or a difference—for instance, that a particular kind of psychotherapeutic intervention will reduce general anxiety. State the null and research hypotheses in both words and symbolic notation.
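For the anxiety example, let μ1 stand for the mean anxiety of a hypothetical population that receives the psychotherapeutic intervention and μ2 for the mean of a population that does not. A sketch of the symbolic notation, written in the two-tailed form that matches the two cutoffs described in step 4, could be:

```latex
H_0:\ \mu_1 = \mu_2 \\
H_1:\ \mu_1 \neq \mu_2
```

In words, the null hypothesis says the intervention population has the same mean anxiety as the comparison population; the research hypothesis says the two means differ.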

Step 3: Determine the characteristics of the comparison distribution.

State the relevant characteristics of the comparison distribution (the distribution based on the null hypothesis). In a later step, we will compare data from the sample (or samples) to the comparison distribution to determine how extreme the sample data are. For z tests, we will determine the mean and standard error of the comparison distribution. These numbers describe the distribution represented by the null hypothesis and will be used when we calculate the test statistic.
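For a z test, the comparison distribution is a distribution of means, so its mean equals the population mean, μM = μ, and its standard error is σM = σ/√N. The sketch below uses hypothetical population values (μ = 100, σ = 15) and a hypothetical sample size of 36.

```python
import math

def comparison_distribution(mu: float, sigma: float, n: int) -> tuple[float, float]:
    """Mean and standard error of a distribution of means for samples of size n."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return mu, sigma / math.sqrt(n)

mu_m, sigma_m = comparison_distribution(mu=100.0, sigma=15.0, n=36)
print(mu_m, sigma_m)  # 100.0 2.5
```

Because the standard error divides σ by √N, larger samples produce a narrower comparison distribution.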


Step 4: Determine the critical values, or cutoffs.

A critical value is a test statistic value beyond which we reject the null hypothesis; it is often called a cutoff.

The critical values, or cutoffs, of the comparison distribution indicate how extreme the data must be, in terms of the z statistic, to reject the null hypothesis. Often called simply cutoffs, these numbers are more formally called critical values, the test statistic values beyond which we reject the null hypothesis. In most cases, we determine two cutoffs, one for extreme samples below the mean and one for extreme samples above the mean.

The critical region is the area in the tails of the comparison distribution in which the null hypothesis can be rejected.

The critical values, or cutoffs, are based on a somewhat arbitrary standard—the most extreme 5% of the comparison distribution curve: 2.5% on either end. At times, cutoffs are based on a less conservative percentage, such as 10%, or a more conservative percentage, such as 1%. Regardless of the chosen cutoff, the area beyond the cutoff, or critical value, is often referred to as the critical region. Specifically, the critical region is the area in the tails of the comparison distribution in which the null hypothesis can be rejected.

The probability used to determine the critical values, or cutoffs, in hypothesis testing is a p level; it is often called alpha.

These percentages are typically written as probabilities; that is, 5% would be written as 0.05. The probabilities used to determine the critical values, or cutoffs, in hypothesis testing are p levels (often called alphas).
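For a two-tailed z test, the cutoffs are the z values that leave half of the p level in each tail of the standard normal curve. This sketch uses Python's `statistics.NormalDist` to find them for p levels of 0.10, 0.05, and 0.01; the function name `two_tailed_cutoffs` is just an illustrative choice.

```python
from statistics import NormalDist

def two_tailed_cutoffs(alpha: float) -> tuple[float, float]:
    """Critical z values that place alpha / 2 of the comparison distribution in each tail."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return -z, z

for percent in (10, 5, 1):
    alpha = percent / 100  # e.g., 5% becomes a p level of 0.05
    low, high = two_tailed_cutoffs(alpha)
    print(f"p level {alpha:.2f}: cutoffs {low:.2f} and {high:.2f}")
# p level 0.10: cutoffs -1.64 and 1.64
# p level 0.05: cutoffs -1.96 and 1.96
# p level 0.01: cutoffs -2.58 and 2.58
```

A more conservative p level pushes the cutoffs farther into the tails, so a sample must be more extreme before we reject the null hypothesis.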

Step 5: Calculate the test statistic.

We use the information from step 3 to calculate the test statistic, in this case the z statistic. We can then directly compare the test statistic to the critical values to determine whether the sample is extreme enough to warrant rejecting the null hypothesis.
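The z statistic expresses the sample mean as a distance from the mean of the comparison distribution, in standard errors: z = (M − μM)/σM. The sketch below assumes a hypothetical sample of 36 scores with a mean of 105, drawn from a population with μ = 100 and σ = 15.

```python
import math

def z_statistic(sample_mean: float, mu_m: float, sigma_m: float) -> float:
    """Distance of the sample mean from the comparison mean, in standard errors."""
    return (sample_mean - mu_m) / sigma_m

# Hypothetical values: population mu = 100, sigma = 15; sample N = 36, M = 105.
sigma_m = 15.0 / math.sqrt(36)          # standard error = 2.5
z = z_statistic(105.0, 100.0, sigma_m)
print(z)  # 2.0
```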

Step 6: Make a decision.

Using the statistical evidence, we now decide whether to reject or fail to reject the null hypothesis. We reject the null hypothesis if the test statistic is beyond the cutoffs, and we fail to reject the null hypothesis if the test statistic is not beyond the cutoffs.
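The decision rule is a single comparison. This sketch assumes a two-tailed test at a p level of 0.05, so the default cutoffs are ±1.96; the function name `decide` is illustrative.

```python
def decide(z: float, cutoff: float = 1.96) -> str:
    """Two-tailed decision: reject H0 only if z falls beyond either cutoff."""
    if abs(z) > cutoff:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

print(decide(2.0))   # reject the null hypothesis
print(decide(-1.2))  # fail to reject the null hypothesis
```

Note that the second outcome is "fail to reject," not "accept": a test statistic inside the cutoffs means the data are not extreme enough, not that the null hypothesis has been shown to be true.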

These six steps of hypothesis testing are summarized in Table 7-3.

A finding is statistically significant if the data differ from what we would expect by chance if there were, in fact, no actual difference.

Language Alert! When we reject the null hypothesis, we often refer to the results as “statistically significant.” A finding is statistically significant if the data differ from what we would expect by chance if there were, in fact, no actual difference. The word significant is another one of those statistical terms with a very particular meaning. The phrase statistically significant does not necessarily mean that the finding is important or meaningful. A small difference between means could be statistically significant but not practically significant or important.
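A quick calculation shows how a tiny difference can become statistically significant. This sketch assumes a hypothetical population with μ = 100 and σ = 15 and a sample mean of 100.5, then compares a sample of 36 with a sample of 10,000, using the ±1.96 cutoffs for a p level of 0.05.

```python
import math

def z_for_sample(sample_mean: float, mu: float, sigma: float, n: int) -> float:
    """z statistic for a sample mean, using the standard error sigma / sqrt(n)."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))

mu, sigma = 100.0, 15.0   # hypothetical population
for n in (36, 10_000):
    z = z_for_sample(100.5, mu, sigma, n)
    verdict = "statistically significant" if abs(z) > 1.96 else "not statistically significant"
    print(f"N = {n:>6}: z = {z:.2f} ({verdict})")
```

With N = 36 the z statistic is 0.20; with N = 10,000 it is about 3.33, beyond the cutoffs. Yet the difference is the same half point in both cases, only one-thirtieth of a population standard deviation, which may not matter in any practical sense.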

TABLE 7-3. The Six Steps of Hypothesis Testing. We use the same six basic steps with each type of hypothesis test.
1. Identify the populations, comparison distribution, and assumptions, and then choose the appropriate hypothesis test.
2. State the null and research hypotheses, in both words and symbolic notation.
3. Determine the characteristics of the comparison distribution.
4. Determine the critical values, or cutoffs, that indicate the points beyond which we will reject the null hypothesis.
5. Calculate the test statistic.
6. Decide whether to reject or fail to reject the null hypothesis.

CHECK YOUR LEARNING

Reviewing the Concepts

  • When we conduct hypothesis testing, we have to consider the assumptions for that particular test.
  • Parametric statistics are those that are based on assumptions about the population distribution; nonparametric statistics have no such assumptions. Parametric statistics are often robust to violations of the assumptions.
  • The three assumptions for a z test are that the dependent variable is on a scale measure, the sample is randomly selected, and the underlying population distribution is approximately normal.
  • There are six standard steps for hypothesis testing. First, we identify the population, comparison distribution, and assumptions, all of which help us to choose the appropriate hypothesis test. Second, we state the null and research hypotheses. Third, we determine the characteristics of the comparison distribution. Fourth, we determine the critical values, or cutoffs, of the comparison distribution. Fifth, we calculate the test statistic. Sixth, we decide whether to reject or fail to reject the null hypothesis.
  • The standard practice of statisticians is to consider a result statistically significant, and to reject the null hypothesis, if a result that extreme would occur less than 5% of the time were the null hypothesis true; results that would occur more than 5% of the time do not support this decision, so in these cases we fail to reject the null hypothesis.
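The 5% convention implies that when the null hypothesis really is true, we will still reject it about 5% of the time purely by chance. The simulation below sketches that idea under assumed conditions: a hypothetical normal population with μ = 100 and σ = 15 (so the null hypothesis is true by construction), samples of N = 30, and two-tailed cutoffs at a p level of 0.05.

```python
import math
import random
from statistics import NormalDist, fmean

def rejection_rate(alpha=0.05, n=30, reps=20_000, seed=1):
    """Proportion of samples that lead to rejecting a null hypothesis that is true."""
    rng = random.Random(seed)
    mu, sigma = 100.0, 15.0                      # hypothetical population; H0 is true
    sigma_m = sigma / math.sqrt(n)               # standard error of the comparison distribution
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu) / sigma_m
        if abs(z) > cutoff:
            rejections += 1
    return rejections / reps

print(f"proportion of true nulls rejected: {rejection_rate():.3f}")
```

The printed proportion lands close to 0.05: the p level is the long-run rate at which a true null hypothesis is rejected.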


Clarifying the Concepts

  • 7-6 Explain the three assumptions made for most parametric hypothesis tests.
  • 7-7 How do critical values help us to make a decision about the hypothesis?

Calculating the Statistics

  • 7-8 If a researcher always sets the critical region as 8% of the distribution, how often will he reject the null hypothesis when the null hypothesis is true?
  • 7-9 Rewrite each of these percentages as a probability, or p level: a. 15%; b. 3%; c. 5.5%.

Applying the Concepts

  • 7-10 For each of the following scenarios, state whether each of the three basic assumptions for parametric hypothesis tests is met. Explain your answers and label the three assumptions (1) through (3).
    1. Researchers compared the ability of experienced clinical psychologists versus clinical psychology graduate students to diagnose a patient, based on a 1-hour interview. For 2 months, either a psychologist or a student interviewed every outpatient at the local community mental health center who had already received diagnoses based on a number of criteria. For each diagnosis, the psychologists and graduate students were given a score of correct or incorrect.
    2. Behavioral scientists wondered whether animals raised in captivity would be healthier with diminished human contact. Twenty large cats (e.g., lions, tigers) were randomly selected from all the wild cats living in zoos in North America. Half were assigned to the control group—no change in human interaction. Half were assigned to the experimental group—no humans entered their cages except when the animals were not in them, one-way mirrors were used so that the animals could not see zoo visitors, and so on. The animals received a score for health over 1 year; points were given for various illnesses; a very few sickly animals had extremely high scores.

Solutions to these Check Your Learning questions can be found in Appendix D.