9.1 Introduction to Hypothesis Testing


OBJECTIVES By the end of this section, I will be able to …

  1. Construct the null hypothesis and the alternative hypothesis from the statement of the problem.
  2. State the two types of errors made in hypothesis tests: the Type I error, made with probability $\alpha$, and the Type II error, made with probability $\beta$.

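Researchers are interested in investigating many different types of questions, such as whether a pair of dice is loaded, whether a new medication lowers patients' blood pressure, or whether the proportion of adults who own a tablet computer has increased.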

Questions such as these can be tackled using statistical hypothesis testing, which is a statistical inference process for using sample data to render a decision about claims regarding the unknown value of a population parameter. In this section, we will learn how to make decisions about the values of a population mean.

1 Constructing the Hypotheses

Let's start with an example.

EXAMPLE 1 Are these dice loaded?

Suppose you are playing a dice game, where you roll a pair of dice and win the sum of the two dice in dollars. A fair price to pay to play this game is $7 a throw, because the long-run mean when tossing two fair dice is 7. Now suppose you have played this game 10 times (paying a total of $70), with the following 10 results from throwing the two dice:

4 6 2 7 8 3 5 4 9 2

These 10 dice rolls add up to 50, meaning that, for your outlay of $70, you have only received $50 in return. You wonder:

  • Are these dice fair but you have just had a streak of bad luck, or
  • Are these dice not fair, that is, loaded (weighted) to provide low outcomes?
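The arithmetic behind Example 1 can be checked directly. The short Python sketch below (written for this illustration, not taken from the text) averages the sum over all 36 equally likely outcomes of two fair dice to confirm the long-run mean of 7, then totals the 10 observed rolls.

```python
from fractions import Fraction
from itertools import product

# Long-run mean of the sum of two fair dice: average over all 36 equally likely outcomes.
outcomes = [a + b for a, b in product(range(1, 7), repeat=2)]
population_mean = Fraction(sum(outcomes), len(outcomes))

# The 10 observed sums from Example 1.
rolls = [4, 6, 2, 7, 8, 3, 5, 4, 9, 2]
total_won = sum(rolls)          # dollars received
total_paid = 7 * len(rolls)     # $7 per throw
sample_mean = total_won / len(rolls)

print(population_mean)          # 7
print(total_won, total_paid)    # 50 70
print(sample_mean)              # 5.0
```

The sample mean of 5.0 is below the fair-dice mean of 7. The rest of this section is about deciding whether such a gap is just bad luck.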

This is a basic example of hypothesis testing, where we have two competing ideas, and we turn to observed data (the dice rolls) to provide evidence in favor of one idea or the other.

We examine this question in more detail in the exercises and again in Section 9.2.

So, what is a hypothesis?

A hypothesis is a statement made about the value of a parameter. (A parameter is a characteristic of a population, such as the population mean $\mu$.)


Examples of hypotheses might be the following:

  1. The population mean of the dice tossed in Example 1 equals 7, meaning that you just had a run of bad luck.
  2. The population mean of the dice tossed in Example 1 is less than 7, meaning that the dice were loaded.
  3. The population proportion of adults owning a tablet computer in 2014 was 42%. A media technology researcher states that this proportion is still the same today.
  4. A different researcher states that the population proportion of adults owning a tablet computer has increased since 2014.

Note: A hypothesis is not necessarily true. It is simply a statement. We need to look to the data for evidence either for it or against it.

Note that statements 1 and 2 are competing ideas, which cannot both be right. Similarly, statements 3 and 4 are competing ideas.

The problem is that the value of the parameter is unknown, because it is a characteristic of a population, and we do not have access to the entire population. For example, we do not know the proportion of all people in the world today who own tablets, because new people are buying them all the time. If the true value of the parameter were known, there would be no need to perform a hypothesis test about it. This is why two reasonable people can have different ideas about the value of a population parameter. We must rely on the observed (sample) data to provide evidence in favor of a particular hypothesis.

To summarize, we have the following definition of hypothesis testing.

Hypothesis testing is a procedure for:

  1. stating two competing hypotheses about the unknown value of a population parameter, such as the population mean $\mu$,
  2. analyzing the evidence collected from sample data, and
  3. rendering a decision about which hypothesis the sample data support.

The two competing statements about the parameter are called the null hypothesis and alternative hypothesis, and they are described below.

The Hypotheses

  • The null hypothesis represents what has been tentatively assumed about the value of the parameter. It typically represents no change, no effect, or no difference. The null hypothesis is denoted as $H_0$ (pronounced “H-naught”), and it is assumed true unless the sample data provide evidence against it.
  • The alternative hypothesis, or research hypothesis, denoted as $H_a$, represents an alternative claim about the value of the parameter. If the alternative hypothesis is to be chosen over the null hypothesis, it requires sample evidence in its favor.

Hypothesis testing is like conducting a criminal trial. In a trial in the United States, the defendant is innocent until proven guilty, and the jury must evaluate the truth of two competing hypotheses:

$H_0$: The defendant is not guilty.
$H_a$: The defendant is guilty.

The not-guilty hypothesis $H_0$ is considered the null hypothesis because the jurors must assume it is true until proven otherwise. The alternative hypothesis $H_a$, that the defendant is guilty, must be demonstrated to be true beyond a reasonable doubt. How does a court of law determine whether the defendant is convicted or acquitted? This judgment is based on the evidence, the hard facts heard in court. Similarly, in hypothesis testing, the researcher draws a conclusion based on the evidence provided by the sample data.

In Sections 9.1–9.4, we will examine hypotheses for the unknown mean $\mu$. The null hypothesis will be a claim that $\mu$ equals a certain specified value, denoted $\mu_0$, and the alternative hypothesis will be a claim about other values for $\mu$. The hypotheses have one of the three possible forms shown in Table 1. The right-tailed test and the left-tailed test are called one-tailed tests. In Section 9.2, we will find out why we use this terminology. All of the hypothesis tests we perform in Sections 9.1–9.4 will take one of the three forms in Table 1.


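Table 9.2: Table 1 The three possible forms for the hypotheses for a test for $\mu$
Form | Null and alternative hypotheses
Right-tailed test | $H_0: \mu = \mu_0$ versus $H_a: \mu > \mu_0$
Left-tailed test | $H_0: \mu = \mu_0$ versus $H_a: \mu < \mu_0$
Two-tailed test | $H_0: \mu = \mu_0$ versus $H_a: \mu \neq \mu_0$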

The notation $\mu_0$ looks scary, but it just refers to the hypothesized value of $\mu$. The following example will help to clarify.

EXAMPLE 2 An example of hypotheses

Recall the dice-throwing situation in Example 1. Suppose we wanted to conduct a hypothesis test to test whether the population mean of the dice tossed in Example 1 is less than 7. State the hypotheses.

Solution

We want to test whether $\mu$ is less than 7. The only place a “less than” sign appears in Table 1 is for the left-tailed test:

$H_0: \mu = \mu_0$ versus $H_a: \mu < \mu_0$

Next, we ask ourselves, “less than what?” The answer to this question, 7, is the value of $\mu_0$. That is, $\mu_0 = 7$. The same value goes in both $H_0$ and $H_a$. Thus, we have our hypotheses:

$H_0: \mu = 7$ versus $H_a: \mu < 7$

Note that we do not perform hypothesis tests about sample characteristics, because we already know the values of these sample statistics. For example, we would never have hypotheses of the form $H_0: \bar{x} = 7$ versus $H_a: \bar{x} < 7$, because, for any given sample, the value of $\bar{x}$ is known, so there is no need to perform a hypothesis test about it.

EXAMPLE 3 Identifying valid and invalid hypotheses

Determine whether the following hypotheses are in a valid form or not. If not, explain why not, and put the hypotheses in a valid form.

Solution

  1. Invalid. The equal sign always goes in $H_0$. So the correct form is:


  2. Invalid. Statistics such as $\bar{x}$ never appear in the hypotheses (though they are used later in the hypothesis-testing procedure) because we know their values. Hypotheses are about parameters such as $\mu$, whose value is unknown. The correct form is:

  3. Invalid. The same value for $\mu_0$ should go in both $H_0$ and $H_a$. One possible correct form is:

  4. This form for the hypotheses is valid.

NOW YOU CAN DO

Exercises 9–12.

The first task in hypothesis testing is to form hypotheses. To convert a word problem into two hypotheses, look for certain key words that can be expressed mathematically. Table 2 shows how to convert words typically found in word problems into symbols.

Table 9.3: Table 2 Key English words, with mathematical symbols and synonyms
English words | Symbol | Synonyms
Equal | = | Is; has stayed the same
Not equal | ≠ | Is different from; has changed from; differs from
Greater than | > | Is more than; exceeds; has increased
Less than | < | Is below; is smaller than; has decreased

Once you have identified the key words, use the associated mathematical symbol to write the two hypotheses. The following strategy can be used to write the hypotheses.

Do not blindly apply this strategy without thinking about what you are doing. Instead, use the strategy to help formulate your own hypotheses. There is no substitute for thinking through the problem!

Strategy for Constructing the Hypotheses About $\mu$

  • Step 1 Search the word problem for certain key English words and select the associated symbol from Table 2.
  • Step 2 Determine the form of the hypotheses listed in Table 1 that uses this symbol.
  • Step 3 Find the value of $\mu_0$ (the number that answers the question: “greater than what?” or “less than what?”) and write your hypotheses in the appropriate forms.
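Purely to make the three steps concrete, here is a minimal Python sketch. The helper `construct_hypotheses` and its keyword list are hypothetical, built from a sample of the words in Table 2. Plain substring matching is crude: a phrase like “no more than” would fool it. As the warning above says, this is an aid to thinking, not a substitute for it.

```python
# Hypothetical helper illustrating the three-step strategy; the keyword list
# is a small sample of Table 2, not an exhaustive one.
KEYWORD_TO_SYMBOL = {
    # Step 1 lookups: key English words -> symbol used in Ha
    "not equal": "≠",
    "different from": "≠",
    "differs from": "≠",
    "changed from": "≠",
    "greater than": ">",
    "more than": ">",
    "exceeds": ">",
    "increase": ">",   # also matches "increased", "has increased"
    "less than": "<",
    "below": "<",
    "smaller than": "<",
    "decrease": "<",   # also matches "decreased", "has decreased"
}

# Step 2 lookup: symbol in Ha -> form of the test in Table 1
FORM = {">": "right-tailed", "<": "left-tailed", "≠": "two-tailed"}


def construct_hypotheses(problem_text: str, mu0: float) -> tuple[str, str, str]:
    text = problem_text.lower()
    # Step 1: take the first key word (in dictionary order) found in the problem.
    symbol = next((s for word, s in KEYWORD_TO_SYMBOL.items() if word in text), None)
    if symbol is None:
        raise ValueError("No key word found; think through the problem instead.")
    # Step 2: the symbol determines which form in Table 1 applies.
    form = FORM[symbol]
    # Step 3: mu0 goes in both hypotheses, and the equal sign always goes in H0.
    return form, f"H0: mu = {mu0}", f"Ha: mu {symbol} {mu0}"


form, h0, ha = construct_hypotheses(
    "Weather researchers are interested in whether this already small "
    "amount of rain will decrease.", 8)
print(form, "|", h0, "|", ha)   # left-tailed | H0: mu = 8 | Ha: mu < 8
```

The usage line applies the sketch to the Arizona rainfall wording from Example 4, with $\mu_0 = 8$ supplied by the reader in Step 3.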

EXAMPLE 4 Applying the strategy for constructing the hypotheses about $\mu$

The mean annual rainfall in Arizona has been eight inches per year, according to the World Almanac. But weather researchers are interested in whether this already small amount of rain will decrease, leading to drought conditions in the state. Use the steps in the Strategy for Constructing the Hypotheses About $\mu$ to write a null hypothesis and an alternative hypothesis for this situation.

Solution

Let's use our strategy to construct the hypotheses needed to test this claim.

  • Step 1 Search the word problem for certain key English words and select the appropriate symbol.

    The problem uses the word “decrease,” which means “is less than.” Thus, we will write a hypothesis that contains the < symbol.


  • Step 2 Determine the form of the hypotheses.

    From Table 1, we see that the symbol < means that we use a left-tailed test:

    $H_0: \mu = \mu_0$ versus $H_a: \mu < \mu_0$

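  • Step 3 Find the value for $\mu_0$ and write your hypotheses.

    The alternative hypothesis states that the mean annual rainfall in Arizona is less than some value $\mu_0$. Less than what? Eight inches per year. Write the two hypotheses with $\mu_0 = 8$:

    $H_0: \mu = 8$ versus $H_a: \mu < 8$

    where $\mu$ represents the population mean annual rainfall in Arizona, in inches per year.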

NOW YOU CAN DO

Exercises 13–18.

YOUR TURN #1

Use Steps 1–3 in the Strategy for Constructing Hypotheses About $\mu$ to construct the hypotheses for the following scenario: Nielsen reports that iPhone and Android users spent 30 hours a month using apps on their devices in 2013. A media technology analyst states that the mean amount of time has increased since 2013. Write a null hypothesis and an alternative hypothesis for this situation.

(The solution is shown in Appendix A.)

Now that we know how to construct hypotheses, we next consider when sufficient evidence exists to reject the null hypothesis.

Statistical Significance

A result is said to be statistically significant if it is unlikely to have occurred due to chance.
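To get a concrete sense of “unlikely to have occurred due to chance,” one can compute exactly how often fair dice would do as badly as in Example 1. The Python sketch below is written for this illustration, and the function name `sum_distribution` is hypothetical. It treats the 10 throws of a pair of dice as the total of 20 independent fair dice, builds the exact distribution of that total by repeated convolution, and adds up the probability of a total of 50 or less.

```python
from fractions import Fraction


def sum_distribution(num_dice: int) -> dict[int, Fraction]:
    """Exact distribution {total: probability} of the total of num_dice fair six-sided dice."""
    dist = {0: Fraction(1)}
    for _ in range(num_dice):
        new_dist: dict[int, Fraction] = {}
        for total, p in dist.items():
            for face in range(1, 7):
                new_dist[total + face] = new_dist.get(total + face, Fraction(0)) + p / 6
        dist = new_dist
    return dist


# Example 1: 10 throws of a pair of dice is the total of 20 fair dice.
dist = sum_distribution(20)
p_at_most_50 = sum(p for total, p in dist.items() if total <= 50)
print(f"P(total of 10 throws <= 50 | fair dice) = {float(p_at_most_50):.4f}")
```

A small probability says that a total this low would rarely happen with fair dice. How small is small enough to count as statistically significant is the “where do you draw the line?” question raised later in this section.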

EXAMPLE 5 Statistical significance

Suppose that you are a researcher for a pharmaceutical research company. You are investigating the side effects of a new cholesterol-lowering medication and want to determine whether the medication will decrease the population mean systolic blood pressure level from the current population mean of $\mu = 110$. If so, then a warning will have to be given not to prescribe the new medication to patients whose blood pressure is already low.

To determine whether the medication has this effect, we take a sample of randomly selected patients who are taking the medication. We record their systolic blood pressure levels and calculate the sample mean $\bar{x}$ and sample standard deviation $s$. Most likely, the mean of this sample of patients' systolic blood pressure levels will not be exactly equal to 110, even if the null hypothesis is true. Now, suppose that the sample mean blood pressure is less than the hypothesized population mean of 110. Is the difference due simply to chance variation, or is it evidence of a real side effect of the cholesterol medication?

  1. Construct the appropriate hypotheses.
  2. For $\bar{x} = 109$ and $\bar{x} = 90$, discuss whether each result would be statistically significant or due to chance.

Solution

  1. The key word “decrease” means we have a left-tailed test. “Less than what?” The current population mean systolic blood pressure of 110. Thus, our hypotheses are:

    $H_0: \mu = 110$ versus $H_a: \mu < 110$

    where $\mu$ represents the population mean systolic blood pressure and $\mu_0 = 110$.


  2. For $\bar{x} = 109$, the difference between $\bar{x}$ and $\mu_0$ is only 1. Depending on the variability present in the sample, the researcher would likely not reject the null hypothesis, because this small difference is probably due to chance variation. The result is probably not statistically significant. But for $\bar{x} = 90$, the difference between $\bar{x}$ and $\mu_0$ is 20. Depending on the variability present in the sample, the researcher would probably conclude that this difference is too large to be plausibly due to chance variation. Thus, the researcher would probably reject the null hypothesis in favor of the alternative hypothesis $H_a: \mu < 110$. The result is statistically significant.

Note: When we reject $H_0$, we say that the results are statistically significant. If we do not reject $H_0$, the results are not statistically significant.

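To summarize:

  • If the difference between $\bar{x}$ and $\mu_0$ is large, reject $H_0$. The results are statistically significant.
  • If the difference between $\bar{x}$ and $\mu_0$ is small, do not reject $H_0$. The results are not statistically significant.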

The question is, “Where do you draw the line?” Just how large must the difference between $\bar{x}$ and $\mu_0$ be to reject the null hypothesis? We answer this question starting in Section 9.2.

Note that there are only two possible hypothesis-testing conclusions:

  • Reject $H_0$.
  • Do not reject $H_0$.

Developing Your Statistical Sense

A Decision Is Not Proof

It is important to understand that the decision to reject or not reject $H_0$ does not prove anything. The decision represents whether or not there is sufficient evidence against the null hypothesis. This is our best judgment, given the available data, similar to the best judgment of a jury, given the available evidence. You cannot claim to have proven anything about the value of a population parameter unless you elicit information from the entire population, which is usually not possible.

We can make decisions about population parameters using the limited information available in a sample because we base our decisions on probability. When the difference between the sample mean and the hypothesized population mean is large, then the null hypothesis is probably not correct. When the difference is small, then the data are probably consistent with the null hypothesis. But we don't know for sure.

2 Type I and Type II Errors

Next, we take a closer look at some of the thorny issues involved in performing a hypothesis test. Let's return to the example of a criminal trial. The jury will convict the defendant if they find evidence compelling enough to reject the null hypothesis of “not guilty” beyond a reasonable doubt. However, jurors are only human; sometimes their decisions are correct and sometimes they are not. Thus, the jury's verdict will represent one of the following outcomes:

  1. An innocent defendant is wrongfully convicted.
  2. A guilty defendant is convicted.
  3. A guilty defendant is wrongfully acquitted.
  4. An innocent defendant is acquitted.


Recall that we can write the two hypotheses for a criminal trial as

$H_0$: The defendant is not guilty.
$H_a$: The defendant is guilty.

Table 3 shows the possible verdicts on the left and the two hypotheses across the top.

Table 9.4: Table 3 Four possible outcomes of a criminal trial
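Verdict | $H_0$ true: defendant is not guilty | $H_a$ true: defendant is guilty
Guilty (reject $H_0$) | Innocent defendant wrongfully convicted | Guilty defendant convicted
Not guilty (do not reject $H_0$) | Innocent defendant acquitted | Guilty defendant wrongfully acquitted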
A jury's decision can be correct or incorrect. The same is true for the conclusion of a hypothesis test.

Let's look at the two possible decisions the jury can make. It can find the defendant guilty: the jury rejects the claim in the null hypothesis $H_0$. Alternatively, the jury can find the defendant not guilty: the jury does not reject the null hypothesis $H_0$. The jury can render the correct decision in two ways.

Two Ways of Making the Correct Decision

  • To not reject $H_0$ when $H_0$ is true.

    Example: To find the defendant not guilty when, in reality, he did not commit the crime.

  • To reject $H_0$ when $H_0$ is false.

    Example: To find the defendant guilty when, in reality, he did commit the crime.

Unfortunately, the jury can also render an incorrect decision in two ways. In statistics, the two incorrect decisions are called Type I and Type II errors.

Two Types of Errors

  • Type I error: To reject $H_0$ when $H_0$ is true.

    Example: To find the defendant guilty when, in reality, he did not commit the crime.

  • Type II error: To not reject $H_0$ when $H_0$ is false.

    Example: To find the defendant not guilty when, in reality, he did commit the crime.
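The four outcomes reduce to two yes/no facts: whether $H_0$ is actually true, and whether we reject it. The small Python sketch below encodes that rule. The function name `classify_decision` is hypothetical. It applies the rule to the criminal trial, with $H_0$ taken to be “the defendant is not guilty.”

```python
def classify_decision(h0_is_true: bool, reject_h0: bool) -> str:
    """Label a decision given the (normally unknown) truth about H0."""
    if reject_h0 and h0_is_true:
        return "Type I error"        # rejected a true H0
    if not reject_h0 and not h0_is_true:
        return "Type II error"       # failed to reject a false H0
    return "correct decision"


# Criminal trial: H0 = "the defendant is not guilty".
cases = [
    ("innocent, convicted", True, True),
    ("guilty, convicted", False, True),
    ("guilty, acquitted", False, False),
    ("innocent, acquitted", True, False),
]
for label, h0_true, rejected in cases:
    print(f"{label:20s} -> {classify_decision(h0_true, rejected)}")
```

In practice, the truth about $H_0$ is exactly what we do not know. That is why we can only talk about the probabilities of these errors, not about whether a particular decision was one.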

EXAMPLE 6 Type I and Type II errors

For the medication hypothesis test in Example 5, explain what it would mean if the following errors were made:

  1. Type I error
  2. Type II error


Solution

The hypotheses in Example 5 were the following:

$H_0: \mu = 110$ versus $H_a: \mu < 110$

where $\mu$ represents the population mean systolic blood pressure.

  1. A Type I error occurs when we reject $H_0$ when $H_0$ is true. This would be to conclude that $\mu$ had decreased when, in reality, it had stayed the same. In other words, a Type I error would be to conclude that the population mean systolic blood pressure had decreased when, in reality, it had not. The pharmaceutical company, worried about this apparent side effect, might stop production of the drug even though, in reality, there is no side effect.
  2. A Type II error occurs when we do not reject $H_0$ when $H_0$ is false. This would be to conclude that $\mu$ had stayed the same when, in reality, it had decreased. Here, this is a very dangerous error to make, because the pharmaceutical company might conclude that the side effect does not exist when, in reality, it does, and the drug could cause a dangerous lowering of blood pressure. This is why the Food and Drug Administration requires that strict protocols be followed regarding Type I and Type II errors when approving new medications for the market.

NOW YOU CAN DO

Exercises 19–22.

YOUR TURN #2

Explain what it would mean to make a Type I error and a Type II error for the hypothesis test in the following examples:

  1. Example 2
  2. Example 4

(The solutions are shown in Appendix A.)

The probability of a Type I error is denoted as $\alpha$ (alpha). We set the value of $\alpha$ to be some small constant, such as 0.01, 0.05, or 0.10, so that there is only a small probability of rejecting a true null hypothesis.

To say that $\alpha = 0.05$ means that, if this hypothesis test were repeated over and over again on data for which $H_0$ is true, then in the long run 5% of the tests would wrongly reject $H_0$. The level of significance of a hypothesis test is another name for $\alpha$, the probability of rejecting $H_0$ when $H_0$ is true. A smaller $\alpha$ makes it harder to wrongly reject $H_0$ just by chance. If the consequences of making a Type I error are serious, then the level of significance should be small, such as $\alpha = 0.01$. If the consequences of making a Type I error are not so serious, then one may choose a larger value for the level of significance, such as $\alpha = 0.05$ or $\alpha = 0.10$.
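The “repeated over and over again” interpretation can be checked by simulation. This Python sketch rests on an assumption not taken from the text: a left-tailed z-test with $\sigma$ treated as known is used as the decision rule. The actual test procedures are developed in later sections. The sketch throws fair dice many times, so $H_0: \mu = 7$ is true by construction, and records how often the rule rejects $H_0$ at $\alpha = 0.05$.

```python
import math
import random
from statistics import NormalDist

MU0 = 7.0                   # H0: mu = 7 (fair dice, as in Example 2)
SIGMA = math.sqrt(35 / 6)   # sd of the sum of two fair dice (each die has variance 35/12)
N = 10                      # throws per sample, as in Example 1
ALPHA = 0.05                # level of significance

# Assumed decision rule: left-tailed z-test with sigma treated as known.
Z_CRIT = NormalDist().inv_cdf(ALPHA)   # about -1.645


def rejects_h0(rng: random.Random) -> bool:
    """Throw fair dice N times (so H0 is true) and report whether the rule rejects H0."""
    rolls = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(N)]
    x_bar = sum(rolls) / N
    z = (x_bar - MU0) / (SIGMA / math.sqrt(N))
    return z < Z_CRIT


rng = random.Random(2024)   # fixed seed so the run is reproducible
REPS = 20_000
type_i_rate = sum(rejects_h0(rng) for _ in range(REPS)) / REPS
print(f"Long-run proportion of true H0 rejected: {type_i_rate:.3f} (alpha = {ALPHA})")
```

Because dice totals are discrete and only approximately normal, the simulated rate lands near 0.05 rather than exactly on it.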

The probability of a Type II error is denoted as $\beta$ (beta). This is the probability of not rejecting $H_0$ when $H_0$ is false, such as acquitting someone who is really guilty. For a fixed sample size, making $\alpha$ smaller inevitably makes $\beta$ larger. Ideally, we would like to make both $\alpha$ and $\beta$ small at the same time. Unfortunately, the only way to do this is to increase the sample size.
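The trade-off between $\alpha$ and $\beta$, and the role of sample size, can be made concrete with a short calculation. The Python sketch below uses several assumptions not taken from the text. It takes the same left-tailed z-test with known $\sigma$ as the decision rule and a normal approximation for $\bar{x}$. It also posits a hypothetical pair of loaded dice whose true mean is 6 with the same standard deviation as fair dice. With $\mathrm{se} = \sigma/\sqrt{n}$, the test rejects $H_0$ when $\bar{x} < \mu_0 + z_\alpha \cdot \mathrm{se}$, so $\beta$ is the probability that $\bar{x}$ lands at or above that cutoff when the true mean is 6.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def type_ii_probability(alpha: float, n: int, mu0: float, mu_true: float, sigma: float) -> float:
    """beta for a left-tailed z-test of H0: mu = mu0 vs Ha: mu < mu0 with known sigma,
    when the true mean is mu_true (assumed < mu0)."""
    se = sigma / math.sqrt(n)
    cutoff = mu0 + Z.inv_cdf(alpha) * se       # reject H0 when x_bar < cutoff
    return 1 - Z.cdf((cutoff - mu_true) / se)  # P(x_bar >= cutoff | mu = mu_true)


SIGMA = math.sqrt(35 / 6)   # assumed: loaded dice have the same sd as fair dice
for n in (10, 40):
    for alpha in (0.10, 0.05, 0.01):
        beta = type_ii_probability(alpha, n, mu0=7, mu_true=6, sigma=SIGMA)
        print(f"n={n:3d}  alpha={alpha:.2f}  beta={beta:.3f}")
```

Within each sample size, shrinking $\alpha$ raises $\beta$. Moving from $n = 10$ to $n = 40$ lowers $\beta$ at every $\alpha$, enough that $\alpha = 0.01$ with $n = 40$ has a smaller $\beta$ than $\alpha = 0.05$ with $n = 10$.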