6.5 Applications of the Normal Distribution


OBJECTIVES By the end of this section, I will be able to …

  1. Compute probabilities for a given value of any normal random variable.
  2. Find the appropriate value of any normal random variable, given an area or probability.
  3. Use normal probability plots to assess normality.

1 Finding Probabilities for Any Normal Distribution

The data in problems that we face in the real world do not usually follow the standard normal distribution, \(Z\). Instead, a problem may be stated in terms of some normal random variable \(X\) that has a mean other than 0 or a standard deviation other than 1. In cases like these, \(X\) needs to be standardized to \(Z\) so that we can use the Section 6.4 techniques.

To standardize things means to make them all the same. For example, college applicants take standardized tests so that the admissions officers can compare students according to a consistent assessment tool. Here, we standardize many different normal random variables into the same standard normal \(Z\).

Standardizing \(X\) to \(Z\)

To standardize a normal random variable \(X\), we transform that normal random variable into the standard normal random variable \(Z\).

Suppose that \(X\) is a normal random variable with population mean \(\mu\) and population standard deviation \(\sigma\). We standardize \(X\) by subtracting the mean \(\mu\) and dividing by the standard deviation \(\sigma\). The result of this transformation is the familiar standard normal random variable \(Z\).

Standardizing a Normal Random Variable

Any normal random variable \(X\) can be transformed into the standard normal random variable \(Z\) by standardizing with the formula

\[ Z = \frac{X - \mu}{\sigma} \]

The key here is the following: for a given area of interest for a normal random variable \(X\), the corresponding area after the transformation to \(Z\) is exactly the same. For any normal random variable \(X\),

the area between \(x = a\) and \(x = b\)

is exactly the same as

the area between \(z = \dfrac{a - \mu}{\sigma}\) and \(z = \dfrac{b - \mu}{\sigma}\) (see Figure 46).

So we can solve problems about areas under the nonstandard normal curve by using the corresponding area under the \(Z\) curve.

FIGURE 46 Corresponding areas are equal.


EXAMPLE 37 April in Georgia


The state of Georgia reports that the mean temperature statewide for the month of April is 61.5°F. Assume that the standard deviation is 8°F and that temperature in Georgia in April is normally distributed. Draw the normal curve for temperatures between 45.5°F and 77.5°F, and the corresponding \(Z\) curve. Find the probability that the temperature is between 45.5°F and 77.5°F in April in Georgia.

Solution

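Here, we have \(\mu = 61.5\) and \(\sigma = 8\), giving us

\[ Z_1 = \frac{45.5 - 61.5}{8} = \frac{-16}{8} = -2 \qquad\text{and}\qquad Z_2 = \frac{77.5 - 61.5}{8} = \frac{16}{8} = 2 \]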

In Figure 47, the area between \(X = 45.5\) and \(X = 77.5\) is the same as the area between \(Z = -2\) and \(Z = 2\). In other words,

\[ P(45.5 < X < 77.5) = P(-2 < Z < 2) \]

This is a Case 3 problem from Table 8 (page 355). The table tells us that the area to the left of \(Z = -2\) is 0.0228, and the area to the left of \(Z = 2\) is 0.9772. The area between −2 and 2 is then equal to \(0.9772 - 0.0228 = 0.9544\). The probability that the temperature is between 45.5°F and 77.5°F in April in Georgia is 0.9544.

FIGURE 47 Find the area under the \(Z\) curve and we have found the area under the \(X\) curve.
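If you want to double-check Example 37 with software, here is a minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8 or later), assuming \(\mu = 61.5\) and \(\sigma = 8\) (the values implied by the \(Z\)-values of −2 and 2). It computes the probability both on the \(X\) scale and on the \(Z\) scale, which should agree. The unrounded answer is about 0.9545; the table gives 0.9544 because 0.9772 and 0.0228 are each rounded to four decimals.

```python
from statistics import NormalDist

# Example 37: April temperatures in Georgia, assumed mu = 61.5 F, sigma = 8 F
temps = NormalDist(mu=61.5, sigma=8)
std_normal = NormalDist()  # Z: mean 0, standard deviation 1

# Standardize both endpoints, z = (x - mu) / sigma
z1 = (45.5 - temps.mean) / temps.stdev   # -2.0
z2 = (77.5 - temps.mean) / temps.stdev   #  2.0

# Area between the endpoints on the X scale and on the Z scale
p_x = temps.cdf(77.5) - temps.cdf(45.5)
p_z = std_normal.cdf(z2) - std_normal.cdf(z1)

print(z1, z2)          # -2.0 2.0
print(round(p_x, 4))   # 0.9545
print(abs(p_x - p_z) < 1e-12)  # True: the two areas are the same
```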

Finding Probabilities for Any Normal Distribution

  • Step 1 Determine the random variable \(X\), the mean \(\mu\), and the standard deviation \(\sigma\). Draw the normal curve for \(X\), and shade the desired area.
  • Step 2 Standardize by using the formula \(Z = (X - \mu)/\sigma\) to find the values of \(Z\) corresponding to the \(X\)-values.
  • Step 3 Draw the standard normal curve and shade the area corresponding to the shaded area in the graph of \(X\).
  • Step 4 Find the area under the standard normal curve using either the \(Z\) table or technology. This area is equal to the area under the normal curve for \(X\) drawn in Step 1.
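Steps 2 and 4 can be sketched in a few lines of Python. This is an illustration, not the book's procedure: the helper names `standard_normal_cdf` and `normal_prob_between` are hypothetical, and the left-tail area of \(Z\) is computed from the error function `math.erf` in place of the \(Z\) table, using \(P(Z < z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)\).

```python
import math

def standard_normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z (Step 4)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_prob_between(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a < X < b) for a normal X with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Step 2: standardize both endpoints
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    # Step 4: the area between z_a and z_b equals the area between a and b
    return standard_normal_cdf(z_b) - standard_normal_cdf(z_a)

# Example 37: between 45.5 F and 77.5 F with mu = 61.5, sigma = 8
print(round(normal_prob_between(45.5, 77.5, 61.5, 8), 4))  # 0.9545
```

For a "less than" question, pass `a = -math.inf`; the standardized endpoint is then \(-\infty\), and `math.erf(-math.inf)` is −1, so the left end contributes an area of 0.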

Check Your Answer! According to the Empirical Rule, almost all \(Z\)-values lie between −3 and 3, so it is unlikely that a randomly selected value of \(Z\) lies outside this range. Keep this in mind when you are doing your calculations. If you standardize a normal random variable and get a very large \(Z\)-value (one far outside the range −3 to 3), recheck your calculations, because the probability that \(Z\) takes such an extreme value is very small.
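To put a number on "very small," the sketch below uses Python's `statistics.NormalDist` to compute \(P(Z < -3) + P(Z > 3)\), about 0.0027. It also defines a hypothetical helper, `standardize_checked`, that prints a warning whenever \(|Z| > 3\). The threshold of 3 follows the Empirical Rule above. The cutoff is a rule of thumb for catching slips, not a formal test.

```python
from statistics import NormalDist

std_normal = NormalDist()  # Z: mean 0, standard deviation 1

# Probability that Z falls outside [-3, 3]; by symmetry, twice the left tail
p_outside = 2 * std_normal.cdf(-3)
print(round(p_outside, 4))  # 0.0027

def standardize_checked(x, mu, sigma):
    """Hypothetical helper: return z = (x - mu) / sigma, warning if |z| > 3."""
    z = (x - mu) / sigma
    if abs(z) > 3:
        print(f"warning: z = {z:.2f} is unusual; recheck x, mu, and sigma")
    return z

print(standardize_checked(396, 514, 118))   # -1.0, no warning
standardize_checked(396, 514, 11.8)         # a typo in sigma gives z = -10.00 and a warning
```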


EXAMPLE 38 Finding probability for a normal random variable

SAT Scores and AP Exam Scores

The College Board reports that the population mean Math SAT score in 2013 was \(\mu = 514\), with a population standard deviation of \(\sigma = 118\), and that the scores follow a normal distribution. Suppose that a local college wants to identify at-risk math students, which it considers to be students scoring below 396 on the Math SAT. Find the proportion of students who score below 396 on the Math SAT.

Remember that you may solve problems asking for proportions or percentages by finding the appropriate probability.

Solution

  • Step 1 Determine \(X\), \(\mu\), and \(\sigma\).

    We are given that the normal random variable \(X\) = Math SAT score has mean \(\mu = 514\) and standard deviation \(\sigma = 118\). In the center of the number line, mark the mean \(\mu = 514\). Also mark on the number line the value of \(X\) about which the problem is asking. Figure 48 shows the graph of \(X\) (the Math SAT scores) with the mean of 514 and the score of 396 marked.

    You need to know the proportion of scores below 396, so shade the area under the curve to the left of 396. We can express this proportion as a probability, the probability that a randomly chosen student will score less than 396, or \(P(X < 396)\). Just by looking at Figure 48, you should be able to get a rough idea of what the proportion of these scores will be. Certainly, this proportion will be less than 50%. If you get an answer such as “60%” for your proportion, you should recognize that it is wrong.

    FIGURE 48 Graph of proportion of Math SAT scores lower than 396.
  • Step 2 Standardize.

    Now standardize the random variable \(X\) to the standard normal \(Z\):

    \[ Z = \frac{X - \mu}{\sigma} = \frac{X - 514}{118} \]

    Find the \(Z\)-value corresponding to the Math SAT score of 396:

    \[ Z = \frac{396 - 514}{118} = \frac{-118}{118} = -1 \]

    So the \(Z\)-value associated with a score of 396 is −1, which indicates that the score of 396 is 1 standard deviation below the mean of 514.

    FIGURE 49 Graph of \(P(Z < -1)\).
  • Step 3 Draw the standard normal curve.

    Scores less than 396 are more than 1 standard deviation below the mean, so shade the area to the left of \(Z = -1\) in Figure 49. Now find the area to the left of \(Z = -1\) using the methods of Section 6.4.

  • Step 4 Find the area under the standard normal curve.

    The table tells us that the area to the left of \(Z = -1\) is 0.1587.

    The proportion of scores below 396 is \(P(X < 396) = P(Z < -1) = 0.1587\), or 15.87%. Note that this value for \(P(X < 396)\) agrees with our earlier intuition that the proportion was less than 50%.
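A quick software check of Example 38, sketched with Python's `statistics.NormalDist` and assuming \(\mu = 514\) and \(\sigma = 118\) (the standard deviation implied by a \(Z\)-value of −1 for a score of 396). The method `cdf(x)` returns the area to the left of `x`, which is exactly \(P(X < x)\).

```python
from statistics import NormalDist

math_sat = NormalDist(mu=514, sigma=118)  # 2013 Math SAT scores (assumed normal)

z = (396 - math_sat.mean) / math_sat.stdev  # standardized score
p_below = math_sat.cdf(396)                 # P(X < 396)

print(z)                  # -1.0
print(round(p_below, 4))  # 0.1587
```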

NOW YOU CAN DO

Exercises 3–9.

YOUR TURN #19

For the scenario in Example 38, find the proportion of Math SAT scores greater than 600.

(The solution is shown in Appendix A.)


EXAMPLE 39 Finding the probability that \(X\) lies between two given values

SAT Scores and AP Exam Scores

Continuing the Math SAT score problem, what percentage of students score between 215 and 595?

The Normal Density Curve applet allows you to find areas associated with various values of any normal random variable.

Solution

  • Step 1 Determine \(X\), \(\mu\), and \(\sigma\).

    We have already seen that \(X\) has mean \(\mu = 514\) and standard deviation \(\sigma = 118\). Once again, draw a graph of the distribution of scores \(X\), with the mean 514 in the middle, the score 215 to the left of the mean, and the score 595 to the right of the mean, as in Figure 50.

  • Step 2 Standardize.

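    This is a “between” example, where two values of \(X\) are given, and we are asked to find the area between them. In this case, just standardize both of these values of \(X\) to get a \(Z\)-value for each:

    \[ Z_1 = \frac{215 - 514}{118} = \frac{-299}{118} \approx -2.53 \qquad Z_2 = \frac{595 - 514}{118} = \frac{81}{118} \approx 0.69 \]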

    FIGURE 50 Graph of percentage of students scoring between 215 and 595 on the Math SAT.
  • Step 3 Draw the standard normal curve.

    Draw a graph of \(Z\), shading the area between \(Z_1 = -2.53\) and \(Z_2 = 0.69\), as shown in Figure 51. Again, the key is that the area between \(X = 215\) and \(X = 595\) is exactly the same as the area between \(Z_1 = -2.53\) and \(Z_2 = 0.69\).

  • Step 4 Find the area under the standard normal curve.

    Figure 51 is a Case 3 problem from Table 8 (page 355). Find the area to the left of 0.69, which is 0.7549, and the area to the left of −2.53, which is 0.0057. Subtracting the smaller from the larger gives us

    \[ P(215 < X < 595) = P(-2.53 < Z < 0.69) = 0.7549 - 0.0057 = 0.7492 \]

    Thus, the percentage of Math SAT scores that are between 215 and 595 is 74.92%.

    FIGURE 51 Graph of percentage of \(Z\)-values between −2.53 and 0.69.
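The table answer depends on rounding each \(Z\)-value to two decimals before looking it up. The sketch below, using Python's `statistics.NormalDist` with \(\mu = 514\) and \(\sigma = 118\), reproduces the table result and then repeats the calculation without intermediate rounding, which is what software does. The unrounded probability comes out slightly smaller, about 0.748, so answers found with technology may differ from table answers in the third decimal place.

```python
from statistics import NormalDist

math_sat = NormalDist(mu=514, sigma=118)
std_normal = NormalDist()

# Table approach: round each z to two decimals, then find left-tail areas
z1 = round((215 - 514) / 118, 2)   # -2.53
z2 = round((595 - 514) / 118, 2)   #  0.69
p_table = std_normal.cdf(z2) - std_normal.cdf(z1)

# Technology approach: no rounding of the z-values
p_exact = math_sat.cdf(595) - math_sat.cdf(215)

print(z1, z2)             # -2.53 0.69
print(round(p_table, 4))  # 0.7492
print(round(p_exact, 4))
```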

NOW YOU CAN DO

Exercises 10–14.


YOUR TURN #20

For the scenario in Example 39, find the proportion of Math SAT scores between 305 and 605.

(The solution is shown in Appendix A.)

2 Finding a Normal Data Value for a Given Area or Probability

Sometimes we are given a probability (or proportion or area), and we are asked to find the associated value of \(X\). Questions like these are similar to the “backwards” problems of Section 6.4, which are so called because we must use the \(Z\) table backward or inside out. The formula for standardizing gives the value for \(Z\), so we need to use our algebra skills to find the equation for \(X\): Start with the standard normal formula \(Z = \dfrac{X - \mu}{\sigma}\). Multiply both sides by \(\sigma\) to get \(Z\sigma = X - \mu\). Then add \(\mu\) to both sides, giving us \(X = Z\sigma + \mu\).

Finding Normal Data Values for a Given Area or Probability

  • Step 1 Determine \(X\), \(\mu\), and \(\sigma\), and draw the normal curve for \(X\). Shade the desired area. Mark the position of \(x_1\), the unknown value of \(X\).
  • Step 2 Find the \(Z\)-value corresponding to the desired area. Look up the area you identified in Step 1 on the inside of the \(Z\) table. If you do not find the exact value of your area, by convention choose the area that is closest.
  • Step 3 Transform this value of \(Z\) into a value of \(X\), which is the solution. Use the formula \(x_1 = Z\sigma + \mu\).
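The three steps can be sketched as a small Python function. Note one difference from the table method: instead of choosing the closest area on the inside of the table, Step 2 here uses `statistics.NormalDist().inv_cdf`, which returns the \(Z\)-value for the requested left-tail area directly. The function name `normal_value_for_left_area` is hypothetical.

```python
from statistics import NormalDist

def normal_value_for_left_area(area_left: float, mu: float, sigma: float) -> float:
    """Hypothetical helper: return x1 such that P(X < x1) = area_left."""
    if not 0 < area_left < 1:
        raise ValueError("area_left must be strictly between 0 and 1")
    # Step 2: the z-value with the given area to its left
    z = NormalDist().inv_cdf(area_left)
    # Step 3: undo the standardization, x1 = z * sigma + mu
    return z * sigma + mu

# The median (area 0.5 to the left) of a normal distribution is its mean
print(normal_value_for_left_area(0.5, 514, 118))  # 514.0
```

For an area to the right, such as "the top 1%," pass the complementary left area, \(1 - 0.01 = 0.99\).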

EXAMPLE 40 Finding a normal data value for a given area

SAT Scores and AP Exam Scores

Suppose that students who score in the top 1% on the Math SAT win a fellowship to an Ivy League university. What score will students have to obtain to win this fellowship?

Solution

Notice that we are not asked to find a probability (or proportion or area). Instead, we are given a percentage (1%) and asked to find the value of \(X\) (the Math SAT score) that is associated with this 1%.

  • Step 1 Determine \(X\), \(\mu\), and \(\sigma\), and draw the normal curve for \(X\).

    We already know that \(X\) = Math SAT score, \(\mu = 514\), and \(\sigma = 118\). The value of \(X\) in which we are interested refers to high scores, so \(x_1\) will be at the far right of the distribution of \(X\). Only 1% of scores will be greater than this score, so the area to the right of \(x_1\) is 0.01, as shown in Figure 52.

    FIGURE 52 \(x_1\) is the cutoff value (or critical value) of \(X\), at which students will win a fellowship to an Ivy League university.


  • Step 2 Find the \(Z\)-value corresponding to the desired area.

    The area to the right of \(x_1\) equals 0.01, so the area to the left of \(x_1\) equals \(1 - 0.01 = 0.99\). Looking up 0.99 on the inside of the \(Z\) table gives us \(Z = 2.33\).

  • Step 3 Transform using the formula \(x_1 = Z\sigma + \mu\).

    We calculate

    \[ x_1 = Z\sigma + \mu = (2.33)(118) + 514 = 274.94 + 514 = 788.94 \]

The cutoff value for the top 1% of Math SAT scores for winning a fellowship to an Ivy League university is 788.94. It won't be easy getting that fellowship.
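A sketch of Example 40 in Python, assuming \(\mu = 514\) and \(\sigma = 118\). It compares the table value \(Z = 2.33\) (the closest table area to 0.99) with the unrounded \(Z\)-value from `statistics.NormalDist().inv_cdf(0.99)`, about 2.3263. Software therefore reports a cutoff near 788.51 instead of 788.94; the difference comes only from the table's rounding convention.

```python
from statistics import NormalDist

mu, sigma = 514, 118

z_table = 2.33                        # closest table entry to area 0.99
z_exact = NormalDist().inv_cdf(0.99)  # about 2.3263, no rounding

x1_table = z_table * sigma + mu       # x1 = Z * sigma + mu
x1_exact = z_exact * sigma + mu

print(round(x1_table, 2), round(x1_exact, 2))  # 788.94 788.51
```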

NOW YOU CAN DO

Exercises 15–22.

YOUR TURN #21

For the situation in Example 40, what is the Math SAT score that separates the lowest 2.5% of the scores from the others?

(The solution is shown in Appendix A.)

EXAMPLE 41 Finding the \(X\)-values that mark the boundaries of the middle 95% of \(X\)-values

Edmunds.com reported that the average amount that people were paying for a 2015 Toyota Camry XLE was $28,720. Let \(X\) = price, and assume that price follows a normal distribution with \(\mu = 28{,}720\) and \(\sigma = 1000\). Find the prices that separate the middle 95% of 2015 Toyota Camry XLE prices from the bottom 2.5% and the top 2.5%.

Solution

  • Step 1 Determine \(X\), \(\mu\), and \(\sigma\), and draw the normal curve for \(X\).

    Let \(X\) = price, \(\mu = 28{,}720\), and \(\sigma = 1000\). The middle 95% of prices are between \(x_1\) and \(x_2\), as shown in Figure 53.

  • Step 2 Find the \(Z\)-values corresponding to the desired area.

    The area to the left of \(x_1\) equals 0.025, and the area to the left of \(x_2\) equals 0.975. Looking up area 0.025 on the inside of the \(Z\) table gives us \(Z_1 = -1.96\). Looking up area 0.975 on the inside of the \(Z\) table gives us \(Z_2 = 1.96\).

    FIGURE 53 \(x_1\) and \(x_2\) mark the middle 95% of Camry XLE prices.
  • Step 3 Transform using the formula \(x = Z\sigma + \mu\).

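    We calculate

    \[ x_1 = Z_1\sigma + \mu = (-1.96)(1000) + 28{,}720 = 26{,}760 \]

    \[ x_2 = Z_2\sigma + \mu = (1.96)(1000) + 28{,}720 = 30{,}680 \]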

    The prices that separate the middle 95% of 2015 Toyota Camry XLE prices from the bottom 2.5% of prices and the top 2.5% of prices are $26,760 and $30,680.
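With software, each cutoff is a single inverse-CDF call. This sketch uses Python's `statistics.NormalDist`, assuming \(\mu = 28{,}720\) and \(\sigma = 1000\). Because `inv_cdf(0.025)` uses \(Z \approx -1.95996\) rather than −1.96, the raw results differ from the hand calculation by a few cents. Rounded to whole dollars, they match.

```python
from statistics import NormalDist

prices = NormalDist(mu=28_720, sigma=1000)  # 2015 Camry XLE prices (assumed normal)

x1 = prices.inv_cdf(0.025)  # 2.5% of prices lie below x1
x2 = prices.inv_cdf(0.975)  # 2.5% of prices lie above x2

print(round(x1), round(x2))  # 26760 30680
```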

NOW YOU CAN DO

Exercises 23–26.


YOUR TURN #22

For the situation in Example 41, find the two prices that separate the middle 90% of prices from the bottom 5% and the top 5%.

(The solution is shown in Appendix A.)

What If Scenario: How Change in Spread Affects Camry Prices

In Example 41, what if we ask the same question again, but this time the standard deviation of 2015 Toyota Camry XLE prices is not $1000, but some value less than $1000? How and why would this affect the following?

  1. The values \(Z_1\) and \(Z_2\) found in Step 2
  2. The value \(x_1\) separating the middle 95% of prices from the bottom 2.5%
  3. The value \(x_2\) separating the middle 95% of prices from the top 2.5%

Solution

Figure 54 illustrates the distribution of 2015 Toyota Camry XLE prices, where everything is the same as in Figure 53, except that the standard deviation of the prices is smaller by an unknown amount. Thus, the spread of the distribution is smaller.

FIGURE 54 The middle 95% of prices now has less spread, bringing each of \(x_1\) and \(x_2\) closer to the mean.
  1. We are still asking for the middle 95% of prices, so the \(Z\)-values remain the same: −1.96 and 1.96.
  2. Re-express the formula \(x_1 = Z_1\sigma + \mu = -1.96\sigma + \mu\) as \(\mu - x_1 = 1.96\sigma\). If \(\sigma\) is smaller than $1000, then the quantity \(1.96\sigma\), which represents the difference between the mean price and \(x_1\), will also be smaller.

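    Because \(x_1\) is less than the mean \(\mu\), the smaller difference between the mean price and \(x_1\) leads us to conclude that \(x_1\) will be larger than in Example 41. For example, any standard deviation \(\sigma\) smaller than $1000 gives \(x_1 = 28{,}720 - 1.96\sigma > 28{,}720 - 1.96(1000) = 26{,}760\), so the new \(x_1\) is larger than the $26,760 in Example 41.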

  3. Similarly, a smaller \(\sigma\) means a smaller quantity \(1.96\sigma = x_2 - \mu\), which means that \(x_2\) will be closer to the mean \(\mu\). Because \(x_2\) is larger than the mean, the new value for \(x_2\) will be smaller than in Example 41.
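The reasoning in this scenario can be checked numerically. In the sketch below, \(\sigma = 500\) is a hypothetical smaller spread chosen only for illustration; the original scenario leaves the new standard deviation unspecified. The \(Z\)-value is computed once with Python's `statistics.NormalDist` and reused, which reflects answer 1: it does not depend on \(\sigma\).

```python
from statistics import NormalDist

mu = 28_720
z2 = NormalDist().inv_cdf(0.975)  # about 1.96; the same for every sigma

cutoffs = {}
for sigma in (1000, 500):  # 500 is a hypothetical, smaller spread
    cutoffs[sigma] = (mu - z2 * sigma, mu + z2 * sigma)  # (x1, x2)
    print(sigma, round(cutoffs[sigma][0]), round(cutoffs[sigma][1]))
# 1000 26760 30680
# 500 27740 29700
```

With the smaller spread, \(x_1\) moves up and \(x_2\) moves down, so both cutoffs move toward the mean, as answers 2 and 3 predict.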


EXAMPLE 42 Normal probabilities and percentiles using technology

Applying the information on Toyota Camry prices from Example 41, use the TI-83/84, Excel, Minitab, or JMP to find the following:

  1. The proportion of 2015 Camry XLEs costing between $27,000 and $30,000: \(P(27{,}000 < X < 30{,}000)\)
  2. The 99th percentile of Camry XLE prices; that is, find the value of \(X\), namely, \(x_1\), such that \(P(X < x_1) = 0.99\)

Solution

The instructions for finding these quantities are given in the Step-by-Step Technology Guide at the end of this section (page 380).

TI-83/84

  1. Figure 55 shows that \(P(27{,}000 < X < 30{,}000) \approx 0.8570\).
  2. Figure 56 shows that the value \(x_1\) such that \(P(X < x_1) = 0.99\) is \(x_1 \approx 31{,}046.35\).
    FIGURE 55 TI-83/84: Finding a probability.
    FIGURE 56 TI-83/84: Finding a value of \(X\).

Excel

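  1. Excel provides the cumulative probabilities \(P(X < 30{,}000)\) in Figure 57 and \(P(X < 27{,}000)\) in Figure 58. To find \(P(27{,}000 < X < 30{,}000)\), we subtract \(P(X < 27{,}000)\) from \(P(X < 30{,}000)\):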
    FIGURE 57 Excel: \(P(X < 30{,}000)\).
    FIGURE 58 Excel: \(P(X < 27{,}000)\).
  2. Excel provides the result shown in Figure 59:
    FIGURE 59 Excel: Finding a value of \(X\).

Minitab

  1. Similar to Excel, Minitab asks you to take the difference of two cumulative probabilities: \(P(X < 30{,}000)\) in Figure 60 and \(P(X < 27{,}000)\) in Figure 61:
    FIGURE 60 Minitab: \(P(X < 30{,}000)\).
    FIGURE 61 Minitab: \(P(X < 27{,}000)\).
  2. The results are given in Figure 62:
    FIGURE 62 Minitab: Finding a value of \(X\).


JMP

  1. JMP also asks you to take the difference of two cumulative probabilities: \(P(X < 30{,}000)\) in Figure 63 and \(P(X < 27{,}000)\) in Figure 64:
    FIGURE 63 JMP: \(P(X < 30{,}000)\).
    FIGURE 64 JMP: \(P(X < 27{,}000)\).
  2. The results are given in Figure 65:
    FIGURE 65 JMP: Finding a value of \(X\).
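Outside these four packages, the same two quantities can be sketched in Python with the standard-library `statistics.NormalDist`, assuming \(\mu = 28{,}720\) and \(\sigma = 1000\) as in Example 41. As with Excel, Minitab, and JMP, the probability in part 1 is a difference of two cumulative probabilities, and the percentile in part 2 comes from the inverse CDF.

```python
from statistics import NormalDist

prices = NormalDist(mu=28_720, sigma=1000)

# Part 1: P(27,000 < X < 30,000) = P(X < 30,000) - P(X < 27,000)
p = prices.cdf(30_000) - prices.cdf(27_000)

# Part 2: the 99th percentile, the x1 with P(X < x1) = 0.99
x99 = prices.inv_cdf(0.99)

print(f"{p:.4f}")    # 0.8570
print(f"{x99:.2f}")  # 31046.35
```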


Developing Your Statistical Sense

Text Messaging: Be Careful What You Assume

The Pew Internet and American Life Project reported in 2011 that the mean number of text messages sent per day by 18- to 24-year-old Americans is 109.5. Assume that the distribution of the number of text messages is normal, with \(\mu = 109.5\) and standard deviation \(\sigma = 35\).

Problem 1. Suppose that cell phone customers get a special rate if the number of text messages they send per day is at or above the 95th percentile. Find the number of text messages represented by the 95th percentile.

Solution to Problem 1. On the assumption that the number of text messages is normally distributed, and working similarly to Example 42b, we find the 95th percentile of text messages to be about 167, as shown in Figure 66.

FIGURE 66 95th percentile of text messages.
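Problem 1 takes one inverse-CDF call. This Python sketch uses `statistics.NormalDist` and assumes \(\mu = 109.5\) and \(\sigma = 35\), the standard deviation consistent with a 95th percentile of about 167. The normal model itself is the assumption that Problem 2 goes on to question.

```python
from statistics import NormalDist

# Assumed (and, as Problem 2 shows, questionable) normal model for texts per day
texts = NormalDist(mu=109.5, sigma=35)

p95 = texts.inv_cdf(0.95)  # the value with 95% of the area to its left
print(round(p95))          # 167
```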

Problem 2. Pew reports further that the median number of text messages sent per day by 18- to 24-year-old Americans is 50.

  1. What does this say about our assumption of normality for the distribution of text messages?
  2. What shape does the distribution of the number of text messages actually take?
  3. Is the actual 95th percentile of text messages greater or less than 167, and why?

Solution to Problem 2.

  1. In Chapter 3, we learned that, for symmetric distributions (such as the normal distribution), the mean and the median are about equal (see Figure 5 on page 115). The mean number of text messages, 109.5, is much larger than the median of 50 text messages, so the distribution of text messages is not symmetric and thus cannot be normal.
  2. The distribution of the number of text messages has a shape like the one in Figure 33 on page 73. Thus, the distribution of the number of text messages is actually right-skewed.
  3. Figure 67 shows the (wrongly) assumed normal distribution in green and the actual right-skewed distribution in orange. Both distributions have the same mean: \(\mu = 109.5\). The 95th percentile for each distribution is shown. Because the right tail of the right-skewed distribution is extended, the 95th percentile of the right-skewed distribution is greater than the 95th percentile of the normal distribution. Thus, the actual 95th percentile of the number of text messages sent per day by 18- to 24-year-old Americans is greater than 167.
    FIGURE 67 Incorrect assumption of normality led us to underestimate the 95th percentile of the number of text messages.
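To see the direction of the error in a concrete case, the sketch below picks one hypothetical right-skewed model: a lognormal distribution whose mean is 109.5 and whose median is 50, matching the two Pew figures. This is not the actual distribution of text messages, which the source does not give, and the lognormal is a technique swapped in only for illustration. Because \(\log X\) is normal in this model, its percentiles come from `statistics.NormalDist` followed by `math.exp`. The skewed 95th percentile lands well above 167, which agrees with the conclusion in part 3; the exact value depends entirely on the assumed model.

```python
import math
from statistics import NormalDist

mean, median = 109.5, 50.0

# Hypothetical model: log(texts) ~ Normal(m, s).
# For a lognormal, median = exp(m) and mean = exp(m + s**2 / 2).
m = math.log(median)
s = math.sqrt(2 * math.log(mean / median))

log_model = NormalDist(mu=m, sigma=s)
p95_skewed = math.exp(log_model.inv_cdf(0.95))          # percentiles carry through exp
p95_normal = NormalDist(mu=mean, sigma=35).inv_cdf(0.95)  # the (wrong) normal model

print(round(p95_normal), round(p95_skewed))
```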


3 Assessing Normality Using Normal Probability Plots

Much of the analysis we conduct in this text requires that the sample data come from a population that is normally distributed. But how do we assess whether a data set is normally distributed? Histograms, dotplots, and stem-and-leaf displays may be used. But a more precise graphical tool for assessing normality is the normal probability plot. A normal probability plot is a scatterplot of the estimated cumulative normal probabilities (expressed as percents) against the corresponding data values in the data set.

Analyzing Normal Probability Plots

If the points in the normal probability plot either cluster around a straight line or nearly all fall within the curved bounds, then it is likely that the data set is normal. Systematic deviations from the straight line are evidence against the claim that the data set is normal.

Professional statistical analysts always use technology to construct normal probability plots. We show how this is done in the Step-by-Step Technology Guide at the end of this section.
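For readers curious about what the software computes, here is a rough sketch in Python. It is not how any particular package builds its plot, and it omits the curved bounds. Each sorted data value is paired with an estimated cumulative probability. The plotting position \((i - 0.5)/n\) is an assumption, since software packages use several different formulas. Each value is also paired with the matching standard normal quantile, because the percent axis on a normal probability plot is spaced like those quantiles. As an informal summary that is not part of the text's method, the hypothetical `straightness` function reports the correlation between data values and quantiles; values near 1 mean the points hug a straight line. The two data sets are simulated, one normal and one right-skewed (exponential).

```python
import random
from statistics import NormalDist, mean

def normal_plot_points(data):
    """Return (value, cumulative percent, z quantile) for each sorted value."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    points = []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.5) / n  # assumed plotting position; software may differ
        points.append((x, 100 * p, std_normal.inv_cdf(p)))
    return points

def straightness(points):
    """Correlation between data values and z quantiles (near 1 = straight line)."""
    xs = [x for x, _, _ in points]
    zs = [z for _, _, z in points]
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / (sxx * szz) ** 0.5

rng = random.Random(1)  # fixed seed so the sketch is repeatable
normal_data = [rng.gauss(50, 10) for _ in range(200)]
skewed_data = [rng.expovariate(1 / 10) for _ in range(200)]  # right-skewed

r_normal = straightness(normal_plot_points(normal_data))
r_skewed = straightness(normal_plot_points(skewed_data))
print(round(r_normal, 3), round(r_skewed, 3))
```

The normal sample should give a correlation closer to 1 than the skewed sample. This mirrors the visual judgment in Example 43, but it does not replace looking at the plot.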

EXAMPLE 43 Normal probability plots

Figures 68 and 69 show normal probability plots for two different data sets. Analyze these plots for evidence for or against the normality of each data set.

Solution

In Figure 68, the points are arrayed nicely along the straight line, and all the points lie within the curved bounds. We therefore conclude that the data represented in Figure 68 are normally distributed. (In fact, the underlying data are drawn from a normal distribution.) In Figure 69, the points do not line up in a straight line, and many points lie outside the curved bounds, indicating that the data set is not normal. We therefore conclude that the data represented in Figure 69 are not normally distributed. (In reality, the underlying data set is right-skewed.)

FIGURE 68 Normal probability plot of normal data.
FIGURE 69 Normal probability plot of right-skewed data.

NOW YOU CAN DO

Exercises 27–30.