OBJECTIVES By the end of this section, I will be able to …
1 Finding Probabilities for Any Normal Distribution
The data in problems that we face in the real world do not usually follow the standard normal distribution, $Z$. Instead, a problem may be stated in terms of some normal random variable $X$ that has a mean other than 0 or a standard deviation other than 1. In cases like these, $X$ needs to be standardized to $Z$ so that we can use the Section 6.4 techniques.
To standardize things means to make them all the same. For example, college applicants take standardized tests so that the admissions officers can compare students according to a consistent assessment tool. Here, we standardize many different normal random variables $X$ into the same standard normal $Z$.
Standardizing $X$ to $Z$
To standardize a normal random variable $X$, we transform that normal random variable into the standard normal random variable $Z$.
Suppose that $X$ is a normal random variable with population mean $\mu$ and population standard deviation $\sigma$. We standardize $X$ by subtracting the mean $\mu$ and dividing by the standard deviation $\sigma$. The result of this transformation is the familiar standard normal random variable $Z$.
Standardizing a Normal Random Variable
Any normal random variable $X$ can be transformed into the standard normal random variable $Z$ by standardizing with the formula
$$Z = \frac{X - \mu}{\sigma}$$
The key here is the following: for a given area of interest for a normal random variable $X$, the corresponding area after the transformation to $Z$ is exactly the same. For any normal random variable $X$,
the area between $x_1$ and $x_2$
is exactly the same as
the area between $z_1 = \frac{x_1 - \mu}{\sigma}$ and $z_2 = \frac{x_2 - \mu}{\sigma}$ (see Figure 46).
So we can solve problems about areas under the nonstandard normal curve by using the corresponding area under the $Z$ curve.
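The equal-area claim can be checked numerically. The following is a minimal sketch using `statistics.NormalDist` from Python's standard library. The mean of 100, the standard deviation of 15, and the endpoints 85 and 130 are hypothetical values chosen only for illustration; they do not come from the text.

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    """Return the Z-value for x: subtract the mean, divide by the standard deviation."""
    return (x - mu) / sigma

# Hypothetical normal random variable X (illustration only).
mu, sigma = 100.0, 15.0
x1, x2 = 85.0, 130.0

X = NormalDist(mu, sigma)   # the nonstandard normal curve
Z = NormalDist()            # the standard normal curve (mean 0, standard deviation 1)

# Area between x1 and x2 under the X curve.
area_x = X.cdf(x2) - X.cdf(x1)

# Area between the standardized endpoints under the Z curve.
z1 = standardize(x1, mu, sigma)
z2 = standardize(x2, mu, sigma)
area_z = Z.cdf(z2) - Z.cdf(z1)

print(z1, z2)                          # -1.0 2.0
print(abs(area_x - area_z) < 1e-12)    # True: the two areas agree
```

Any other choice of mean, standard deviation, and endpoints should give the same agreement, which is why one table for $Z$ is enough.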
EXAMPLE 37 April in Georgia
The state of Georgia reports that the mean temperature statewide for the month of April is $\mu = 61.5$°F. Assume that the standard deviation is $\sigma = 8$°F and that temperature in Georgia in April is normally distributed. Draw the normal curve for temperatures between 45.5°F and 77.5°F, and the corresponding $Z$ curve. Find the probability that the temperature is between 45.5°F and 77.5°F in April in Georgia.
Solution
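Here, we have $\mu = 61.5$ and $\sigma = 8$, giving us
$$Z = \frac{45.5 - 61.5}{8} = -2 \qquad \text{and} \qquad Z = \frac{77.5 - 61.5}{8} = 2$$
In Figure 47, the area between $X = 45.5$ and $X = 77.5$ is the same as between $Z = -2$ and $Z = 2$. In other words,
$$P(45.5 < X < 77.5) = P(-2 < Z < 2)$$
This is a Case 3 problem from Table 8 (page 355). The table tells us that the area to the left of $Z = -2$ is 0.0228, and the area to the left of $Z = 2$ is 0.9772. The area between −2 and 2 is then equal to $0.9772 - 0.0228 = 0.9544$. The probability that the temperature is between 45.5°F and 77.5°F in April in Georgia is 0.9544.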
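The table result can be cross-checked with software. This is a sketch only, assuming a mean of 61.5°F and a standard deviation of 8°F, the values implied by the endpoints 45.5°F and 77.5°F standardizing to −2 and 2. It uses Python's standard-library `statistics.NormalDist` rather than the printed table.

```python
from statistics import NormalDist

# Assumed parameters: implied by the Z-values -2 and 2 in the worked solution.
temps = NormalDist(mu=61.5, sigma=8)

# Area under the temperature curve between 45.5 and 77.5 degrees.
p_between = temps.cdf(77.5) - temps.cdf(45.5)

print(round(p_between, 4))  # 0.9545
```

The software value differs from the table answer of 0.9544 in the fourth decimal place because each table entry is itself rounded to four decimals before the subtraction.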
Finding Probabilities for Any Normal Distribution
Check Your Answer! According to the Empirical Rule, almost all $Z$-values lie between −3 and 3, so it is unlikely that a randomly selected value of $Z$ lies outside this range. You should remember this when you are doing your calculations. If you are standardizing a normal random variable and get a very large $Z$-value (far outside the range from −3 to 3), you should recheck your calculations because the probability that $Z$ takes such a large value is very small.
EXAMPLE 38 Finding probability for a normal random variable
SAT Scores and AP Exam Scores
The College Board reports that the population mean Math SAT score in 2013 was $\mu = 514$, with a population standard deviation of $\sigma = 118$, and that the scores follow a normal distribution. Suppose that a local college wants to identify at-risk math students, which it considers to be students scoring below 396 on the Math SAT. Find the proportion of students who score below 396 on the Math SAT.
Remember that you may solve problems asking for proportions or percentages by finding the appropriate probability.
Solution
Step 1 Determine $X$, $\mu$, and $\sigma$.
We are given that the normal random variable $X$ = Math SAT score has mean $\mu = 514$ and standard deviation $\sigma = 118$. In the center of the number line, mark the mean $\mu = 514$. Also mark on the number line the value of $X$ about which the problem is asking. Figure 48 shows the graph of $X$ (the Math SAT scores) with the mean of 514 and the score of 396 marked.
You need to know the proportion of scores below 396, so shade the area under the curve to the left of 396. We can express this proportion as a probability, the probability that a randomly chosen student will score less than 396, or $P(X < 396)$. Just by looking at Figure 48, you should be able to get a rough idea of what the proportion of these scores will be. Certainly, this proportion will be less than 50%. If you get an answer such as “60%” for your proportion, you should recognize that it is wrong.
Step 2 Standardize.
Now standardize the random variable $X$ to the standard normal $Z$:
$$Z = \frac{X - \mu}{\sigma} = \frac{X - 514}{118}$$
Find the $Z$-value corresponding to the Math SAT score of 396:
$$Z = \frac{396 - 514}{118} = -1$$
So the $Z$-value associated with a score of 396 is −1, which indicates that the score of 396 is 1 standard deviation below the mean of 514.
Step 3 Draw the standard normal curve.
Scores less than 396 are more than 1 standard deviation below the mean, so shade the area to the left of −1 in Figure 49. Now find the area to the left of $Z = -1$ using the methods of Section 6.4.
Step 4 Find the area under the standard normal curve.
The table tells us that the area to the left of $Z = -1$ is 0.1587.
The proportion of scores below 396 is 0.1587, or 15.87%. Note that this value for $P(X < 396)$ agrees with our earlier intuition that the proportion was less than 50%.
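The four steps above can be sketched in a few lines of Python. This is an illustration rather than the text's method: it replaces the table lookup with `statistics.NormalDist` from the standard library, and it assumes $\sigma = 118$, the value implied by a score of 396 having a $Z$-value of −1.

```python
from statistics import NormalDist

# Step 1: identify X, mu, and sigma (sigma = 118 is implied by the worked Z-value).
mu, sigma = 514, 118
score = 396

# Step 2: standardize, Z = (X - mu) / sigma.
z = (score - mu) / sigma

# Steps 3 and 4: area to the left of z under the standard normal curve.
p_below = NormalDist().cdf(z)

print(z)                    # -1.0
print(round(p_below, 4))    # 0.1587
```

The result is below 0.5, matching the rough estimate made from the sketch of the curve before any calculation.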
NOW YOU CAN DO
Exercises 3–9.
YOUR TURN #19
For the scenario in Example 38, find the proportion of Math SAT scores greater than 600.
(The solution is shown in Appendix A.)
EXAMPLE 39 Finding the probability that $X$ lies between two given values
SAT Scores and AP Exam Scores
Continuing the Math SAT score problem, what percentage of students score between 215 and 595?
The Normal Density Curve applet allows you to find areas associated with various values of any normal random variable.
Solution
Step 1 Determine $X$, $\mu$, and $\sigma$.
We have already seen that $X$ = Math SAT score has mean $\mu = 514$ and standard deviation $\sigma = 118$. Once again, draw a graph of the distribution of scores $X$, with the mean 514 in the middle, the score 215 to the left of the mean, and the score 595 to the right of the mean, as in Figure 50.
Step 2 Standardize.
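This is a “between” example, where two values of $X$ are given, and we are asked to find the area between them. In this case, just standardize both of these values of $X$ to get a $Z$-value for each:
$$Z = \frac{215 - 514}{118} \approx -2.53 \qquad \text{and} \qquad Z = \frac{595 - 514}{118} \approx 0.69$$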
Step 3 Draw the standard normal curve.
Draw a graph of $Z$, shading the area between $Z = -2.53$ and $Z = 0.69$, as shown in Figure 51. Again, the key is that the area between $Z = -2.53$ and $Z = 0.69$ is exactly the same as the area between $X = 215$ and $X = 595$.
Step 4 Find the area under the standard normal curve.
Figure 51 is a Case 3 problem from Table 8 (page 355). Find the area to the left of 0.69, which is 0.7549, and the area to the left of −2.53, which is 0.0057. Subtracting the smaller from the larger gives us
$$P(-2.53 < Z < 0.69) = 0.7549 - 0.0057 = 0.7492$$
Thus, the percentage of Math SAT scores that are between 215 and 595 is 74.92%.
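The “between” calculation can be sketched as follows. The code is an illustration that assumes $\sigma = 118$ (implied by the worked $Z$-values) and uses Python's `statistics.NormalDist` in place of the table. It computes the answer twice: once with $Z$-values rounded to two decimals, as a table requires, and once without rounding.

```python
from statistics import NormalDist

mu, sigma = 514, 118   # sigma = 118 is implied by the worked Z-values
Z = NormalDist()       # standard normal curve

# Standardize both endpoints.
z_low = (215 - mu) / sigma
z_high = (595 - mu) / sigma

# Table-style answer: round each Z-value to two decimals before looking up areas.
p_table = Z.cdf(round(z_high, 2)) - Z.cdf(round(z_low, 2))

# Software-style answer: no rounding of the Z-values.
p_exact = Z.cdf(z_high) - Z.cdf(z_low)

print(round(z_low, 2), round(z_high, 2))   # -2.53 0.69
print(round(p_table, 4))                   # 0.7492
print(round(p_exact, 4))                   # slightly smaller than the table-style answer
```

The small gap between the two answers comes only from rounding the $Z$-values, so a software result that differs from the table result in the third decimal place is not an error.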
NOW YOU CAN DO
Exercises 10–14.
YOUR TURN #20
For the scenario in Example 39, find the proportion of Math SAT scores between 305 and 605.
(The solution is shown in Appendix A.)
2 Finding a Normal Data Value for a Given Area or Probability
Sometimes we are given a probability (or proportion or area), and we are asked to find the associated value of $X$. Questions like these are similar to the “backwards” problems of Section 6.4, which are so called because we must use the $Z$ table backward or inside out. The formula for standardizing gives the value for $Z$, so we need to use our algebra skills to find the equation for $X$: Start with the standard normal formula $Z = \frac{X - \mu}{\sigma}$. Multiply both sides by $\sigma$ to get $Z\sigma = X - \mu$. Then add $\mu$ to both sides, giving us $X = Z\sigma + \mu$.
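The algebra above says that standardizing and un-standardizing are inverse operations. A minimal sketch of that round trip is shown here; the function names `to_z` and `to_x` are hypothetical, and the values 514 and 118 are borrowed from the Math SAT examples (118 being the standard deviation implied by their worked $Z$-values).

```python
def to_z(x, mu, sigma):
    """Standardize: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

def to_x(z, mu, sigma):
    """Undo the standardization: X = Z * sigma + mu."""
    return z * sigma + mu

mu, sigma = 514, 118   # illustrative values from the Math SAT examples

z = to_z(396, mu, sigma)
x = to_x(z, mu, sigma)

print(z)   # -1.0
print(x)   # 396.0
```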
Finding Normal Data Values for a Given Area or Probability
EXAMPLE 40 Finding a normal data value for a given area
SAT Scores and AP Exam Scores
Suppose the students in the top 1% of Math SAT scores won a fellowship to an Ivy League university. What is the score that students will have to obtain to win this fellowship?
Solution
Notice that we are not asked to find a probability (or proportion or area). Instead, we are given a percentage (1%) and asked to find the value of $X$ (the Math SAT score) that is associated with this 1%.
Step 1 Determine $X$, $\mu$, and $\sigma$, and draw the normal curve for $X$.
We already know that $X$ = Math SAT score, $\mu = 514$, and $\sigma = 118$. The value of $X$ in which we are interested, $x_1$, refers to high scores, so $x_1$ will be at the far right of the distribution of $X$. Only 1% of scores will be greater than this score, so the area to the right of $x_1$ is 0.01, as shown in Figure 52.
Step 2 Find the $Z$-value corresponding to the desired area.
The area to the right of $x_1$ equals 0.01, so the area to the left of $x_1$ equals $1 - 0.01 = 0.99$. Looking up 0.99 on the inside of the $Z$ table gives us $Z = 2.33$.
Step 3 Transform using the formula $x_1 = Z\sigma + \mu$.
We calculate
$$x_1 = Z\sigma + \mu = (2.33)(118) + 514 = 788.94$$
The cutoff value for the top 1% of Math SAT scores for winning a fellowship to an Ivy League university is 788.94. It won't be easy getting that fellowship.
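The same cutoff can be found in software by asking for the score with area 0.99 to its left. The sketch below assumes $\mu = 514$ and $\sigma = 118$ (the standard deviation implied by the worked answer) and uses `NormalDist.inv_cdf` from Python's standard library. It is shown next to the table-based calculation for comparison.

```python
from statistics import NormalDist

mu, sigma = 514, 118   # sigma = 118 is implied by the worked answer of 788.94

# Table-based approach: Z = 2.33 is the table entry closest to area 0.99.
z_table = 2.33
cutoff_table = z_table * sigma + mu        # X = Z * sigma + mu

# Software approach: inverse cumulative probability, no rounding of Z.
cutoff_exact = NormalDist(mu, sigma).inv_cdf(0.99)

print(round(cutoff_table, 2))   # 788.94
print(round(cutoff_exact, 2))   # about 788.51
```

The two cutoffs differ by less than half a point because the table forces $Z$ to two decimal places, while the software uses a more precise $Z$-value.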
NOW YOU CAN DO
Exercises 15–22.
YOUR TURN #21
For the situation in Example 40, what is the Math SAT score that separates the lowest 2.5% of the scores from the others?
(The solution is shown in Appendix A.)
EXAMPLE 41 Finding the $X$-values that mark the boundaries of the middle 95% of $X$-values
Edmunds.com reported that the average amount that people were paying for a 2015 Toyota Camry XLE was $28,720. Let $X$ = price, and assume that price follows a normal distribution with $\mu = \$28{,}720$ and $\sigma = \$1000$. Find the prices that separate the middle 95% of 2015 Toyota Camry XLE prices from the bottom 2.5% and the top 2.5%.
Solution
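Step 1 Determine $X$, $\mu$, and $\sigma$, and draw the normal curve for $X$.
Let $X$ = price, $\mu = \$28{,}720$, and $\sigma = \$1000$. The middle 95% of prices are between $x_1$ and $x_2$, as shown in Figure 53.
Step 2 Find the $Z$-values corresponding to the desired area.
The area to the left of $x_1$ equals 0.025, and the area to the left of $x_2$ equals 0.975. Looking up area 0.025 on the inside of the $Z$ table gives us $Z = -1.96$. Looking up area 0.975 on the inside of the $Z$ table gives us $Z = 1.96$.
Step 3 Transform using the formula $X = Z\sigma + \mu$.
We calculate
$$x_1 = Z\sigma + \mu = (-1.96)(1000) + 28{,}720 = 26{,}760$$
$$x_2 = Z\sigma + \mu = (1.96)(1000) + 28{,}720 = 30{,}680$$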
The prices that separate the middle 95% of 2015 Toyota Camry XLE prices from the bottom 2.5% of prices and the top 2.5% of prices are $26,760 and $30,680.
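As a cross-check, the two boundary prices can be computed directly from the areas 0.025 and 0.975. This is a sketch using Python's standard-library `statistics.NormalDist`, assuming the mean of $28,720 and standard deviation of $1000 used in this example.

```python
from statistics import NormalDist

# Price distribution assumed in the example: mean $28,720, standard deviation $1000.
prices = NormalDist(mu=28720, sigma=1000)

# Middle 95% leaves 2.5% in each tail.
lower = prices.inv_cdf(0.025)   # price with area 0.025 to its left
upper = prices.inv_cdf(0.975)   # price with area 0.975 to its left

print(round(lower), round(upper))   # 26760 30680
```

Because the normal curve is symmetric, the two boundaries sit the same distance below and above the mean.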
NOW YOU CAN DO
Exercises 23–26.
YOUR TURN #22
For the situation in Example 41, find the two prices that separate the middle 90% of prices from the bottom 5% and the top 5%.
(The solution is shown in Appendix A.)
What If Scenario: How Change in Spread Affects Camry Prices
In Example 41, what if we ask the same question again, but this time the standard deviation of 2015 Toyota Camry XLE prices is not $1000, but some value less than $1000? How and why would this affect the following?
Solution
Figure 54 illustrates the distribution of 2015 Toyota Camry XLE prices, where everything is the same as in Figure 53, except that the standard deviation of the prices is smaller by an unknown amount. Thus, the spread of the distribution is smaller.
Re-express the formula $x_1 = Z\sigma + \mu = -1.96\sigma + \mu$ as $\mu - x_1 = 1.96\sigma$. If $\sigma$ is smaller than $1000, then the quantity $1.96 \cdot \sigma$, which represents the difference between the mean price and $x_1$, will also be smaller.
Because $x_1$ is less than the mean $\mu$, the smaller difference between the mean price and $x_1$ leads us to conclude that $x_1$ will be larger than in Example 41. Indeed, for any new standard deviation $\sigma$ below $1000, the lower boundary $x_1 = 28{,}720 - 1.96\sigma$ is larger than the $26,760 in Example 41.
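The effect of a smaller spread can be seen by evaluating the lower boundary for a few standard deviations. In this sketch the function name `lower_cutoff` is hypothetical, and the values 750 and 500 are illustrative choices only; the text leaves the new standard deviation unspecified.

```python
MU = 28720        # mean price from Example 41
Z_LOWER = -1.96   # Z-value with area 0.025 to its left

def lower_cutoff(sigma):
    """Lower boundary of the middle 95% of prices: x1 = Z * sigma + mu."""
    return Z_LOWER * sigma + MU

# 1000 is the original standard deviation; 750 and 500 are hypothetical smaller ones.
for sigma in (1000, 750, 500):
    print(sigma, round(lower_cutoff(sigma)))
# 1000 26760
# 750 27250
# 500 27740
```

As the standard deviation shrinks, the lower boundary moves up toward the mean, which is the conclusion reached in the argument above.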
EXAMPLE 42 Normal probabilities and percentiles using technology
Applying the information on Toyota Camry prices from Example 41, use the TI-83/84, Excel, Minitab, or JMP to find the following:
Solution
The instructions for finding these quantities are given in the Step-by-Step Technology Guide at the end of this section (page 380).
TI-83/84
Excel
Minitab
JMP
Developing Your Statistical Sense
Text Messaging: Be Careful What You Assume
The Pew Internet and American Life Project reported in 2011 that the mean number of text messages sent per day by 18- to 24-year-old Americans is 109.5. Assume that the distribution of the number of text messages is normal, with mean $\mu = 109.5$ and standard deviation $\sigma$.
Problem 1. Suppose that cell phone customers get a special rate if the number of text messages they send per day is at or above the 95th percentile. Find the number of text messages represented by the 95th percentile.
Solution to Problem 1. On the assumption that the number of text messages is normally distributed, and working similarly to Example 42b, we find the 95th percentile of text messages to be about 167, as shown in Figure 66.
Problem 2. Pew reports further that the median number of text messages sent per day by 18- to 24-year-old Americans is 50.
Solution to Problem 2.
3 Assessing Normality Using Normal Probability Plots
Much of the analysis we conduct in this text requires that the sample data come from a population that is normally distributed. But how do we assess whether a data set is normally distributed? Histograms, dotplots, and stem-and-leaf displays may be used. But a more precise graphical tool for assessing normality is the normal probability plot. A normal probability plot is a scatterplot of the estimated cumulative normal probabilities (expressed as percents) against the corresponding data values in the data set.
Analyzing Normal Probability Plots
If the points in the normal probability plot either cluster around a straight line or nearly all fall within the curved bounds, then it is likely that the data set is normal. Systematic deviations off the straight line are evidence against the claim that the data set is normal.
Professional statistical analysts always use technology to construct normal probability plots. We show how this is done in the Step-by-Step Technology Guide at the end of this section.
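To make the idea concrete, the sketch below builds the points of a normal probability plot by hand and measures how closely they follow a straight line. Several parts are assumptions rather than the text's method: the plotting position $(i - 0.5)/n$ is one common convention among several that software packages use, the correlation is only a numerical stand-in for judging straightness by eye, and both data sets are made up for illustration. The curved bounds mentioned above are not computed here.

```python
from statistics import NormalDist, mean

def plot_points(data):
    """Pairs (data value, estimated cumulative percent) for a normal probability plot.

    Assumes the plotting position (i - 0.5) / n; other conventions exist.
    """
    ordered = sorted(data)
    n = len(ordered)
    return [(x, 100 * (i - 0.5) / n) for i, x in enumerate(ordered, start=1)]

def straightness(data):
    """Correlation between the ordered data and the matching standard normal quantiles.

    Values near 1 mean the plotted points lie close to a straight line.
    """
    ordered = sorted(data)
    n = len(ordered)
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, mq = mean(ordered), mean(quantiles)
    num = sum((x - mx) * (q - mq) for x, q in zip(ordered, quantiles))
    den = (sum((x - mx) ** 2 for x in ordered)
           * sum((q - mq) ** 2 for q in quantiles)) ** 0.5
    return num / den

# Hypothetical data sets (illustration only).
bell_shaped = [NormalDist(50, 10).inv_cdf((i - 0.5) / 30) for i in range(1, 31)]
right_skewed = [1.2 ** k for k in range(30)]

print(plot_points(bell_shaped)[:2])   # first two (value, percent) pairs
print(straightness(bell_shaped))      # very close to 1
print(straightness(right_skewed))     # noticeably lower
```

A lower value for the skewed data reflects the systematic bending away from the line that counts as evidence against normality.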
EXAMPLE 43Normal probability plots
Figures 68 and 69 show normal probability plots for two different data sets. Analyze these plots for evidence for or against the normality of each data set.
Solution
In Figure 68, the points are arrayed nicely along the straight line, and all the points lie within the curved bounds. We therefore conclude that the data represented in Figure 68 are normally distributed. (In fact, the underlying data are drawn from a normal distribution.) In Figure 69, the points do not line up in a straight line, and many points lie outside the curved bounds, indicating that the data set is not normal. We therefore conclude that the data represented in Figure 69 are not normally distributed. (In reality, the underlying data set is right-skewed.)
NOW YOU CAN DO
Exercises 27–30.