2.2 Shapes of Distributions
We learned how to organize data so that we can better understand the concept of a distribution, a major building block for statistical analysis. We can’t get a sense of the overall pattern of data by looking at a list of numbers, but we can get a sense of the pattern by looking at a frequency table. We can get an even better sense by creating a graph. Histograms and frequency polygons allow us to see the overall pattern, or shape, of the distribution of data.
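To illustrate the step from a raw list to a frequency table to a graph, here is a minimal Python sketch. The ratings are made-up values on a hypothetical 1-to-5 scale. `frequency_table` and `text_histogram` are illustrative helper names, not part of any library. Each row of asterisks stands in for one bar of a histogram.

```python
from collections import Counter

def frequency_table(scores):
    """Return (value, count) pairs sorted by value, like a frequency table."""
    counts = Counter(scores)
    return sorted(counts.items())

def text_histogram(scores, symbol="*"):
    """Build one text row per value, with one symbol per occurrence of that value."""
    return [f"{value:>3} | {symbol * count}" for value, count in frequency_table(scores)]

# Hypothetical ratings on a 1-5 scale, invented for illustration only.
ratings = [3, 2, 3, 4, 3, 1, 3, 4, 2, 3, 5, 3, 2, 4, 3]

for row in text_histogram(ratings):
    print(row)
```

Read sideways, the rows form a single peak at 3 that tapers evenly on both sides. That pattern is hard to spot in the original list of numbers.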
The shape of a distribution provides distinctive information. For example, when the U.S.-based General Social Survey (a large data set available to the public via the Internet) asked people about the influence of children’s programming on both network television and public television, the responses showed very different patterns for each type of programming (Figure 2-4a, Figure 2-4b). The most common response for network television shows was that they have a neutral influence, whereas the most common response for public television shows was that they have a positive influence. In this section, we provide you with language that expresses the differences between these patterns. Specifically, you’ll learn to describe different shapes of distributions, including normal distributions and skewed distributions.
FIGURE 2-4
The Influence of Television Programming on Children
These two histograms tell different stories about the perceptions of the influence of television programming on children. The first histogram (a) describes the influence of network television on children; the second histogram (b) describes the influence of public television on children.
Normal Distributions
A normal distribution is a specific frequency distribution that is a bell-shaped, symmetric, unimodal curve.
Many, but not all, distributions of variables form a bell-shaped, or normal, curve. Statisticians use the word normal to describe distributions in a very particular way. A normal distribution is a specific frequency distribution that is a bell-shaped, symmetric, unimodal curve (Figure 2-5). People’s attitudes about the network programming of children’s shows provide an example of a distribution that approaches a normal distribution. There are fewer scores at values that are farther from the center and even fewer scores at the most extreme values (as can be seen in the histogram in Figure 2-4a). Most scores cluster around the word neutral in the middle of the distribution, which would be at the top of the bell.
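A simulation can show the bell shape that Figure 2-5 describes. This sketch assumes that IQ scores follow a normal distribution with mean 100 and standard deviation 15, which is the conventional IQ scaling. The scores are simulated with Python's `statistics.NormalDist`, not drawn from real data. `bin_counts` is a hypothetical helper.

```python
from statistics import NormalDist

# Assumption: IQ modeled as normal with mean 100 and SD 15; scores are simulated.
iq = NormalDist(mu=100, sigma=15)
scores = iq.samples(10_000, seed=42)  # fixed seed so the run is repeatable

def bin_counts(values, start, width, n_bins):
    """Count values in [start + i*width, start + (i+1)*width) for each bin i."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:  # values outside the binned range are ignored
            counts[i] += 1
    return counts

# Six 15-point bins from 55 to 145, centered on the mean of 100.
counts = bin_counts(scores, start=55, width=15, n_bins=6)
for i, c in enumerate(counts):
    low = 55 + i * 15
    print(f"{low:>3}-{low + 15:<3} {'#' * (c // 100)}")  # one '#' per 100 scores, rounded down
```

The two middle bins hold the most scores, and the counts fall off at about the same rate on either side of 100. The extreme bins are nearly empty. The result is one peak, symmetric and bell shaped.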
Skewed Distributions
A skewed distribution is a distribution in which one of the tails of the distribution is pulled away from the center.
Reality is often, but not always, normally distributed; the distributions describing some observations are not normally shaped. So we need a new term to help us describe the distributions that are not normal: skew. Skewed distributions are distributions in which one of the tails of the distribution is pulled away from the center. Although the technical term for such data is skewed, a skewed distribution may also be described as lopsided, off-center, or simply nonsymmetric. Skewed data have an ever-thinning tail in one direction or the other. The distribution of people’s attitudes about the children’s programming offered by public television (see Figure 2-4b) is an example of a skewed distribution. The scores cluster on the right side of the distribution around the word positive, and the tail extends to the left.
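This section judges skew by eye. One common numeric companion, not introduced in this section, is the moment coefficient of skewness. It is positive when the long tail is on the right and negative when the long tail is on the left. The Python sketch below assumes population (divide-by-n) moments. `describe_shape` and its 0.5 cutoff are hypothetical choices for illustration. The ratings are made up and only loosely patterned after a cluster at the positive end with a tail to the left.

```python
def skewness(values):
    """Moment coefficient of skewness: mean cubed deviation over SD cubed.

    Uses population (divide-by-n) moments; statistical packages often apply
    a small-sample correction, so their values can differ slightly.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # average squared deviation
    m3 = sum((x - mean) ** 3 for x in values) / n  # average cubed deviation
    if m2 == 0:
        return 0.0  # all values identical: no spread, so no skew
    return m3 / m2 ** 1.5

def describe_shape(values, threshold=0.5):
    """Label the shape by the sign of the skewness (the 0.5 cutoff is arbitrary)."""
    g = skewness(values)
    if g > threshold:
        return "positively skewed (tail to the right)"
    if g < -threshold:
        return "negatively skewed (tail to the left)"
    return "roughly symmetric"

# Hypothetical ratings on a 1-5 scale (5 = very positive), clustered at the high end.
public_tv_like = [5, 5, 5, 4, 4, 4, 4, 3, 2, 1]
print(round(skewness(public_tv_like), 2), describe_shape(public_tv_like))
```

Mirroring the same ratings (6 minus each value) flips the sign of the coefficient. So the sign, like the tail, tells which way the data lean.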
FIGURE 2-5
The Normal Distribution
The normal distribution, shown here for IQ scores, is a frequency distribution that is bell-shaped, symmetric, and unimodal. It is central to many calculations in statistics.
With positively skewed data, the distribution’s tail extends to the right, in a positive direction.
A floor effect is a situation in which a constraint prevents a variable from taking values below a certain point.
When a distribution is positively skewed, as in Figure 2-6a, the tail of the distribution extends to the right, in a positive direction. Positive skew sometimes occurs when there is a floor effect, a situation in which a constraint prevents a variable from taking values below a certain point. For example, the “World Cup success” data, which record how many times each country came in first or second in the World Cup, form a positively skewed distribution with a floor effect. Most countries never came in first or second, which means that the data were constrained at the lower end of the distribution, 0 (a country cannot finish in the top two fewer than 0 times).
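The next sketch mimics a floor effect. It draws a hypothetical symmetric score centered at 0, rounds it to a whole-number count, and stops every value at 0, since a count cannot be negative. The data are simulated; they are not the World Cup figures. The sketch compares the distance from the median down to the minimum with the distance up to the maximum. That comparison is a rough, assumed way to see which side carries the long tail, not a formal test.

```python
import random
from statistics import median

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical variable: a symmetric latent score centered at 0, recorded as a
# whole-number count that cannot go below 0 (the floor).
counts = [max(0, round(rng.gauss(0, 2))) for _ in range(1_000)]

def tail_lengths(values):
    """Distance from the median down to the minimum and up to the maximum."""
    m = median(values)
    return m - min(values), max(values) - m

below, above = tail_lengths(counts)
share_at_floor = counts.count(0) / len(counts)
print(f"share at the floor: {share_at_floor:.0%}; left tail {below}, right tail {above}")
```

More than half of the simulated values sit at the floor, so the median is 0. The only room for a tail is to the right, which produces positive skew.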
FIGURE 2-6
Two Kinds of Skew
The mnemonic “the tail tells the tale” means that the distribution with the long, thin tail to the right is positively skewed and the distribution with the long, thin tail to the left is negatively skewed.
MASTERING THE CONCEPT
2-3: If a histogram indicates that the data are symmetric and bell shaped, then the data are approximately normally distributed. If the data are not symmetric and the tail extends to the right, the data are positively skewed; if the tail extends to the left, the data are negatively skewed.
Negatively skewed data have a distribution with a tail that extends to the left, in a negative direction.
A ceiling effect is a situation in which a constraint prevents a variable from taking on values above a given number.
The distribution in Figure 2-6b shows negatively skewed data, which have a distribution with a tail that extends to the left, in a negative direction. People’s attitudes toward public television’s programming of children’s shows are favorable, because the responses cluster around the word positive, but we describe the shape of that distribution as negatively skewed because the thin tail is on the left side of the distribution. Not surprisingly, negative skew is sometimes the result of a ceiling effect, a situation in which a constraint prevents a variable from taking on values above a given number. If a professor gives an extremely easy quiz, then the quiz scores might show a ceiling effect. Many students would score at or near 100, the highest possible score, with a few stragglers down at the lower end.
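Running the floor idea in the opposite direction gives a ceiling effect. In this sketch, hypothetical quiz scores are drawn around 95 with a standard deviation of 10, rounded, and capped at 100. The numbers are simulated, and the mean and spread are assumptions chosen to mimic an easy quiz. Comparing the distance from the median to each extreme is a rough heuristic, not a formal test.

```python
import random
from statistics import median

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical easy quiz: "true" performance is centered near the top of the
# scale, but recorded scores cannot exceed 100 (the ceiling) or fall below 0.
scores = [min(100, max(0, round(rng.gauss(95, 10)))) for _ in range(200)]

m = median(scores)
left_tail = m - min(scores)    # distance from the median down to the lowest score
right_tail = max(scores) - m   # distance from the median up to the highest score
share_at_ceiling = scores.count(100) / len(scores)
print(f"median {m}, left tail {left_tail}, right tail {right_tail}, "
      f"{share_at_ceiling:.0%} scored 100")
```

Many simulated students land exactly on 100, and the few low scorers stretch the left tail far below the median. The result is negative skew.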
CHECK YOUR LEARNING
Reviewing the Concepts
A normal distribution is a specific distribution that is unimodal, symmetric, and bell shaped.
A skewed distribution “leans” either to the left or to the right. A tail to the right indicates positive skew; a tail to the left indicates negative skew.
Clarifying the Concepts
2-5 Distinguish a normal distribution from a skewed distribution.
2-6 When the bulk of the data cluster together but the data trail off to the left, the skew is _________; when the data trail off to the right, the skew is _________.
Calculating the Statistics
2-7 In Check Your Learning 2-3, you constructed two visual displays of the distribution of citations per faculty at top universities around the world. How would you describe this distribution? Is there skew evident in your graphs? If there is, what kind of skew is it?
2-8 Alzheimer’s disease is typically diagnosed in adults older than 70; cases diagnosed at younger ages are called “early onset.”
Assuming that these early-onset cases represent a trailing off of the data on one side, would the skew be positive or negative?
Do these data represent a floor effect or a ceiling effect?
Applying the Concepts
2-9 Referring to Check Your Learning 2-8, what implication would identifying such skew have for the screening and treatment process for Alzheimer’s disease?
Solutions to these Check Your Learning questions can be found in Appendix D.