2.2 Shapes of Distributions

We learned how to organize data so that we can better understand the concept of a distribution, a major building block for statistical analysis. We can’t get a sense of the overall pattern of data by looking at a list of numbers, but we can get a sense of the pattern by looking at a frequency table. We can get an even better sense by creating a graph. Histograms and frequency polygons allow us to see the overall pattern, or shape, of the distribution of data.

The shape of a distribution provides distinctive information. For example, when the U.S.-based General Social Survey (a large data set available to the public via the Internet) asked people about the influence of children’s programming on both network television and public television, their responses produced very different patterns for each type of programming (Figure 2-4). The most common response for network television shows was that they have a neutral influence, whereas the most common response for public television shows was that they have a positive influence. In this section, we provide you with language that expresses the differences between these patterns. Specifically, you’ll learn to describe different shapes of distributions, including normal distributions and skewed distributions.


Figure 2-4

The Influence of Television Programming on Children. These two histograms tell different stories about the perceptions of the influence of television programming on children. The first histogram (a) describes the influence of network television on children; the second histogram (b) describes the influence of public television on children.

Normal Distributions


Many, but not all, distributions of variables form a bell-shaped, or normal, curve. Statisticians use the word normal to describe distributions in a very particular way. A normal distribution is a specific frequency distribution that is a bell-shaped, symmetric, unimodal curve (Figure 2-5). People’s attitudes about the network programming of children’s shows provide an example of a distribution that approaches a normal distribution. There are fewer scores at values farther from the center and even fewer scores at the most extreme values (as can be seen in the histogram in Figure 2-4a). Most scores cluster around the word neutral in the middle of the distribution, which corresponds to the top of the bell.
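The three defining features (bell-shaped, symmetric, unimodal) can be checked numerically. The following minimal Python sketch assumes that IQ scores follow a normal distribution with mean 100 and standard deviation 15. That is the conventional IQ scaling; Figure 2-5 itself does not state these values. The sketch uses the standard library’s `statistics.NormalDist` to compute the proportion of scores in each of seven 15-point bins centered on the mean, and prints the proportions as a sideways text histogram.

```python
from statistics import NormalDist

# Assumption: IQ ~ Normal(mean=100, sd=15), the conventional IQ scaling.
iq = NormalDist(mu=100, sigma=15)

# Seven 15-point-wide bins centered on 55, 70, ..., 145; the mean is a bin center.
centers = list(range(55, 146, 15))
proportions = [iq.cdf(c + 7.5) - iq.cdf(c - 7.5) for c in centers]

for c, p in zip(centers, proportions):
    # One '#' per percentage point: a sideways text histogram of the bell curve.
    print(f"{c:>4} | {'#' * round(p * 100)}")
```

The bars rise to a single peak at 100 and fall off in the same way on both sides. That pattern is what "symmetric and unimodal" means.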

Figure 2-5

The Normal Distribution. The normal distribution, shown here for IQ scores, is a frequency distribution that is bell-shaped, symmetric, and unimodal. It is central to many calculations in statistics.

Skewed Distributions


Many variables in the real world are normally distributed, but not all of them are; the distributions describing some observations are not shaped normally. So we need a new term to help us describe such distributions: skew. Skewed distributions are distributions in which one of the tails of the distribution is pulled away from the center. Although the technical term for such data is skewed, a skewed distribution may also be described as lopsided, off-center, or simply nonsymmetric. Skewed data have an ever-thinning tail in one direction or the other. The distribution of people’s attitudes about the children’s programming offered by public television (see Figure 2-4b) is an example of a skewed distribution. The scores cluster on the right side of the distribution, around the word positive, and the tail extends to the left.

MASTERING THE CONCEPT

2.3: If a histogram indicates that the data are symmetric and bell shaped, then the data are approximately normally distributed. If the data are not symmetric and the tail extends to the right, the data are positively skewed; if the tail extends to the left, the data are negatively skewed.
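Reading the tails of a histogram by eye is usually enough, but the direction of skew can also be checked numerically. The sketch below uses the moment coefficient of skewness, g1 = m3 / m2^(3/2), where m2 and m3 are the average squared and cubed deviations from the mean. This coefficient is a common supplement that this section does not introduce. The data lists and the 0.1 cutoff are hypothetical, chosen only for illustration. A positive g1 points to a longer right tail (positive skew), a negative g1 to a longer left tail (negative skew), and a value near zero is consistent with symmetry.

```python
def skewness(scores):
    """Moment coefficient of skewness: g1 = m3 / m2**1.5."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # average squared deviation
    m3 = sum((x - mean) ** 3 for x in scores) / n  # average cubed deviation
    return m3 / m2 ** 1.5

def describe_shape(scores, tolerance=0.1):
    """Label the direction of skew; `tolerance` is an arbitrary illustrative cutoff."""
    g1 = skewness(scores)
    if g1 > tolerance:
        return "positively skewed (tail to the right)"
    if g1 < -tolerance:
        return "negatively skewed (tail to the left)"
    return "roughly symmetric"

# Hypothetical data sets, one per shape.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_tail = [1, 1, 1, 2, 2, 3, 9]
left_tail = [1, 7, 8, 8, 9, 9, 9]

for name, data in [("symmetric", symmetric), ("right_tail", right_tail), ("left_tail", left_tail)]:
    print(name, round(skewness(data), 2), describe_shape(data))
```

The cube in m3 keeps the sign of each deviation. A few far-out scores on one side therefore dominate the sum and decide the sign, which is the numeric version of "the tail tells the tale."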


Figure 2-6

Two Kinds of Skew. The mnemonic “the tail tells the tale” means that the distribution with the long, thin tail to the right (a) is positively skewed and the distribution with the long, thin tail to the left (b) is negatively skewed.



When a distribution is positively skewed, as in Figure 2-6a, the tail of the distribution extends to the right, in a positive direction. Positive skew sometimes occurs when there is a floor effect, a situation in which a constraint prevents a variable from taking values below a certain point. For example, the “World Cup success” data, which record how many times each country came in first or second in the World Cup, form a positively skewed distribution with a floor effect. Most countries never came in first or second, which means that the data were constrained at the lower end of the distribution, 0 (that is, they can’t go below 0).
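A floor effect can be mimicked by clipping values at a lower limit. In this minimal sketch, a hypothetical, symmetric set of "underlying" values centered on 0 is pushed up to a floor of 0, because a count cannot be negative. The values are invented for illustration and are not the World Cup data.

```python
from collections import Counter

def apply_floor(values, floor):
    """Replace every value below `floor` with the floor itself."""
    return [max(v, floor) for v in values]

# Hypothetical symmetric values centered on 0; the negative ones are
# impossible as counts, so the floor pushes them up to 0.
underlying = [-3, -2, -2, -1, -1, -1, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3]
observed = apply_floor(underlying, floor=0)

# Text frequency table: a tall bar at the floor, then a thinning tail to the right.
counts = Counter(observed)
for value in sorted(counts):
    print(value, "*" * counts[value])
```

Before clipping, the values are symmetric around 0. After clipping, the frequencies pile up at the floor and thin out toward 3, which is the right-tailed shape of positive skew.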



The distribution in Figure 2-6b shows negatively skewed data, which have a distribution with a tail that extends to the left, in a negative direction. People’s attitudes toward public television’s children’s programming are favorable, as the clustering around the word positive shows, but we describe the shape of that distribution as negatively skewed because the thin tail is on the left side of the distribution. Not surprisingly, negative skew is sometimes the result of a ceiling effect, a situation in which a constraint prevents a variable from taking on values above a given number. If a professor gives an extremely easy quiz, then the quiz scores might show a ceiling effect: a number of students would cluster around 100, the highest possible score, with a few stragglers down in the lower end.
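The easy-quiz example can be sketched the same way, this time capping scores at 100. The "unconstrained" scores below are hypothetical: they stand for what students might have earned if the quiz had no 100-point maximum.

```python
from collections import Counter

def apply_ceiling(values, ceiling):
    """Replace every value above `ceiling` with the ceiling itself."""
    return [min(v, ceiling) for v in values]

# Hypothetical scores on an extremely easy quiz if there were no maximum.
unconstrained = [62, 78, 88, 94, 97, 99, 101, 103, 104, 106, 108, 110, 112, 115]
quiz_scores = apply_ceiling(unconstrained, ceiling=100)

# Group scores by tens (the 100 row holds only perfect scores) and draw bars.
bins = Counter(score // 10 * 10 for score in quiz_scores)
for low in range(60, 101, 10):
    print(f"{low:>3} {'*' * bins[low]}")
```

Every score above the limit lands on 100, so the tallest bar sits at the top of the scale and a few stragglers form a thin tail to the left.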

Next Steps

Stem-and-Leaf Plot


Histograms and frequency polygons do not let us view two groups in a single graph very easily, but the stem-and-leaf plot does. A stem-and-leaf plot is a graph that displays all the data points of a variable (or of two levels of a variable) both numerically and visually. Students in our classes reported the number of minutes they typically spend in the shower. Here are the data for 30 women, arranged from lowest to highest:

5 8 10 10 10 10 12 15 15 15 15 15 15 18 20 20 20 20 20 23 25 30 30 30 30 30 35 40 45 60


STEP 1: Create the stem.

In this example, the stem consists of the first (tens) digit of each of these numbers, arranged from highest to lowest:

6
5
4
3
2
1
0

Note three features of this particular stem:

  1. We group the scores by tens (0–9, 10–19, 20–29, 30–39, 40–49, 50–59, 60–69).
  2. The first digit for numbers below 10 is 0.
  3. Each category is represented on the stem, even if it has no leaf (e.g., no score in the category 50–59).
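Step 1 takes only a few lines of Python. This sketch assumes whole-number scores from 0 to 99, so each score’s stem is its tens digit (`score // 10`). A score below 10 therefore gets a stem of 0. The women’s scores are read off Table 2-7.

```python
# Women's shower times (minutes), read off Table 2-7.
women = [5, 8, 10, 10, 10, 10, 12, 15, 15, 15, 15, 15, 15, 18,
         20, 20, 20, 20, 20, 23, 25, 30, 30, 30, 30, 30, 35, 40, 45, 60]

# The stem is the tens digit; numbers below 10 get a stem of 0.
high_stem = max(women) // 10
low_stem = min(women) // 10

# Every stem from highest to lowest, including stems with no scores (like 5).
stems = list(range(high_stem, low_stem - 1, -1))
for stem in stems:
    print(stem)
```

Building the stem from a `range` rather than from the scores themselves is what keeps the empty 50–59 category on the stem, as feature 3 requires.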

STEP 2: Add the leaves.

The leaves, the last digit of each score, are added in ascending order for each part of the stem, as shown in Table 2-7. In our example, the only scores between 0 and 9 are 5 and 8, so these two leaves are added next to the stem of 0. There are 12 scores between 10 and 19. Some, like 10 and 15, are repeated; a repeated score gets a separate leaf for every instance. There are six 15’s, so there are six 5’s next to the stem of 1. There are no scores between 50 and 59, so the part of the stem that begins with 5 has no leaves.
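Steps 1 and 2 together fit in one small function. This is a minimal sketch under the same assumption of whole-number scores from 0 to 99. Sorting the scores first puts each row’s leaves in ascending order, and appending one leaf per score gives repeated values (such as the six 15’s) one leaf each.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Return (stem, leaves) rows, highest stem first.

    Assumes whole-number scores from 0 to 99: the stem is the tens digit
    and the leaf is the ones digit.
    """
    leaves = defaultdict(list)
    for score in sorted(scores):                      # ascending order of leaves
        leaves[score // 10].append(str(score % 10))  # one leaf per occurrence
    stems = range(max(scores) // 10, min(scores) // 10 - 1, -1)
    # Empty stems (such as 5 here) still get a row, with no leaves.
    return [(stem, "".join(leaves[stem])) for stem in stems]

# Women's shower times (minutes), read off Table 2-7.
women = [5, 8, 10, 10, 10, 10, 12, 15, 15, 15, 15, 15, 15, 18,
         20, 20, 20, 20, 20, 23, 25, 30, 30, 30, 30, 30, 35, 40, 45, 60]

for stem, leaf_string in stem_and_leaf(women):
    print(f"{stem} | {leaf_string}")
```

The printed rows match Table 2-7 row for row.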

TABLE 2-7. A Stem-and-Leaf Plot. For numbers with two digits, a stem-and-leaf plot includes the first digits as the stem and the second digits as the leaves. This graph allows us to see the shape of the data, along with the individual scores.

Minutes Typically Spent in the Shower: Women
6 | 0
5 |
4 | 05
3 | 000005
2 | 0000035
1 | 000025555558
0 | 58

The stem-and-leaf plot displays the same information as a histogram, but in a slightly different way and with a little more detail. In fact, as seen in Figure 2-7, the stem-and-leaf plot looks like a histogram if turned on its side.
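Why the plot looks like a sideways histogram becomes clear once you count the leaves on each stem. That count is the frequency for the corresponding 10-minute interval, which is exactly the bar height in a histogram with 10-minute bins. A minimal sketch, again assuming whole-number scores from 0 to 99:

```python
from collections import Counter

# Women's shower times (minutes), read off Table 2-7.
women = [5, 8, 10, 10, 10, 10, 12, 15, 15, 15, 15, 15, 15, 18,
         20, 20, 20, 20, 20, 23, 25, 30, 30, 30, 30, 30, 35, 40, 45, 60]

# Each stem's leaf count is the frequency of its 10-minute interval.
frequencies = Counter(score // 10 for score in women)

for stem in range(6, -1, -1):  # same top-to-bottom order as the stem
    low = stem * 10
    print(f"{low:>2}-{low + 9:<2} {'#' * frequencies[stem]}")
```

The bars carry the same shape as the leaves, but the individual scores are gone. That loss is the extra detail the stem-and-leaf plot keeps.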

Figure 2-7

A Histogram and a Stem-and-Leaf Plot. The stem-and-leaf plot displays the same information as a histogram, but in a slightly different way and with a little more detail.

We can also include a sample of men on the other side of the stem and view the two groups side by side. Here are the scores in minutes for 30 men:

5 7 8 8 9 10 10 10 10 10 10 10 10 10 12 15 15 15 15 15 15 15 15 15 20 20 20 20 20 25

We add those scores to the left of the stem, opposite the women’s scores, as shown in Table 2-8. We can now see, for example, that women’s scores tend to be slightly higher and more varied than men’s scores. The distribution of women’s scores is somewhat skewed to the right (positively skewed), and the outlier (60 minutes in the shower!) is evident.
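A side-by-side plot needs one extra rule. On the left side, each row’s leaves are written in reverse, so that they increase as they move outward from the shared stem in either direction. The sketch below assumes whole-number scores from 0 to 99 and uses the men’s and women’s scores read off Table 2-8.

```python
from collections import defaultdict

def leaves_by_stem(scores):
    """Map each tens-digit stem to its ones-digit leaves, in ascending order."""
    leaves = defaultdict(list)
    for score in sorted(scores):
        leaves[score // 10].append(str(score % 10))
    return leaves

def side_by_side(left_scores, right_scores):
    """Rows of (left leaves, stem, right leaves), highest stem first."""
    left, right = leaves_by_stem(left_scores), leaves_by_stem(right_scores)
    all_scores = left_scores + right_scores
    stems = range(max(all_scores) // 10, min(all_scores) // 10 - 1, -1)
    # Reverse the left leaves so they grow outward, away from the shared stem.
    return [("".join(reversed(left[s])), s, "".join(right[s])) for s in stems]

# Shower times (minutes), read off Table 2-8.
men = [5, 7, 8, 8, 9] + [10] * 9 + [12] + [15] * 9 + [20] * 5 + [25]
women = [5, 8, 10, 10, 10, 10, 12, 15, 15, 15, 15, 15, 15, 18,
         20, 20, 20, 20, 20, 23, 25, 30, 30, 30, 30, 30, 35, 40, 45, 60]

for men_leaves, stem, women_leaves in side_by_side(men, women):
    print(f"{men_leaves:>19} | {stem} | {women_leaves}")
```

Right-aligning the men’s leaves keeps them flush against the stem, which makes the two shapes easy to compare.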


TABLE 2-8. A Side-by-Side Stem-and-Leaf Plot. Stem-and-leaf plots can be expanded to include scores for two samples on the same measure, a helpful technique for examining shapes of distributions in research designs that involve two groups.

Minutes Typically Spent in the Shower
                Men |   | Women
                    | 6 | 0
                    | 5 |
                    | 4 | 05
                    | 3 | 000005
             500000 | 2 | 0000035
5555555552000000000 | 1 | 000025555558
              98875 | 0 | 58

CHECK YOUR LEARNING

Reviewing the Concepts

  • A normal distribution is a specific distribution that is unimodal, symmetric, and bell shaped.
  • A skewed distribution “leans” either to the left or to the right. A tail to the right indicates positive skew; a tail to the left indicates negative skew.
  • Stem-and-leaf plots allow us to view the shape of a sample’s distribution while displaying every single data point in the sample. Stem-and-leaf plots can depict the scores of two groups side by side to allow for easy comparisons of distributions.

Clarifying the Concepts

  • 2-5 Distinguish a normal distribution from a skewed distribution.
  • 2-6 When the bulk of data cluster together but the data trail off to the left, you have _________ skew; when the data trail off to the right, you have _________ skew.

Calculating the Statistics

  • 2-7 In Check Your Learning 2-3, you constructed two visual displays of the distribution of citations per faculty at top universities around the world. How would you describe this distribution? Is there skew evident in your graphs? If yes, what kind of skew is there?
  • 2-8 Alzheimer’s disease is typically diagnosed in adults above the age of 70; cases diagnosed at younger ages are called “early onset.”
    1. Assuming that these early-onset cases make up a trailing off of the data on one side of the distribution, would this represent positive skew or negative skew?
    2. Do these data represent a floor effect or ceiling effect?

Applying the Concepts

  • 2-9 Referring to Check Your Learning 2-8, what implication would identifying such skew have in the screening and treatment process for Alzheimer’s disease?

Solutions to these Check Your Learning questions can be found in Appendix D.
