• A numerical summary of a distribution should report its center and its spread or variability.
• The mean x̄ and the median M describe the center of a distribution in different ways. The mean is the arithmetic average of the observations, and the median is the midpoint of the ordered observations.
• When you use the median to describe the center of a distribution, describe its spread by giving the quartiles. The first quartile Q₁ has one-fourth of the observations below it, and the third quartile Q₃ has three-fourths of the observations below it.
• The interquartile range IQR is the distance between the quartiles, IQR = Q₃ − Q₁. It is the spread of the central half of the data. The 1.5 × IQR rule flags observations that fall more than 1.5 × IQR above Q₃ or below Q₁ as possible outliers.
• The five-number summary, consisting of the smallest observation, Q₁, the median, Q₃, and the largest observation, provides a quick overall description of a distribution. The median describes the center, and the quartiles and extremes show the spread.
• Boxplots based on the five-number summary are useful for comparing several distributions. The box spans the quartiles and shows the spread of the central half of the distribution. The median is marked within the box. Lines extend from the box to the extremes and show the full spread of the data. In a modified boxplot, points identified by the 1.5 × IQR rule are plotted individually, and the lines extend only to the most extreme observations that are not flagged. Side-by-side boxplots display the distributions of several groups on the same graph.
• The variance s² and especially its square root, the standard deviation s, are common measures of spread about the mean as center. The standard deviation s is zero when there is no spread and gets larger as the spread increases.
• A resistant measure of any aspect of a distribution is relatively unaffected by changes in the numerical value of a small proportion of the total number of observations, no matter how large these changes are. The median and quartiles are resistant, but the mean and the standard deviation are not.
• The mean and standard deviation are good descriptions for symmetric distributions without outliers. They are most useful for the Normal distributions introduced in the next section. The five-number summary is a better exploratory description for skewed distributions.
• Linear transformations have the form x_new = a + bx. For b > 0, a linear transformation changes the origin if a ≠ 0 and changes the size of the unit of measurement if b ≠ 1, but it does not change the overall shape of a distribution. Such a transformation multiplies a measure of spread such as s or the IQR by b (and the variance s² by b²) and changes a percentile or measure of center m into a + bm.
• Numerical measures of particular aspects of a distribution, such as center and spread, do not report the entire shape of most distributions. In some cases, particularly distributions with multiple peaks and gaps, these measures may not be very informative.
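The quartile, IQR, and 1.5 × IQR definitions in this summary can be sketched in Python. This is a minimal sketch under one assumed convention: Q₁ and Q₃ are taken as the medians of the observations below and above the position of the overall median, with the median itself left out when n is odd. Statistical software often uses other quartile rules, so its quartiles can differ slightly. The function names (`five_number_summary`, `iqr_flagged`, `modified_whisker_ends`) and the data values are made up for illustration.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, M, Q3, maximum) for a list of numbers.

    Assumed quartile rule: Q1 and Q3 are the medians of the lower and
    upper halves of the sorted data, excluding the middle value when n is odd.
    """
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]            # observations below the median's position
    upper = xs[half + n % 2:]    # observations above it (middle value skipped if n is odd)
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def iqr_flagged(data):
    """Observations more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]


def modified_whisker_ends(data):
    """Whisker ends of a modified boxplot: the most extreme unflagged observations."""
    flagged = set(iqr_flagged(data))
    kept = [x for x in data if x not in flagged]
    return min(kept), max(kept)


data = [5, 7, 8, 9, 10, 11, 12, 14, 30]   # made-up observations
print(five_number_summary(data))     # (5, 7.5, 10, 13.0, 30)
print(iqr_flagged(data))             # [30]
print(modified_whisker_ends(data))   # (5, 14)
```

With this rule Q₁ = 7.5 and Q₃ = 13, so IQR = 5.5 and the cutoffs are 7.5 − 8.25 = −0.75 and 13 + 8.25 = 21.25. Only 30 is flagged, and the upper line of a modified boxplot would stop at 14.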
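The statement that s is zero with no spread and grows as the spread increases can be made concrete with the usual sample formula s² = Σ(x − x̄)² / (n − 1). The summary itself does not give the formula, so the n − 1 divisor is stated here as an assumption (it is the standard definition for sample data). The function names are hypothetical; a minimal sketch:

```python
import math


def sample_variance(data):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)


def sample_sd(data):
    """s: the square root of the sample variance."""
    return math.sqrt(sample_variance(data))


print(sample_sd([4, 4, 4, 4]))    # 0.0  (no spread)
print(sample_sd([4, 5, 6]))       # 1.0
print(sample_sd([0, 5, 10]))      # 5.0  (same center, more spread)
```

Python's standard library performs the same n − 1 computations in `statistics.variance` and `statistics.stdev`.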
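Resistance can be checked directly: change a single observation by a large amount and recompute each measure. A minimal sketch using Python's standard `statistics` module; the data values are made up.

```python
from statistics import mean, median, stdev

original = [1, 2, 3, 4, 5]
changed = [1, 2, 3, 4, 500]   # only the largest observation differs

for label, xs in [("original", original), ("changed", changed)]:
    # The median depends only on the middle of the ordered data;
    # the mean and s use the numerical value of every observation.
    print(f"{label}: mean={mean(xs)}, median={median(xs)}, s={stdev(xs):.1f}")
```

The median stays at 3 in both cases, while the mean jumps from 3 to 102 and s grows from about 1.6 to about 222.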
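The effect of a linear transformation with b > 0 can be verified numerically. The sketch below assumes the familiar Celsius-to-Fahrenheit conversion x_new = 32 + 1.8x (so a = 32 and b = 1.8) and a handful of made-up temperatures. It relies on `statistics.quantiles` (Python 3.8 or later) for the quartiles, and `math.isclose` absorbs floating-point rounding.

```python
import math
from statistics import mean, median, quantiles, stdev, variance

a, b = 32.0, 1.8                            # Celsius -> Fahrenheit: x_new = 32 + 1.8x
celsius = [12.0, 15.5, 18.0, 21.0, 30.5]    # made-up readings
fahrenheit = [a + b * x for x in celsius]

# Measures of center and percentiles m move to a + b*m.
assert math.isclose(mean(fahrenheit), a + b * mean(celsius))
assert math.isclose(median(fahrenheit), a + b * median(celsius))

# Measures of spread are multiplied by b; the shift a has no effect on them.
assert math.isclose(stdev(fahrenheit), b * stdev(celsius))
assert math.isclose(variance(fahrenheit), b ** 2 * variance(celsius))

q_c = quantiles(celsius, n=4)               # [Q1, median, Q3] by the library's default rule
q_f = quantiles(fahrenheit, n=4)
assert math.isclose(q_f[2] - q_f[0], b * (q_c[2] - q_c[0]))   # IQR scales by b

print("all checks passed")
```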
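The point that center and spread do not report the shape of a distribution shows up even in tiny made-up data sets. Below, one set has two clusters with a gap between them and the other has a single peak, yet their means and standard deviations are identical.

```python
import math
from statistics import mean, stdev

two_clusters = [1, 1, 1, 1, 9, 9, 9, 9]    # two peaks separated by a gap
one_peak = [-3, 5, 5, 5, 5, 5, 5, 13]      # one peak with a value far out on each side

print(mean(two_clusters), mean(one_peak))                   # 5 5
print(math.isclose(stdev(two_clusters), stdev(one_peak)))   # True
```

Both sets have a sum of squared deviations of 8 × 4² = 2 × 8² = 128, so s = √(128/7) ≈ 4.28 in each case; only a graph such as a histogram or stemplot shows how different the two distributions are.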