A numerical summary of a distribution should report its center and its spread or variability.
The mean x̄ and the median M describe the center of a distribution in different ways. The mean is the arithmetic average of the observations, and the median is the midpoint of the observations when they are arranged in order: half of them fall at or below it and half at or above it.
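As a minimal sketch of the two definitions, the functions below compute the mean as sum divided by count and the median as the middle of the sorted values, averaging the two middle values when the count is even. The function names and the sample data are illustrative assumptions; in practice Python's `statistics.mean` and `statistics.median` do the same job.

```python
def mean(xs):
    """Arithmetic average: sum of the observations divided by their count."""
    return sum(xs) / len(xs)

def median(xs):
    """Midpoint of the ordered values; average of the two middle values when n is even."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the two middle values

data = [1, 3, 4, 6, 21]   # hypothetical sample with one large value
print(mean(data))    # 7.0
print(median(data))  # 4
```

The single large observation pulls the mean above the median here, which is one way the two measures of center can disagree.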
When you use the median to indicate the center of the distribution, describe its spread by giving the quartiles. The first quartile Q1 has one-fourth of the observations below it, and the third quartile Q3 has three-fourths of the observations below it.
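One common hand-calculation rule takes Q1 as the median of the observations below the median's position in the ordered list and Q3 as the median of those above it. The sketch below assumes that rule (the overall median is left out of both halves when n is odd). Software that interpolates percentiles can report slightly different quartiles, so treat this as one convention rather than the only one.

```python
from statistics import median

def quartiles(xs):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the ordered data.

    Assumed convention: when n is odd, the overall median is excluded from both
    halves. Requires at least two observations.
    """
    ordered = sorted(xs)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # observations below the median's position
    upper = ordered[half + n % 2:]    # observations above the median's position
    return median(lower), median(upper)

print(quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (2.5, 7.5)
```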
The five-number summary, consisting of the median, the quartiles, and the high and low extremes, provides a quick overall description of a distribution. The median describes the center, and the quartiles and extremes show the spread.
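Putting the pieces together, a five-number summary can be sketched as the tuple (minimum, Q1, M, Q3, maximum). This assumes the same halves-based quartile rule as above; the function name is hypothetical.

```python
from statistics import median

def five_number_summary(xs):
    """Return (min, Q1, M, Q3, max).

    Assumption: quartiles are medians of the lower and upper halves of the
    ordered data, excluding the overall median when n is odd.
    """
    ordered = sorted(xs)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + n % 2:])
    return ordered[0], q1, median(ordered), q3, ordered[-1]

print(five_number_summary([5, 7, 1, 3, 9, 2, 8]))  # (1, 2, 5, 8, 9)
```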
Boxplots based on the five-number summary are useful for comparing several distributions. The box spans the quartiles and shows the spread of the central half of the distribution. The median is marked within the box. Lines extend from the box to the extremes and show the full spread of the data.
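To show how a boxplot is built from the five numbers, here is a minimal text-mode sketch. It draws each distribution on a shared scale so that several groups can be compared side by side: dashes run from the extremes to the box, `=` fills the box between Q1 and Q3 (whose edges are marked `|`), and `M` marks the median. The character scheme, the function names, and the sample groups are illustrative assumptions; a plotting library would normally be used instead.

```python
from statistics import median

def five_numbers(xs):
    """(min, Q1, M, Q3, max), with quartiles as medians of the lower/upper halves."""
    ordered = sorted(xs)
    n = len(ordered)
    half = n // 2
    return (ordered[0], median(ordered[:half]), median(ordered),
            median(ordered[half + n % 2:]), ordered[-1])

def ascii_boxplot(xs, lo, hi, width=41):
    """Draw one horizontal boxplot on the shared scale [lo, hi] (requires hi > lo)."""
    def col(v):
        # Map a data value to a character column in 0..width-1.
        return round((v - lo) / (hi - lo) * (width - 1))
    mn, q1, m, q3, mx = (col(v) for v in five_numbers(xs))
    row = [" "] * width
    for i in range(mn, mx + 1):   # lines out to the extremes
        row[i] = "-"
    for i in range(q1, q3 + 1):   # box spanning the quartiles
        row[i] = "="
    row[q1] = row[q3] = "|"       # box edges
    row[m] = "M"                  # median inside the box
    return "".join(row)

groups = {
    "A": [2, 4, 4, 5, 7, 9, 12],     # hypothetical data
    "B": [6, 8, 9, 10, 11, 13, 15],
}
lo = min(min(xs) for xs in groups.values())
hi = max(max(xs) for xs in groups.values())
for name, xs in groups.items():
    print(name, ascii_boxplot(xs, lo, hi))
```

Using one scale for every group is the design choice that makes the boxes directly comparable.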
The variance s² and, especially, its square root, the standard deviation s, are common measures of spread about the mean as center. The standard deviation s is zero when there is no spread and gets larger as the spread increases.
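A short sketch of the usual sample definitions: the variance s² is the sum of squared deviations from the mean x̄ divided by n − 1, and s is its square root. The function names are hypothetical; Python's `statistics.variance` and `statistics.stdev` use the same n − 1 divisor.

```python
import math

def variance(xs):
    """Sample variance s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def std_dev(xs):
    """Standard deviation s: the square root of the variance."""
    return math.sqrt(variance(xs))

print(std_dev([5, 5, 5, 5]))             # 0.0 (no spread)
print(std_dev([2, 4, 4, 4, 5, 5, 7, 9]))
```

For the second sample the mean is 5 and the squared deviations sum to 32, so s² = 32/7.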
A resistant measure of any aspect of a distribution is relatively unaffected by changes in the numerical value of a small proportion of the total number of observations, no matter how large these changes are. The median and quartiles are resistant, but the mean and the standard deviation are not.
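Resistance is easy to see by changing a single observation by a large amount. In this sketch (hypothetical data), replacing 7 with 700 leaves the median at 5 but drags the mean and the standard deviation far upward.

```python
from statistics import mean, median, stdev

def summarize(xs):
    """Return the median, mean, and standard deviation of xs."""
    return {"median": median(xs), "mean": mean(xs), "s": stdev(xs)}

data = [3, 4, 5, 6, 7]         # hypothetical observations
changed = data[:-1] + [700]    # one observation changed drastically

for label, xs in [("original", data), ("changed", changed)]:
    print(label, summarize(xs))
```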
The mean and standard deviation are good descriptions for symmetric distributions without outliers. They are most useful for the Normal distributions, introduced in the next section. The five-number summary is a better exploratory summary for skewed distributions.