• Exploratory data analysis uses graphs and numerical summaries to describe the variables in a data set and the relations among them.
• The distribution of a variable tells us what values it takes and how often it takes these values.
• Bar graphs and pie charts display the distributions of categorical variables. These graphs use the counts or percents of the categories.
• Stemplots and histograms display the distributions of quantitative variables. Stemplots separate each observation into a stem and a one-digit leaf. Histograms plot the frequencies (counts) or the percents of observations that fall in equal-width classes of values.
• When examining a distribution, look for shape, center, and spread, and for clear deviations from the overall shape.
• Some distributions have simple shapes, such as symmetric or skewed. The number of modes (major peaks) is another aspect of overall shape. Not all distributions have a simple overall shape, especially when there are few observations.
• Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.
• When observations on a variable are taken over time, make a time plot that graphs time horizontally and the values of the variable vertically. A time plot can reveal changes over time.
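
The three displays summarized above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names (`distribution`, `stemplot`, `histogram_counts`) and the `scores` data are hypothetical, the stemplot assumes non-negative whole-number observations, and the histogram classes are taken to include their left endpoint but not their right.

```python
from collections import Counter, defaultdict


def distribution(values):
    """Counts and percents for each category of a categorical variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}


def stemplot(values):
    """Split each observation into a stem (all but the final digit)
    and a one-digit leaf. Assumes non-negative integers."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)
    lines = []
    # List every stem between the smallest and largest, even if it has
    # no leaves, so that gaps in the distribution stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines


def histogram_counts(values, start, width, n_classes):
    """Count observations in equal-width classes
    [start + k*width, start + (k+1)*width) for k = 0 .. n_classes - 1."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts


# Hypothetical data, invented for illustration only.
scores = [62, 65, 71, 73, 73, 78, 80, 84, 85, 99]

for line in stemplot(scores):
    print(line)

print(histogram_counts(scores, start=60, width=10, n_classes=4))
print(distribution(["a", "b", "a", "a"]))
```

With these made-up scores the stemplot has four stems (6 through 9), and the histogram uses four classes of width 10 starting at 60. A different class width can change how the shape of the histogram looks, so it is worth trying more than one.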