Exploratory data analysis uses graphs and numerical summaries to describe the variables in a data set and the relations among them.
The distribution of a variable describes what values the variable takes and how often it takes these values.
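As a minimal sketch of this definition, the distribution of a categorical variable can be tabulated by counting how often each value occurs. The data and the helper name `distribution` below are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Hypothetical data: favorite color reported by ten respondents.
colors = ["blue", "red", "blue", "green", "blue",
          "red", "green", "blue", "red", "blue"]


def distribution(values):
    """Map each distinct value to (count, percent of all observations)."""
    counts = Counter(values)
    n = len(values)
    # most_common() lists the values from most to least frequent.
    return {value: (count, 100 * count / n)
            for value, count in counts.most_common()}


for value, (count, percent) in distribution(colors).items():
    print(f"{value:<6} {count:>2} {percent:5.1f}%")
```

The table lists what values the variable takes (the colors) and how often it takes them (the counts and percents), which is the information a bar graph or pie chart would display.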
To describe a distribution, begin with a graph. Bar graphs and pie charts describe the distribution of a categorical variable, and Pareto charts identify the most important categories for a categorical variable. Histograms and stemplots graph the distributions of quantitative variables.
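For a small set of quantitative data, a stemplot can be produced as plain text. The sketch below assumes non-negative whole numbers, uses the tens as the stem and the ones digit as the leaf, and works on hypothetical exam scores; the function name `stemplot` is illustrative.

```python
from collections import defaultdict

# Hypothetical data: ten exam scores.
scores = [62, 75, 78, 81, 84, 84, 89, 93, 58, 70]


def stemplot(values):
    """Return the lines of a text stemplot for non-negative integers."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)  # stem = tens, leaf = ones digit
    lines = []
    # Include stems with no leaves so that gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves[stem])
        lines.append(f"{stem:>2} | {digits}".rstrip())
    return lines


print("\n".join(stemplot(scores)))
```

Unlike a histogram, the stemplot keeps the individual data values visible, which is why it suits small data sets.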
When examining any graph, look for an overall pattern and for notable deviations from the pattern.
Shape, center, and spread describe the overall pattern of a distribution. Some distributions have simple shapes, such as symmetric or skewed. Not all distributions have a simple overall shape, especially when there are few observations.
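The summary above does not name specific numerical measures, so the following is only a rough sketch under assumptions: it takes the median as one possible measure of center, the range as a crude measure of spread, and a comparison of mean and median as an informal hint about skewness. The data and the function name `describe` are hypothetical, and a graph remains the better guide to shape.

```python
import statistics

# Hypothetical data: ten observations with one large value.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]


def describe(values):
    """Return simple numerical descriptions of center and spread."""
    return {
        "median": statistics.median(values),  # one measure of center
        "mean": statistics.mean(values),      # pulled toward extreme values
        "range": max(values) - min(values),   # crude measure of spread
    }


summary = describe(data)
print(summary)

# Informal heuristic, not a rule: a mean well above the median
# often accompanies a distribution skewed to the right.
if summary["mean"] > summary["median"]:
    print("mean exceeds median: possibly right-skewed")
```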
Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.
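The summary gives no numerical rule for spotting outliers; judgment from a graph comes first. As one common screening convention, not taken from this summary, the sketch below flags observations more than 1.5 times the interquartile range beyond the quartiles. The data and the function name `flag_outliers` are hypothetical, and the quartiles follow the default method of Python's `statistics.quantiles`, which can differ slightly from hand calculations.

```python
import statistics

# Hypothetical data: ten observations with one suspiciously large value.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]


def flag_outliers(values, k=1.5):
    """Return values lying more than k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


print(flag_outliers(data))
```

A flagged value is only a candidate. It still needs an explanation, such as a recording error or a genuinely unusual case.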
When observations on a variable are taken over time, make a time plot that graphs time horizontally and the values of the variable vertically. A time plot can reveal patterns that other graphs hide, such as a trend, a repeating cycle, or an abrupt change.
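To illustrate the layout of a time plot without a plotting library, the sketch below draws a coarse text grid in which time runs left to right and larger values sit higher. The monthly figures and the function name `time_plot` are hypothetical; a real analysis would normally use proper graphing software.

```python
# Hypothetical data: a value recorded in six consecutive months.
monthly = [10, 12, 15, 13, 18, 20]


def time_plot(values, height=5):
    """Return text rows: time on the horizontal axis, value on the vertical."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero if all values are equal
    # Map each value to a row from 0 (lowest) to height - 1 (highest).
    rows = [round((v - lo) / span * (height - 1)) for v in values]
    lines = []
    for level in range(height - 1, -1, -1):  # print the top row first
        row = "".join("*" if r == level else " " for r in rows)
        lines.append(row.rstrip())
    return lines


print("\n".join(time_plot(monthly)))
```

Even in this crude form, the upward trend and the dip in the fourth month are visible at a glance.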