Section 1.2 Summary
- Exploratory data analysis uses graphs and numerical summaries to describe the variables in a data set and the relations among them.
- The distribution of a variable describes what values the variable takes and how often it takes these values.
- To describe a distribution, begin with a graph. Bar graphs and pie charts describe the distribution of a categorical variable, and Pareto charts identify the most important categories for a categorical variable. Histograms and stemplots graph the distributions of quantitative variables.
- When examining any graph, look for an overall pattern and for notable deviations from the pattern.
- Shape, center, and spread describe the overall pattern of a distribution. Some distributions have a simple shape, such as symmetric or skewed. Not all distributions have a simple overall shape, especially when there are few observations.
- Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.
- When observations on a variable are taken over time, make a time plot that graphs time horizontally and the values of the variable vertically. A time plot can reveal trends and other patterns over time in a set of data.
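
As a rough illustration of the ideas summarized above, the following Python sketch tabulates the distribution of a categorical variable, orders its categories as a Pareto chart would, and prints a text stemplot of a quantitative variable. The data values and the function names (`distribution`, `pareto_order`, `stemplot`) are hypothetical, invented for this example. The stemplot assumes non-negative whole numbers, with the tens as the stem and the ones digit as the leaf.

```python
from collections import Counter, defaultdict


def distribution(values):
    """Map each value to (count, percent): what values occur and how often."""
    counts = Counter(values)
    n = len(values)
    return {v: (c, 100 * c / n) for v, c in counts.items()}


def pareto_order(values):
    """List categories from most to least frequent, as in a Pareto chart."""
    return [v for v, _ in Counter(values).most_common()]


def stemplot(data):
    """Return the lines of a text stemplot (stem = tens, leaf = ones digit).

    Assumes a non-empty collection of non-negative integers.
    """
    leaves = defaultdict(list)
    for x in sorted(data):
        leaves[x // 10].append(x % 10)
    lines = []
    # Include empty stems so that gaps in the distribution stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}")
    return lines


# Hypothetical categorical data: favorite color of six respondents.
colors = ["red", "blue", "red", "green", "red", "blue"]
print(distribution(colors))
print(pareto_order(colors))

# Hypothetical quantitative data: nine test scores, one unusually high.
scores = [62, 75, 78, 81, 83, 84, 88, 91, 119]
for line in stemplot(scores):
    print(line)
```

In the stemplot output, the empty stem 10 separates the value 119 from the rest of the scores. A gap like this is the kind of deviation from the overall pattern that suggests a possible outlier worth explaining.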