Review Vocabulary

Boxplot A graph of the five-number summary. A box spans the quartiles, with an interior line marking the median. Lines extend out from this box to the extreme high and low observations. A modified version of the boxplot extends the lines only to the extreme high and low observations that are within 1.5 box widths of the quartiles. Any values that are more extreme are marked (often with asterisks) as outliers. (pp. 201, 202)
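The modified boxplot rule can be sketched in a few lines of Python. This is only an illustration: the function name `modified_boxplot_parts`, the sample data, and the quartile values passed in are made up for the example, and the quartiles are assumed to have been computed already.

```python
def modified_boxplot_parts(values, q1, q3, k=1.5):
    """Return the whisker ends and the outliers for a modified boxplot.

    q1 and q3 are assumed to be the already-computed quartiles.
    """
    box_width = q3 - q1                      # distance between the quartiles
    low_fence = q1 - k * box_width
    high_fence = q3 + k * box_width
    inside = [x for x in values if low_fence <= x <= high_fence]
    outliers = [x for x in values if x < low_fence or x > high_fence]
    # The lines extend only to the most extreme observations inside the fences.
    whiskers = (min(inside), max(inside))
    return whiskers, outliers

sample = [2, 3, 4, 5, 5, 6, 7, 8, 30]        # made-up data
whiskers, outliers = modified_boxplot_parts(sample, q1=3.5, q3=7.5)
print(whiskers, outliers)                    # (2, 8) [30]
```

Here the box width is 4, so any value below -2.5 or above 13.5 is marked as an outlier.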

Class intervals Non-overlapping, consecutive intervals into which data are classified to give an idea of the distribution. Class intervals are generally of equal width. (p. 186)

Density curve A curve that summarizes the overall pattern of a distribution. A density curve lies on or above the horizontal axis and has an area under the curve equal to 1. (p. 211)

Distribution The pattern of how often a variable takes certain values or intervals of values. (p. 184)

Dotplot A display of the distribution of a variable in which each observation (or group of a specified number of observations) is represented by a dot above a horizontal axis. Dots representing the same value are stacked vertically above that value. (p. 197)

Exploratory data analysis (EDA) The practice of using graphs and numbers to examine data for overall patterns and special features, without necessarily seeking answers to specific questions. (p. 183)

Five-number summary A summary of a distribution that gives the smallest observation, first quartile, median, third quartile, and largest observation, in that order. (p. 201)
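As a rough sketch, the five-number summary can be computed as follows, assuming the quartiles are taken as the medians of the lower and upper halves of the ordered observations (the middle observation is left out of both halves when the count is odd). The function name and data are illustrative only.

```python
from statistics import median

def five_number_summary(values):
    """Smallest value, Q1, median, Q3, largest value, in that order."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]                  # observations below the median position
    upper = data[len(data) - half:]      # observations above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

print(five_number_summary([2, 3, 4, 5, 5, 6, 7, 8, 30]))
# (2, 3.5, 5, 7.5, 30)
```

Software packages sometimes use slightly different quartile conventions, so their results may differ a little from this hand method.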

Frequency distribution Classification of all observed values of a variable into non-overlapping classes or intervals that records how many times data values occur in each class. (pp. 184, 185)

Histogram A graph of the distribution of outcomes (often divided into classes) for a single quantitative variable. The height of each bar is the number of observations in the class of outcomes covered by the base of the bar. All classes should have the same width, and each observation must fall into exactly one class. (p. 188)

Individuals The people, animals, or things described by a dataset. (p. 182)

Left-skewed distribution A distribution in which the longer tail of the histogram is on the left side. (p. 190)

Mean The ordinary arithmetic average of a set of observations. To find the mean, add all the observations and divide the sum by the number of observations. (p. 197)

Mean of a normal distribution The balance point of a normal density curve that represents a normal distribution. The mean is at the line of symmetry of the normal curve. (p. 213)

Median The middle of a set of ordered observations. Half the observations fall below the median, and half fall above. (pp. 197, 198)

Mode The most frequently occurring value in a set of numerical observations. (pp. 197, 199)
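The three measures of center defined above (mean, median, and mode) can be compared on one small data set. The sketch below uses Python's standard `statistics` module; the scores are made up for illustration.

```python
from statistics import mean, median, multimode

scores = [70, 85, 85, 90, 100]       # made-up observations

print(mean(scores))       # 86    (sum 430 divided by 5 observations)
print(median(scores))     # 85    (middle of the ordered observations)
print(multimode(scores))  # [85]  (most frequently occurring value or values)
```

`multimode` returns a list because a data set can have more than one most frequent value.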

Normal density curve Symmetric, bell-shaped curve. The center line of the normal curve is at the mean. The curvature of the bell-shaped curve changes 1 standard deviation on either side of the mean. All density curves are scaled so that the area under the curve is 1. (p. 210)

Normal distributions A family of distributions that describe how often a variable takes its values by areas under a curve, called a normal density curve. A specific normal curve is completely described by two numbers: its mean and its standard deviation. (pp. 208, 211)

Outlier A data point that falls clearly outside the overall pattern of a set of data. (p. 188)

Quartiles The first quartile ($Q_1$) of a distribution is the point with one-quarter of the observations falling below it; the third quartile ($Q_3$) is the point with three-quarters below it. Calculate $Q_1$ and $Q_3$ by determining the median of the lower half and upper half of the ordered observations, respectively. (p. 200)

Quartiles of a normal distribution The first and third quartiles of a normal distribution are around 0.67 standard deviation below and above the mean, respectively. (p. 214)
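The figure of 0.67 can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the mean of 100 and standard deviation of 15 in the second part are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist()
q1 = z.inv_cdf(0.25)     # point with one-quarter of the area below it
q3 = z.inv_cdf(0.75)     # point with three-quarters of the area below it
print(round(q1, 2), round(q3, 2))    # -0.67 0.67

# The same offsets, scaled by the standard deviation, apply to any normal curve.
example = NormalDist(mu=100, sigma=15)   # hypothetical mean and standard deviation
print(round(example.inv_cdf(0.25), 1), round(example.inv_cdf(0.75), 1))
```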

Range The measure of variability obtained by subtracting the smallest observation from the largest observation. (p. 200)

Relative frequency distribution Classification of all observed values of a variable into non-overlapping classes or intervals that records what fraction (or percentage) of data values occur in each class. (pp. 185, 186)
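A minimal sketch of building frequency and relative frequency distributions from equal-width class intervals follows. The function name `frequency_table`, its parameters, and the data are assumptions made for this example; each class includes its lower endpoint and excludes its upper endpoint, so every observation falls into exactly one class.

```python
def frequency_table(values, start, width, n_classes):
    """Count observations in consecutive classes [start, start + width), ..."""
    counts = [0] * n_classes
    for x in values:
        i = int((x - start) // width)        # index of the class containing x
        if not 0 <= i < n_classes:
            raise ValueError(f"{x} falls outside the class intervals")
        counts[i] += 1
    total = len(values)
    # Each row: (lower endpoint, upper endpoint, frequency, relative frequency)
    return [(start + i * width, start + (i + 1) * width, c, c / total)
            for i, c in enumerate(counts)]

data = [3, 7, 12, 15, 18, 21, 24, 29]        # made-up observations
for low, high, freq, rel in frequency_table(data, start=0, width=10, n_classes=3):
    print(f"{low:>2} to under {high:>2}: {freq}  ({rel:.1%})")
```

The frequencies add up to the number of observations, and the relative frequencies add up to 1 (or 100%).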

Right-skewed distribution A distribution in which the longer tail of the histogram is on the right side. (p. 190)

68-95-99.7 rule In any normal distribution, approximately 68% of the observations lie within 1 standard deviation on either side of the mean, 95% lie within 2 standard deviations of the mean, and 99.7% lie within 3 standard deviations of the mean. (p. 216)
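The three percentages in the rule can be verified as areas under the standard normal curve. This short check uses `NormalDist` from Python's standard `statistics` module; the helper name `area_within` is made up for the example.

```python
from statistics import NormalDist

z = NormalDist()     # standard normal: mean 0, standard deviation 1

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, f"{area_within(k):.1%}")
# 1 68.3%
# 2 95.4%
# 3 99.7%
```

Because any normal distribution is a shifted and rescaled version of the standard one, the same areas hold for every mean and standard deviation.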

Standard deviation A measure of the variability of a distribution about its mean as center. It is the square root of the average squared deviation of the observations from their mean. (pp. 203, 204)

Standard deviation of a normal distribution The distance from the mean to the change-of-curvature point on either side of the normal density curve, which represents the distribution. (p. 213)

Stemplot A display of the distribution of a variable that attaches the final digits of the observations as leaves on stems made up of all but the final digit. (p. 194)
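A stemplot is simple enough to build by hand, but a short sketch shows the mechanics. The function below is illustrative only and assumes non-negative whole-number observations, with the final digit as the leaf and the remaining digits as the stem.

```python
from collections import defaultdict

def stemplot(values):
    """Return the rows of a stemplot for non-negative whole numbers."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)       # all but the final digit, final digit
        stems[stem].append(leaf)
    rows = []
    # Include every stem between the smallest and largest, even if it has no leaves.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem:>2} | {leaves}")
    return rows

for row in stemplot([23, 12, 38, 21, 15, 40, 23]):   # made-up data
    print(row)
#  1 | 25
#  2 | 133
#  3 | 8
#  4 | 0
```

Keeping empty stems in the display preserves gaps in the distribution, which would otherwise be hidden.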

Symmetric distribution A distribution with a histogram, stemplot, or dotplot in which the part to the left of the median is roughly a mirror image of the part to the right of the median. (p. 191)

Variable A particular characteristic that can take on different values for different individuals. (p. 182)