Note: Page numbers in boldface indicate a definition; italics indicate a figure; t indicates a table.
A
Acceptance sampling, 300
Addition Rule, 262–265
for mutually exclusive events, 264–265
Adjusted coefficient of determination, 745
α (alpha), 495
Alternative hypothesis (Hₐ), 489–490
Amount of gold in coins, 395, 407–409
Analysis of variance (ANOVA), 664–713, 666
method of, 666–672
multiple comparisons using Bonferroni method and, 686–688
one-way, 673–679, 694–695
randomized block design and, 693–701
requirements for performing, 668–670
table for, 672, 672t
test statistic for, 671
Tukey's test for multiple comparisons and, 688–691
two-way, 701–710
AP exam scores, 309, 334–335, 371–373
Approximations. See also Point estimates; Pooled estimate; entries beginning with term Estimated
binomial probability distribution and. See Binomial probability distribution
pitfalls of using, 421
of probabilities for dependent events, 282
April in Georgia, 369
Area under the curve. See also Standard normal probability distribution
finding normal data values for, 372–378
Arithmetic mean. See Mean ()
Assumptions, of regression model, 718–721
Average. See Mean ()
B
Backward stepwise regression, 751–753
Balance point of data, mean as, 110, 110–111, 317–318
Bank loans, 575, 593–594, 595, 596–598
Bar graphs (bar charts), 43, 43–44, 44
clustered, 48–50, 49
inaccuracy in relative lengths of bars in, 100, 100
Bayes, Thomas, 282
Bayes’ Rule, 282–285, 283
Bell-shaped curve. See Normal probability distribution
β (beta), 495, 565–567
Binomial discrete random variables, 327
mean of, 334–335
mode of, 335–336
standard deviation of, 334–335
variance of, 334–335
Binomial experiment, 326–328, 327
Binomial probability distribution, 326–341, 327, 385, 386
approximating probabilities of, using normal distribution, 385–389
binomial experiment and, 326–328
computing binomial probabilities and, 328–334
normal approximation to, 385–390, 386
Poisson approximation to, 345–346
tables for, T-3–T-8
Binomial probability distribution formula, 330–332
Blocking factors (blocks), 694
Bonferroni adjustment, 686–688
Boxplots (box-and-whisker plots), 173, 173–176
comparison, 178
constructing by hand, 174–175
for left-skewed data, 176, 176
for right-skewed data, 175, 175–176
symmetric, 176, 176
C
Calculators. See Technology guides
California wildfires, 3
Categorical data analysis, 630–663
goodness of fit test for, 632–645, 634
test for homogeneity of proportions and, 651–655
test for independence and, 646–651, 647
Causation, correlation vs., 198
Central Limit Theorem for Means, 399–402, 401
finding probabilities using, 405–406
finding two symmetric sample means using, 406–407
when to use, 401
Central Limit Theorem for Proportions, 415–423, 418, 464
determining whether the theorem applies and, 418
finding percentiles using, 420–421
finding probabilities using, 419–420
minimum sample size for approximate normality and, 417, 418–419
sampling distribution of the sample proportion and, 415–417
Chebyshev, P. L., 137, 138
Chebyshev's Rule, 137–140
finding minimum percentages using, 137–140
strengths and weaknesses of, 139
χ² (chi-square) distribution, 474, 474
confidence interval for population standard deviation, 475, 482
confidence interval for population variance, 476, 482
confidence intervals for the population variance and standard deviation and, 476–478
finding critical values for, 475–476
properties of, 474–476
table for, T-12
χ² (chi-square) goodness of fit test, 632–645, 634
critical-value method for, 637–639
p-value method for, 639–641
χ² (chi-square) test for homogeneity of proportions, 651–655
test for independence compared with, 652
χ² (chi-square) test for independence, 646–651, 647
test for homogeneity of proportions compared with, 652
critical-value method for, 648–649, 649–650
p-value method for, 650–651
χ² (chi-square) test for population standard deviation, 556–564
confidence intervals for population standard deviation to conduct two-tailed tests and, 561
critical-value method for, 556–559
essential idea about, 556
p-value method for, 559–560
Class(es), 61
frequency and relative frequency distributions using, 61–62, 61t
Class boundary, 62–63
Class limits, upper and lower, 62–63
Class midpoint, 66
Class width, 62–63
Classical method of assigning probabilities, 243–248, 244
Clothing store sales, 487, 502–503
Cluster sampling, stratified sampling vs., 25
Clustered bar graphs, 48–50, 49
Coefficient of determination (r²), 230–232
adjusted (R²_adj), 745
calculating correlation coefficient using, 232
multiple (R²), 745
Combinations, 296–298
acceptance sampling using, 300
computing probabilities using, 301–302
number of, 298, 328–329
Common variance, pooled estimate for, 596
Comparison boxplots, 178
Comparison dotplots, 70–71, 71
Complement of an event A (), 259–260
probabilities for, 260
Computational formulas, 134. See also Formulas
Conditional probability, 270–291, 271
“at least” problems and, 281
Bayes’ Rule and, 282–285
for dependent events, 282
for independent events, 274–276
interpretation of, 274
Multiplication Rule and, 276–281
for mutually exclusive events, 280
1% Guideline and, 282
and and, 272
Confidence intervals, 426–485, 430. See also t interval for the population mean; entries beginning with term Z interval
for difference in means, 594–595
for difference in proportions, 612–613
equivalence of two-tailed tests and, 517–518
hypothesis testing and, 595–596
interpreting, 433
for mean of a normal population, 432–433
for mean value of y given x, 737–738
for population standard deviation, 476–478, 477
for population variance, 476–478
for population variance and standard deviation, 473–482
for proportions to conduct two-tailed tests for p, 549–551
for regression, 726–730, 737–738, 740
for slope of regression line, 726–730
for standard deviation to conduct two-tailed tests for σ, 561
t interval for the population mean and, 448–462
Welch's, for difference in population means, 594–595
Z interval for population mean and, 428–448
Z interval for the population proportion and, 463–473
for μ to conduct two-tailed tests for μ, 516–519
Confidence levels, 430
decreasing, reducing the margin of error by, 438
Constant variance assumption, of regression model, 718
Contingency tables. See Crosstabulations
Continuity correction, 387
Continuous data, frequency distributions for, 63–65
Continuous probability distributions, 349–363
normal probability distribution and. See Normal probability distribution
probabilities for uniform probability distribution and, 349–351
requirements for, 349
Continuous random variables, 311–312, 348–368
Continuous variables, 10
Controls, 28
Convenience sampling, 25
Correlation
causation vs., 198
negative, 197, 197
positive, 196, 196
Correlation coefficient (r), 192–198, 193
calculating using coefficient of determination, 232
interpreting, 198
properties of, 195–197, 196–198
slope of regression line related to, 211
Count, 41
Counting, 291–303
computing probabilities using combinations and, 301–302
Multiplication Rule for, 291–294, 293
permutations and combinations and, 295–300
Critical regions, 500–501, 500t
Critical values
for t test (t_crit), 525
for Tukey's test for multiple comparisons, table of, T-17–T-18
for Z test (Z_crit), 500–501, 500t
Critical-value method
for χ² goodness of fit test, 637–639
for χ² test for independence, 648–649, 649–650
for F test, 619–623
for hypothesis test for difference in population proportions, 608–610
for hypothesis testing for slope of regression line, 723
paired sample t test for population mean of differences using, 578–580
for Welch's hypothesis test for difference in two population means, 591–592
Z test for the population mean using, 502–504
Z test for the population proportion using, 543–546
Crosstabulations, 45–47
test for independence and, 646
Cumulative frequency distributions, 87–88, 88t
Cumulative relative frequency distributions, 87–88, 88t
D
Data
actual vs. predicted, 214
continuous, frequency distributions for, 63–65
discrete, frequency distributions for, 60–61, 61t
fitting model to, 635
gathering, 20–36
grouped, 148–154
mean as balance point of, 110, 110–111, 317–318
misrepresentations by graphs, 95–103
presenting same data set as both symmetric and left-skewed on graphs, 100–101, 101
qualitative, frequency distributions for, 40–42, 41, 41t
quantitative, frequency distributions for, 60–61, 61t
tabular, 47–48
Data analysis, exploratory, 116–117
Data sets
comparing data from different data sets using z-scores, 157–158
measures of center of. See Mean; Measures of center; Median; Mode
measures of relative position and, 155–171. See also Percentile(s); z-scores
measures of variability of. See Measures of variability; Range; Standard deviation; Variance
multimodal, 115
quartiles of, 162–165
unimodal, 115
Data stories, 2–4
Degrees of freedom
in ANOVA, 671
of F distribution, 618, 619
of t distribution, 449
for t test for independent means, 593
Denominator degrees of freedom, 618
Density curves, 349
Dependent events, 274
approximating probabilities for, 282
Dependent samples, 576–577
t intervals for mean difference for, 582–584
t test for mean of the differences and, 577–582
Dependent variables. See Response variables
Descriptive statistics, 7
statistical inference vs., 15
Design your own T-shirt, 292
Deviation, 128–130. See also Standard deviation
mean squared, 130
Dice, fair roll of, 247, 248–249
Discrete data, frequency distributions for, 60–61, 61t
Discrete probability distributions, mean as balance point of, 317–318
Discrete random variables, 311–312
expected value of, 318–319
identifying most likely value, 318
mean of, 315–318
most likely value of, 318
probability distribution of, 312–315
standard deviation of, 319–320
variance of, 319–320
z-score method for determining unusual values and, 320
Discrete variables, 10
Disjoint events. See Mutually exclusive events
Dispersion, measures of. See Measures of variability; Range; Standard deviation; Variance
Distributions, 72. See also χ² (chi-square) distribution; Frequency distributions; Probability distributions; Sampling distribution; t distribution
left-skewed, 73, 73, 176, 176
right-skewed, 73, 73, 175, 175–176
shape of, 72, 72–75, 73
symmetric, 73
Dotplots, 70, 70–71
comparison, 70–71, 71
Dummy variables, 749–750
E
Edward VI, King of England, 408
Elements, 7–9
ELISA test for HIV, 284–285
Empirical method of assigning probabilities, 248–252
Empirical Rule, 135, 135–137, 136
finding percentages using, 136–137
strengths and weaknesses of, 139
Equally likely outcomes, 244
Equation, regression. See Regression equation
Error
experimentwise rate of (), 686
margin of. See Margin of error (E); Margin of error for interval
mean square (MSE), 671
prediction, 214–215
standard. See Standard error
sum of squares. See Sum of squares error (SSE)
Type I, 493–495, 494, 494t
Type II. See Type II error
Estimate(s). See Approximations; Point estimates; Pooled estimate
Estimated mean, 150
for data grouped into a frequency distribution, 149–150
Estimated standard deviation for data grouped into a frequency distribution, 151–152
Estimated variance for data grouped into a frequency distribution, 151–152
Euclid, 531
Events, 241
complement of, 259–260
dependent. See Dependent events
independent. See Independent events
intersection of, 261–262
mutually exclusive (disjoint). See Mutually exclusive events
union of, 261–262
Expected frequencies, 633
for goodness of fit test, 635
for test for independence, 647–648
Expected value (expectation) of a random variable X, 318–319
Experiment(s), 240, 241
Experimental studies, 28–29, 30
Experimentwise error rate (), 686
Explanatory variables. See Predictor variables
Exploratory data analysis, 116–117
Extrapolation, 215–216
Extreme values
lack of sensitivity of median to, 112–113
sensitivity of mean to, 111–112
F
F curve, properties of, 618
F distribution, 617–619
finding critical values of, 619–621
table for, T-13–T-16
F test, 617–625
critical-value method for, 619–623
for overall significance of multiple regression, 746–747
p-value method for, 623–625
Factorial symbol (n!), 294, 329
False-negative rate, 284–285
False-positive rate, 284–285
Fisher, Ronald A., 617
Fitted values, 719
Five-number summary, 172–173
Florida lotto, 301
Formulas
adjusted coefficient of determination, 745
binomial probability distribution, 330
χ², 556, 570, 636
coefficient of determination, 230
confidence interval for population mean difference (dependent samples), 582, 627
correlation coefficient, 236
estimated mean, 150, 183
estimated standard deviation, 151, 183
estimated variance, 151, 183
interquartile range, 166
margin of error for t interval for the mean, 454, 482, 594, 628
margin of error for Z interval for the mean, 433, 482
mean of a binomial random variable, 390
mean of a discrete random variable, 316, 390
minimum sample size, 417
multiple coefficient of determination, 745
number of combinations, 297, 305, 329, 390
confidence interval for , 612, 628
confidence interval for , 594, 627
permutation, 299, 305
Poisson probability distribution, 342, 390
pooled estimate of , 608, 628
pooled estimate of the common variance, 596, 628
population standard deviation, 132, 182
population variance, 130, 182
sample proportion, 415
sample size for estimating the population mean, 440, 482
sample size for estimating the population proportion, 468, 482
sample standard deviation, 182
sample variance, 183
slope of the regression line, 236
standard deviation of a binomial random variable, 390
standard deviation of a discrete random variable, 390
standard deviation of sampling distribution of sample proportion, 416
standard error of the estimate, 228
standardizing a normal random variable, 368, 390
standardizing a normal sampling distribution for means, 399, 423
standardizing a normal sampling distribution for proportions, 419, 423
sum of squares error, 226
sum of squares regression, 236
, 525, 570, 628
total sum of squares, 236
variance of a binomial random variable, 390
variance of a discrete random variable, 390
weighted mean, 149, 183
confidence interval for , 599, 628
Z interval for p, 464, 482
Z interval for population mean, 431, 482
, 499, 543, 570, 599, 608, 610, 628
z-score, 155, 183
Frequency, 41
expected. See Expected frequencies
relative, 42
Frequency distributions
choosing, 62
for continuous data, 63–65
cumulative, 87–88, 88t
for discrete data, 60–61, 61t
estimated mean for data grouped into, 149–150
estimated standard deviation for data grouped into, 151–152
estimated variance for data grouped into, 151–152
for qualitative data, 40–42, 41, 41t
for quantitative data, 60–61, 61t
relative. See Relative frequency; Relative frequency distributions
using classes, 61–62, 61t
Frequency polygons, 67, 67–68
G
Gambler's Fallacy, 275–276
Gardasil vaccine, 239, 250, 282, 283
Gold, amount in coins, 395, 407–409
Golden ratio, 531–533
Goodness of fit test, χ², 632–645, 634
Gosset, William Sealy, 449
Graphs
bar, 43, 43–44, 44, 100, 100
boxplots. See Boxplots
choosing, 74–75
discrete probability distributions as, 314, 314–315
dotplots, 70, 70–71, 71
interaction, in two-way ANOVA, 702–703
manipulating scale of, 98, 98
misrepresentations of data by, 95–103
normal probability plots, 379
obtaining information from, 71–72
omitting zero on vertical scale, 97–98
pie charts, 44–45, 45
regression line. See Regression line
scatterplots. See Scatterplots
time series, 89–91, 90, 91
unclear labeling of, 99, 100, 100
using two dimensions for one-dimensional differences, 99
Grouped data, 148–154
H
Helmert, Friedrich, 474
Hinges of a boxplot, 173
Histograms, 65–67, 66
shifting to the left, 66–67, 67
Homogeneity, of proportions, test for, 651–655
Hypotheses, 486–573, 488
alternative (research), 489–490
for ANOVA, 673
null, 489–490
strategy for constructing, 491–492
testing. See Hypothesis testing for slope of regression line
testing of. See Hypothesis testing
validity of, 490–491
Hypothesis testing, 489
confidence intervals for, 595–596
confidence intervals for proportion for two-tailed tests and, 549–551
critical regions and critical values and, 500–501, 500t
for difference in population proportions, critical-value method for, 608–610
for difference in population proportions, p-value method for, 610–612
left-tailed tests for, 501, 503
for mean, essential idea about, 497–498
power of a hypothesis test and, 567–568
for proportion, essential idea about, 543
right-tailed tests for, 501, 502–503
for slope of regression line. See Hypothesis testing for slope of regression line
for standard deviation, essential idea about, 556
statistical significance and, 492–493
t test for the mean and. See t test for the population mean
two-tailed tests for, 501, 504, 516–519
Type I and Type II error and. See Type I error; Type II error
Z test for the mean using critical-value method and, 502–504
Z test for the mean using p-value method and. See Z test for the mean using p-value method
and, 499–500
Hypothesis testing for slope of regression line, 721–725
confidence intervals for, 726–730
critical-value method for, 723
p-value method for, 723–725
t test for, 727–730
I
Independence, test for, 646–651, 647
Independence assumption, of regression model, 718
Independent events, 274–276
alternative method for determining, 279
Multiplication Rule for Independent Events, 280–281
Multiplication Rule for Two Independent Events, 277
mutually exclusive events vs., 275
Independent samples, 576–577, 607
t test for difference in means and, 589–594
Z interval for proportion difference and, 612–613
Z tests for proportion difference and, 606–612
Inference. See Statistical inference; Two-sample inference
Interaction, 702
two-way ANOVA and, 702–706
Interaction plots, 702–703
Intercept, of regression line, 210, 211
Interquartile range (IQR), 165–166
outlier detection using, 177–179
Intersection of events, 261–262
Intervals, 349. See also Confidence intervals; Prediction intervals; t interval(s); t interval for the population mean; Z interval; Z interval for the population mean; Z interval for the population proportion
sign, 451
Wilcoxon, 451
Intramural tennis league, 297
IQR. See Interquartile range (IQR)
IQR method of detecting outliers, 177–179
L
Law of Large Numbers, 249
Law of Total Probability, 241
for Continuous Random Variables, 349
Least-squares criterion, 226
Least-squares regression line. See Regression line
Leaves, 68–69
Left-skewed distributions, 73, 73
boxplot for, 176, 176
Left-tailed tests, 501, 503
t test for the population mean using critical-value method as, 526–528
Z test for the population proportion using critical-value method as, 503
Level of significance, 495
Levels of measurement, 11
Linear, definition of, 193
Linear regression. See Regression analysis
Lower bound, for Z interval, 439–440
Lower class limit, 62–63
Lower hinge of a boxplot, 173
M
Margin of error (E)
for t intervals, 454, 594
for Z interval. See Margin of error for Z interval
Margin of error for Z interval, 429, 433
finding, given lower and upper bounds, 439–440
for p, 466–467
for p₁ − p₂, 612
reducing, 437–440
Matched-pair samples, 576
Mean (), 108–112
as balance point of data, 110, 110–111, 317–318
of binomial random variables, 334–335
Central Limit Theorem for, 399–402, 401
of differences. See Population mean difference ()
of a discrete random variable X, 315–320
essential idea about hypothesis testing for, 497–498
estimated, 150
for grouped data, estimating, 149–150
lack of representativeness of, 165
of normal population, confidence intervals for, 432–433
of normal probability distribution, 352–353
notation for, 109
of Poisson probability distribution, 344–345
population. See Population mean (μ)
sample. See Sample mean (x̄)
of the sample proportion, 416–417
of sampling distribution of the sample mean, 397, 398–399
of sampling distribution of the sample proportion, 416–417
sensitivity to extreme values, 111–112
standard error of, 398
strategy for constructing hypotheses about, 491–492
weighted, 148–149
Mean square, 671
Mean square error (MSE), 671
Mean square treatment (MSTR), 671
Mean squared deviation, 130
Measurement, levels of, 11
Measures of center, 108. See also Mean; Median; Mode
skewness and, 115, 115–117
Measures of relative position, 155–171. See also Percentile(s); z-scores
Measures of variability (measures of dispersion; measures of spread), 126–148, 127. See also Range; Standard deviation; Variance
Chebyshev's Rule and, 137–140
disagreement among, 140, 179
Empirical Rule and, 135–137, 136
Measuring the human body, 187, 190–192
Median, 112–114
lack of sensitivity to extreme values, 112–113
Mode, 114–115
of binomial discrete random variables, 335–336
lack of, 115
Models
fitting to data, 635
probability, 241–243, 251
regression, 717–721, 718. See also Multiple regression model
Motor vehicle fuel efficiency, 427, 434, 436–437
MSE. See Mean square error (MSE)
MSTR. See Mean square treatment (MSTR)
Multimodal data sets, 115
Multinomial random variable, 632–634
Multiple coefficient of determination (R²), 745
Multiple comparisons, 685–688, 686
Tukey's test for, 688–691
using Bonferroni method, 686–688
Multiple regression, 743–757
adjusted coefficient of determination and, 745–746