Acceptance sampling, 396
ACT college entrance examination, 73, 380, 604–
Adequate Calcium Today (ACT) study, 545–
Adjusted R², 631
Aggregation, 145
Alternative hypothesis. See Hypothesis, alternative
American Community Survey (ACS), 201
Analysis of variance (ANOVA)
one-
regression, 583–
two-
Analysis of variance table
one-
regression, 586, 596, 613–
two-
Anonymity, 206
Applet
Central Limit Theorem, 300–
Confidence Interval, 347–
Correlation and Regression, 104, 106, 135
Law of Large Numbers, 220, 251
Mean and Median, 31, 50
Normal Approximation to Binomial, 322
Normal Curve, 62, 72
One-
One-
Probability, 217, 220, 291–
Simple Random Sample, 187, 192, 202, 292
Statistical Power, 400, 401
Statistical Significance, 383
t Statistic, 423
Two-
Association, 80–
and causation, 131, 133, 149–
negative, 89, 96
positive, 89, 96
Attention deficit hyperactivity disorder (ADHD), 429–
Available data, 165, 170
Bar graph, 10, 23, 530
Bayes’s rule, 273–
Behavioral and social science experiments, 209–
Behavioral Risk Factor Surveillance System (BRFSS), 604
Benchmarking, 89
Benford’s law, 226, 249, 338
Bias. See also Unbiased estimator
in a sample, 190, 198–
in an experiment, 174–
of a statistic, 287–
Binomial coefficient, 327, 333
Binomial distribution. See Distribution, binomial
Binomial setting, 312, 332, 14–
Block, 183–
Bonferroni procedure, 391, 678–
Bootstrap, 424–
Boston Marathon, 27, 17–
Boxplot, 34, 46
modified, 37, 46
side-
Brown-
Buffon, Count, 218
Canadian Internet Use Survey (CIUS), 14–
Capability, 17–
Capture-
Case, 2, 7, 609
Categorical data. See Variable, categorical
Causation, 131, 133, 148–
Cause-
Cell, 137, 145, 699
Census, 167, 170, 636
Census Bureau, 8, 338, 380
Center of a distribution, 28–
Centers for Disease Control and Prevention, 214, 336, 604
Central limit theorem, 298–
Chi-
Chi-
and the z statistic, 540–
goodness of fit test, 547
Classes in a histogram, 15
Clinical trials, 207
Coefficient of determination, 662. See also Correlation, squared multiple
Coin tossing, 217, 221, 238–
College Alcohol Study (CAS), 604
Column variable. See Variable, row, and Variable, column
Common response, 149–
Complement of an event. See Event, complement
Conditional distribution. See Distribution, conditional
Conditional probability. See Probability, conditional
Confidence interval, 346–
behavior, 352–
bootstrap, 16-
cautions, 355–
for multiple comparisons, 680
for odds ratio, 14–
for slope in a logistic regression, 14–
relation to two-
simultaneous, 680
t for a contrast, 674
t for difference of means, 437–
pooled, 449
t for matched pairs, 421
t for mean response in regression, 570–
t for one mean, 410–
t for regression parameters, 568, 578, 612–
z for one mean, 348–
z for one proportion
large sample, 486, 500
plus four, 489
z for difference of proportions
large sample, 506–
plus four, 509–
Confidence level, 347, 356
Confidentiality, 204, 206–
Confounding, 149–
Consumer Report of Eating Share Trends (CREST), 632, 14–
Consumer Reports National Research Center, 319
Consumers Union, 85, 16–
Continuity correction, 325–
Contrast, 650, 670–
Control chart, 17–
individuals chart, 17–
p chart, 17-
R chart, 17–
s chart, 17-
x̄ chart, 17-
Control group, 174, 310
Correlation, 100–
and regression, 115, 118
based on averaged data, 131, 133
between random variables, 257, 261
bootstrap confidence interval, 16-
cautions about, 123–
nonsense, 131
inference for, 593–
population, 594
properties, 102–
squared, 116, 118, 585, 596
squared multiple, 615. See also Coefficient of determination
test for, 594, 596
Count, 9. See also Frequency
distribution of, 310–
Critical value, 378, 379
of chi-
of F distribution, 585–
of standard Normal distribution, 349, 410, Table A
of t distribution, 409–
Current Population Survey, 289
Cumulative proportion, 61, 70
standard Normal, 63–
Data, 2
anecdotal, 164, 170
available, 165, 170
Data mining, 132–
Decision analysis, 396–
Degree of Reading Power, 437, 16–
Degrees of freedom, 40
approximation for, 436, 447, 453
of chi-
of chi-
of F distribution, 585
of one-
of t distribution, 409, 425
of two-
of regression ANOVA, 584–
of regression t, 568, 570, 572, 594, 612–
of regression s², 562, 612
of two-
Deming, W. Edwards, 17–
Density curve, 51–
Density estimation, 68–
Design, 171–
block, 183–
repeated-
sampling, 188–
Direction of a relationship, 88, 96
Disjoint events. See Event, disjoint
Distribution, 23, 46
bimodal, 69
binomial, 312–
formula, 326–
Normal approximation, 321–
use in the sign test, 472–
bootstrap, 16-
of categorical variable, 9
chi-
conditional, 140, 145, 528, 537
describing, 20, 23
examining, 18
exponential, 300
geometric, 340
F, 585–
joint, 138, 145
jointly Normal, 594
marginal, 139, 145
noncentral F, 682
noncentral t, 467, 474
Normal, 56–
for probabilities, 242–
standard, 60, 63, 70, Table A
Poisson, 328–
population, 291, 294
probability. See Probability distribution
of quantitative variable, 11–
sampling. See Sampling distribution
skewed, 18, 23
symmetric, 18, 23
t, 409–
trimodal, 69
tails, 18
uniform, 71, 240, 243, 554
unimodal, 18
Weibull, 305–
Distribution-
Double-
Dual X-
Equivalence testing, 420–
Estimation, 250–
Ethics, 163, 203–
Excel, 3, 178–
Expected value, 248. See also Mean of a random variable
Expected cell count, 533, 543, 547, 550
Experiment, 167–
block design, 183–
cautions about, 181–
comparative, 173–
completely randomized, 180
matched pairs, 182–
principles, 177
units, 171, 185
Explanatory variable. See Variable, explanatory
Exploratory data analysis, 9, 16, 23, 163
Extrapolation, 110, 118
Event, 223, 232
complement of, 224, 232
disjoint, 224, 232
empty, 266
independent, 229, 232
intersection, 271, 274
union, 264, 274
F distribution. See Distribution, F
F test
one-
regression ANOVA, 586, 614
for collection of regression coefficients, 631–
for standard deviations, 665–
two-
Facebook, 23, 308–
Factor, experimental, 172, 185, 644, 698–
Federal Aviation Administration (FAA), 309
Fisher, Sir R. A., 385, 400, 585
Fitting a line, 108–
Five-
Flowchart, 17-
Form of a relationship, 88, 96
Frequency, 15, 23
Frequency table, 15
Gallup-
Gallup Poll, 335–
Genetic counseling, 277
Genomics, 388
General Social Survey (GSS), 167, 197, 211
Goodness of fit, 545–
Google, 9, 485
Gosset, William, 48, 409, 16–
Histogram, 14, 23
Hypothesis
alternative, 363–
one-
two-
null, 363, 379
Hypothesis testing, 399–
Independence, 218–
in two-
of events, 228–
of random variables, 257–
Indicator variable. See Variable, indicator
Inference, statistical. See Statistical inference
Influential observation, 127–
Informed consent, 204, 205–
Institutional review board (IRB), 204–
Instrument, 5
Interaction, 701, 703–
Intercept of a line, 108
of least-
Internet Movie Database (IMDb), 637
Intervention, 169, 170
Intersection of events. See Event, intersection
Interquartile range (IQR), 36, 46
iPod, 422, 470–
Jitter, 87
JMP, 416, 441, 446, 469, 493, 499, 509, 513, 517, 528, 532, 545, 549, 552, 564, 580, 622, 623, 666, 683, 689, 14–
Karaoke Channel, 358
Kerrich, John, 218
Key characteristics of a data set, 4, 7
Key characteristics of data for relationships, 83
Kruskal-
Label, 2, 7
Law of large numbers, 250–
Law School Admission Test (LSAT), 390, 476
Leaf, in a stemplot, 11, 23
Leaning Tower of Pisa, 604
Least significant difference, 678
Least squares, 111, 611–
Least squares regression line, 112, 118, 555, 577
Level of a factor, 172, 185, 698–
Line, equation of, 108
least-
Linear relationship, 88, 96
Linear transformation. See Transformation, linear
Logarithm transformation. See Transformation, logarithm
Logistic regression, 632–
Logit, 14–
Lurking variable. See Variable, lurking
Main effect, 701, 703–
Major League Baseball (MLB), 15–
Mann-
Margin of error, 287, 289, 291, 352
for a difference in two means, 437, 449, 454
for a difference in two proportions, 508, 519
for a single mean, 349, 353, 356–
for a single proportion, 486, 500
Marginal means, 705, 714
Matched pairs design, 182–
inference for, 419–
McNemar’s test, 554
Mean, 28, 46
of binomial distribution, 318, 332
of density curve, 55, 69
of difference of sample means, 434
of difference of sample proportions, 506
of Normal distribution, 56
of random variable, 246–
rules for, 253–
of sample mean, 296–
of sample proportion, 320, 332, 500, 584
trimmed, 51
versus median, 31
Mean square
in one-
in two-
in multiple linear regression, 613
in simple linear regression, 584–
Median, 30, 46
inference for, 472–
of density curve, 55, 69
Mendel, Gregor, 230
Meta-
Minitab, 315, 395, 417, 422, 441, 463, 466, 493, 499, 509, 514, 517, 529, 548, 563, 595, 627, 630, 665, 684, 690, 713, 14–
Minnesota Multiphasic Personality Inventory (MMPI), 5
Mode, 18, 23
Model selection, 629
Modified Levene’s test, 665–
Mosaic plot, 143, 531, 534
Motorola, 17–
Multiple comparisons, 650, 677–
National AIDS Behavioral Surveys, 335
National Assessment of Educational Progress (NAEP), 70–
National Association of Colleges and Employers (NACE), 354, 358, 464
National Center for Education Statistics, 119–
National Collegiate Athletic Association (NCAA), 16–
National Endowment for the Humanities, 432
National Enquirer, 458
National Football League, 601
National Health and Nutrition Examination Survey (NHANES), 372, 434, 704
National Hockey League (NHL), 617–
National Longitudinal Survey of Youth (NLSY), 574
National Oceanic and Atmospheric Administration (NOAA), 581
National Public Radio (NPR), 360
National Science Foundation (NSF), 597
Neyman, Jerzy, 399–
Nielsen Company, 294, 411, 428
Noncentrality parameter
for t, 468, 474
for F, 682
Nonparametric procedure, 470, 472–
Nonresponse, 196, 200
Normal distribution. See Distribution, Normal
Normal distribution calculations, 61–
Normal probability plot. See Normal quantile plot
Normal quantile plot, 66–
Normal scores, 66
Null hypothesis. See Hypothesis, null
Observational study, 168
Odds, 633, 14–
Odds ratio, 633, 14–
Outcomes, 171, 185
Out-
Outliers, 19, 23, 15–
1.5 × IQR criterion, 35–
regression, 127–
Parameter, 282, 290
Pareto chart, 17–
Pearson, Egon, 399
Pearson, Karl, 218
Percent, 9
Percentile, 32
Permutation tests, 15–
Pew Research Center survey, 198, 308, 428, 484, 485, 501, 521, 522, 527, 14–
Pie chart, 11, 23
Placebo effect, 174
Plug-
Pooled estimator
of population proportion, 512, 519
of ANOVA variance, 654, 660, 703
of variance in two samples, 448
Population, 189, 200
Population distribution. See Distribution, population
Power, 392, 400
and Type II error, 399
increasing, 395
of one-
of t test
one-
two-
of z test, 391–
of z test for a single proportion, 498–
of z test for comparing two proportions, 516–
Prediction, 107, 110, 118
Prediction interval, 572–
Probability, 216–
conditional, 267–
equally likely outcomes, 227
finite sample space, 225–
Probability distribution, 236, 241
mean of, 246–
standard deviation of, 255–
variance of, 255–
Probability histogram, 237, 243
Probability model, 221, 232
Probability rules, 223–
addition, 224, 232, 264, 266, 275
complement, 224, 232, 264, 275
general, 264–
multiplication, 228–
Probability sample. See Sample, probability
Process capability indices, 17-
Proportion, 9
distribution of, 319–
inference for a single proportion, 483–
inference for comparing two proportions, 505–
population, 283
sample, 283, 319, 484, 500
P-value, 366, 379
Quartiles, 32–
of a density curve, 55, 69
R, 315, 329, 330, 331, 332, 16–
Randomization
consequences of, 177
experimental, 175–
how to, 177–
Random digits, 180–
Random number generator, 375
Random phenomenon, 217, 219
Random variable, 235–
continuous, 239–
discrete, 236, 243
mean of, 248, 261
standard deviation of, 256, 261
variance of, 256, 261
Randomized comparative experiment, 177, 185
Randomized response survey, 279–
Ranks, 15–
Rate, 6
Regression, 107–
and correlation, 115, 118
cautions about, 123–
deviations, 88, 560, 577, 610
interpretation, 113
least-
logistic, 632–
model conditions, 567
model selection, 627–
multiple, 608–
multiple logistic, 632–
nonlinear, 576–
simple linear, 556–
Regression equation, population, 608, 615
Regression line, 107, 117
population, 557, 577
Relative risk, 518, 519
Reliability, 313
Resample, 424. See also Chapter 16
Residual, 123–
plots, 125, 133, 566, 577–
Resistant measure, 30, 46
Response bias, 198, 200
Response rate, 189
Response variable. See Variable, response
Robust, 30, 423–
Roundoff error, 125, 138, 139
Row variable. See Variable, row, and Variable, column
Rugby sevens, 455
Sallie Mae, 350
Sample, 189, 200
cautions about, 196–
design of, 189, 200
multistage, 195–
probability, 194, 200
simple random (SRS), 191–
stratified, 193–
systematic, 202
Sample size, choosing
confidence interval for a difference in means, 462–
confidence interval for a difference in proportions, 514–
confidence interval for a mean, 353, 461–
confidence interval for a proportion, 494–
one-
power for a proportion, 498–
power for a difference in proportions, 516–
t test, one-
t test, two-
Sample space, 221, 232
finite, 225
Sample survey, 167–
Sampling distribution, 281, 284–
of difference of means, 434
of regression estimators, 567
of sample count, 314, 322, 332
of sample mean, 298, 307
of sample proportion, 285, 319–
Sampling variability, 287–
SAS, 445, 587, 619, 626, 628, 631, 664, 710, 711
SAT college entrance examination, 73, 344, 604–
Scatterplot, 86, 96
adding categorical variables to, 93
smoothing, 94, 96
Shape of a distribution, 11, 23
Shewhart, Walter, 17–
Sign test, 472–
Significance level, 367, 383–
Significance, statistical, 367–
and Type I error, 398
Significance test, 361–
chi-
relation to z test, 540–
chi-
chi-
F test in one-
F test in regression, 585–
F test for a collection of regression coefficients, 631–
F test for standard deviations, 665–
F tests in two-
Kruskal-
Mann-
relationship to confidence intervals, 375–
sign test for matched pairs, 472–
t test for a contrast, 674
t test for correlation, 594, 596
t test for one mean, 413, 425
t test for matched pairs, 419–
t test for two means, 440, 454
pooled, 449
t test for regression coefficients, 568, 578
t tests for multiple comparisons, 678
use and abuse, 384–
Wilcoxon rank sum test, 15-
Wilcoxon signed rank test, 15-
z test for one mean, 372, 379
z test for one proportion, 491, 500
z test for logistic regression slope, 14–
z test for two proportions, 511–
Simple random sample. See Sample, simple random
Simpson’s paradox, 143, 145, 160
Simulation, 284
Simultaneous confidence intervals, 680
68–
Skewed distribution. See Distribution, skewed
Slope of a line, 108
of least-
Small numbers, law of, 252
Spread of a distribution, 32, 38, 46, 54
Spreadsheet, 3. See also Excel
SPSS, 417, 446, 530, 549, 562, 621, 657, 658, 671, 676, 680, 14–
Standard & Poor’s 500-
Standard deviation, 38, 46. See also Variance
of binomial distribution, 318, 332
of density curve, 55, 69
of deviations in ANOVA, 652, 702
of deviations in regression, 560, 577, 611, 615
of difference between sample means, 434
pooled, 448
of difference between sample proportions, 515, 519
of Normal distribution, 57
of Poisson distribution, 329, 333
of random variable, 256, 261
of regression intercept and slope, 590–
of sample mean, 297, 307
of sample proportion, 485
properties, 40
rules for, 257–
Standard error, 408
bootstrap, 16–
for regression prediction, 592, 596
of a contrast, 674
of a difference in sample means, 436–
pooled, 448
of a difference in sample proportions, 508, 519
of a sample mean, 408, 425
of a sample proportion, 486
of mean regression response, 592, 596
of regression intercept and slope, 591, 596
Standard Normal distribution. See Distribution, standard Normal
Standardized observation, 59, 70
Statistic, 282, 290
Statistical inference, 282, 290, 341–
for non-
for small samples, 444–
Statistical process control, Chapter 17
Statistical significance. See Significance, statistical
Stem-
Stemplot, 11, 23
back-
splitting stems, 13
trimming, 13
Strata, 195, 200. See also Sample, stratified
Strength of a relationship, 88, 96. See also Correlation
StubHub!, 69, 16-
Student Monitor, 283, 291
Subjects, experimental, 171, 185
Subpopulation, 557, 608
Sums of squares
in one-
in two-
in multiple linear regression, 613–
in simple linear regression, 583–
Survey of Study Habits and Attitudes (SSHA), 382
Systematically larger, 15–
Symmetric distribution. See Distribution, symmetric
t distribution. See Distribution, t
t inference procedures
for contrasts, 674
for correlation, 594, 596
for matched pairs, 419–
for multiple comparisons, 678
for one mean, 411, 413
for regression coefficients, 568, 578, 612–
for regression mean response, 570, 578
for regression prediction, 572, 578
for two means, 437, 440
for two means, pooled, 449
robustness of, 423–
Tails of a distribution. See Distribution, tails
Test of significance. See Significance test
Test statistic, 364–
Testing hypotheses. See Significance test
The Times Higher Education Supplement, 637–
Three-
Ties, 15–
Time plot, 21, 23
Titanic, 24, 52, 146, 161, 16–
Transformation
linear, 44–
logarithm, 91, 96, 470–
rank, 15–
square root, 671–
Treatment, experimental, 171, 174, 185
Tree diagram, 271–
Tuskegee study, 208
Twitter, 75, 244, 522
Two-
Two-
data analysis for, 136–
inference for, 525–
models for, 543
relationships in, 81, 528
Type I and II errors, 396–
Uber, 428
Unbiased estimator, 287
Undercoverage, 196, 200
Unimodal distribution. See Distribution, unimodal
Union of events, 264–
Unit of measurement, 3, 43
Unit, experimental, 171, 185
U.S. Agency for International Development, 15–
U.S. Department of Education, 336
Value of a variable, 2, 7
Variability, 32, 287–
Variable, 2, 7
categorical, 3, 7, 10, 11
column, 137, 145
dependent, 83
explanatory, 82, 84
independent, 83
indicator, 14-
lurking, 129–
quantitative, 3, 7, 11
response, 82, 84
row, 137, 145
Variance, 38, 46
of a difference between two sample means, 434
pooled, 448
of a difference between two sample proportions, 507, 519
of a random variable, 255–
a pooled estimator, 448, 454
rules for, 257–
of a sample mean, 297
Variation
among groups, 658, 667
between groups, 647, 658, 667
common cause, 17–
special cause, 17–
within group, 647, 658, 667
Venn diagram, 224
Voluntary response, 190–
Wald statistic, 14–
Wall Street Journal, 458
Whiskers, 35
Wilcoxon rank sum test, 15-
Wilcoxon signed rank test, 15-
Wording questions, 198, 200
World Bank, 28
World Database of Happiness, 638
z-score, 59, 70
z statistic
for one proportion, 491, 500
for two proportions, 512, 519
one-
two-
pooled, 449