Index
Acceptance sampling, 396
ACT college entrance examination, 73, 380, 604–605
Adequate Calcium Today (ACT) study, 545–546
Adjusted R², 631
Aggregation, 145
Alternative hypothesis. See Hypothesis, alternative
American Community Survey (ACS), 201
Analysis of variance (ANOVA)
one-
regression, 583–587, 596, 613–614, 616
two-
Analysis of variance table
one-
regression, 586, 596, 613–614, 616
two-
Anonymity, 206
Applet
Central Limit Theorem, 300–302, 311
Confidence Interval, 347–348, 360, 403
Correlation and Regression, 104, 106, 135
Law of Large Numbers, 220, 251
Mean and Median, 31, 50
Normal Approximation to Binomial, 322
Normal Curve, 62, 72
One-
One-
Probability, 217, 220, 291–292, 335
Simple Random Sample, 187, 192, 202, 292
Statistical Power, 400, 401
Statistical Significance, 383
t Statistic, 423
Two-
Association, 80–81, 84, 532–533
and causation, 131, 133, 149–150
negative, 89, 96
positive, 89, 96
Attention deficit hyperactivity disorder (ADHD), 429–430
Available data, 165, 170
Bar graph, 10, 23, 530
Bayes’s rule, 273–274
Behavioral and social science experiments, 209–211
Behavioral Risk Factor Surveillance System (BRFSS), 604
Benchmarking, 89
Benford’s law, 226, 249, 338
Bias. See also Unbiased estimator
in a sample, 190, 198–201
in an experiment, 174–175, 185
of a statistic, 287–289, 290
Binomial coefficient, 327, 333
Binomial distribution. See Distribution, binomial
Binomial setting, 312, 332, 14–1
Block, 183–184, 185
Bonferroni procedure, 391, 678–680
Bootstrap, 424–425, 15–2. See also Chapter 16
Boston Marathon, 27, 17–40
Boxplot, 34, 46
modified, 37, 46
side-
Brown-
Buffon, Count, 218
Canadian Internet Use Survey (CIUS), 14–24
Capability, 17–34, 17–36
Capture-
Case, 2, 7, 609
Categorical data. See Variable, categorical
Causation, 131, 133, 148–152
Cause-
Cell, 137, 145, 699
Census, 167, 170, 636
Census Bureau, 8, 338, 380
Center of a distribution, 28–31, 46, 54
Centers for Disease Control and Prevention, 214, 336, 604
Central limit theorem, 298–301, 313, 325, 328, 335
Chi-
Chi-
and the z statistic, 540–541
goodness of fit test, 547
Classes in a histogram, 15
Clinical trials, 207
Coefficient of determination, 662. See also Correlation, squared multiple
Coin tossing, 217, 221, 238–239, 291, 312–313, 331, 335, 339
College Alcohol Study (CAS), 604
Column variable. See Variable, row, and Variable, column
Common response, 149–150, 152
Complement of an event. See Event, complement
Conditional distribution. See Distribution, conditional
Conditional probability. See Probability, conditional
Confidence interval, 346–348, 356
behavior, 352–353
bootstrap, 16-13–16–16, 16-31–16–35
cautions, 355–356
for multiple comparisons, 680
for odds ratio, 14–10, 14–19
for slope in a logistic regression, 14–9, 14–18
relation to two-
simultaneous, 680
t for a contrast, 674
t for difference of means, 437–439, 454
pooled, 449
t for matched pairs, 421
t for mean response in regression, 570–571, 578
t for one mean, 410–412, 425–426
t for regression parameters, 568, 578, 612–613, 616
z for one mean, 348–352
z for one proportion
large sample, 486, 500
plus four, 489
z for difference of proportions
large sample, 506–507, 519
plus four, 509–510
Confidence level, 347, 356
Confidentiality, 204, 206–207, 211
Confounding, 149–150, 152, 169, 170, 388, 419
Consumer Report of Eating Share Trends (CREST), 632, 14–23
Consumer Reports National Research Center, 319
Consumers Union, 85, 16–35
Continuity correction, 325–326, 333, 15–7
Contrast, 650, 670–677, 685
Control chart, 17–7, 17–17
individuals chart, 17–40
p chart, 17-51–17–56
R chart, 17–23, 17–35
s chart, 17-12–17–17
x̄ chart, 17-8–17–12, 17–14, 17–18
Control group, 174, 310
Correlation, 100–104
and regression, 115, 118
based on averaged data, 131, 133
between random variables, 257, 261
bootstrap confidence interval, 16-35–16–37
cautions about, 123–133
nonsense, 131
inference for, 593–596
population, 594
properties, 102–103, 104
squared, 116, 118, 585, 596
squared multiple, 615. See also Coefficient of determination
test for, 594, 596
Count, 9. See also Frequency
distribution of, 310–314, 321–322, 328–333
Critical value, 378, 379
of chi-
of F distribution, 585–586, Table E
of standard Normal distribution, 349, 410, Table A
of t distribution, 409–410, Table D
Current Population Survey, 289
Cumulative proportion, 61, 70
standard Normal, 63–64, Table A
Data, 2
anecdotal, 164, 170
available, 156, 170
Data mining, 132–133
Decision analysis, 396–400
Degree of Reading Power, 437, 16–42
Degrees of freedom, 40
approximation for, 436, 447, 453
of chi-
of chi-
of F distribution, 585
of one-
of t distribution, 409, 425
of two-
of regression ANOVA, 584–586, 613–614
of regression t, 568, 570, 572, 594, 612–613
of regression s², 562, 612
of two-
Deming, W. Edwards, 17–40
Density curve, 51–54, 69, 240, 243
Density estimation, 68–69
Design, 171–185. See also Experiment
block, 183–184, 185
repeated-
sampling, 188–200
Direction of a relationship, 88, 96
Disjoint events. See Event, disjoint
Distribution, 23, 46
bimodal, 69
binomial, 312–318, 332, 14–2, Table C
formula, 326–328, 333
Normal approximation, 321–324, 332
use in the sign test, 472–473
bootstrap, 16-24–16–29
of categorical variable, 9
chi-
conditional, 140, 145, 528, 537
describing, 20, 23
examining, 18
exponential, 300
geometric, 340
F, 585–586, Table E
joint, 138, 145
jointly Normal, 594
marginal, 139, 145
noncentral F, 682
noncentral t, 467, 474
Normal, 56–57, 69
for probabilities, 242–243
standard, 60, 63, 70, Table A
Poisson, 328–332, 333, 551
population, 291, 294
probability. See Probability distribution
of quantitative variable, 11–16
sampling. See Sampling distribution
skewed, 18, 23
symmetric, 18, 23
t, 409–410, Table D
trimodal, 69
tails, 18
uniform, 71, 240, 243, 554
unimodal, 18
Weibull, 305–307
Distribution-
Double-
Dual X-
Equivalence testing, 420–422
Estimation, 250–251
Ethics, 163, 203–211
Excel, 3, 178–179, 191–192, 417, 445, 487, 508, 563, 609, 629, 664, 713, 17–22
Expected value, 248. See also Mean of a random variable
Expected cell count, 533, 543, 547, 550
Experiment, 167–168, 170
block design, 183–184, 185
cautions about, 181–182
comparative, 173–174, 185
completely randomized, 180
matched pairs, 182–183, 185
principles, 177
units, 171, 185
Explanatory variable. See Variable, explanatory
Exploratory data analysis, 9, 16, 23, 163
Extrapolation, 110, 118
Event, 223, 232
complement of, 224, 232
disjoint, 224, 232
empty, 266
independent, 229, 232
intersection, 271, 274
union, 264, 274
F distribution. See Distribution, F
F test
one-
regression ANOVA, 586, 614
for collection of regression coefficients, 631–632, 635
for standard deviations, 665–666
two-
Facebook, 23, 308–309, 428, 456, 522, 648–650, 661–663, 670–673, 686, 687–688, 694, 696, 15–31, 16-3–16–5
Factor, experimental, 172, 185, 644, 698–702
Federal Aviation Administration (FAA), 309
Fisher, Sir R. A., 385, 400, 585
Fitting a line, 108–109
Five-
Flowchart, 17-4–17–5
Form of a relationship, 88, 96
Frequency, 15, 23
Frequency table, 15
Gallup-
Gallup Poll, 335–336
Genetic counseling, 277
Genomics, 388
General Social Survey (GSS), 167, 197, 211
Goodness of fit, 545–550
Google, 9, 485
Gosset, William, 48, 409, 16–10
Histogram, 14, 23
Hypothesis
alternative, 363–364, 370, 379
one-
two-
null, 363, 379
Hypothesis testing, 399–400. See also Significance test
Independence, 218–219
in two-
of events, 228–229, 232
of random variables, 257–258, 261, 274
Indicator variable. See Variable, indicator
Inference, statistical. See Statistical inference
Influential observation, 127–129, 133, 566, 624
Informed consent, 204, 205–206, 211
Institutional review board (IRB), 204–205, 211
Instrument, 5
Interaction, 701, 703–707
Intercept of a line, 108
of least-
Internet Movie Database (IMDb), 637
Intervention, 169, 170
Intersection of events. See Event, intersection
Interquartile range (IQR), 36, 46
iPod, 422, 470–471
Jitter, 87
JMP, 416, 441, 446, 469, 493, 499, 509, 513, 517, 528, 532, 545, 549, 552, 564, 580, 622, 623, 666, 683, 689, 14–4, 14–14, 15–8, 15–12, 15–15, 15–20, 15–30, 15–31
Karaoke Channel, 358
Kerrich, John, 218
Key characteristics of a data set, 4, 7
Key characteristics of data for relationships, 83
Kruskal-
Label, 2, 7
Law of large numbers, 250–252, 253, 261
Law School Admission Test (LSAT), 390, 476
Leaf, in a stemplot, 11, 23
Leaning Tower of Pisa, 604
Least significant difference, 678
Least squares, 111, 611–612
Least squares regression line, 112, 118, 555, 577
Level of a factor, 172, 185, 698–701
Line, equation of, 108
least-
Linear relationship, 88, 96
Linear transformation. See Transformation, linear
Logarithm transformation. See Transformation, logarithm
Logistic regression, 632–633. See also Chapter 14
Logit, 14–5
Lurking variable. See Variable, lurking
Main effect, 701, 703–707, 714
Major League Baseball (MLB), 15–3
Mann-
Margin of error, 287, 289, 291, 352
for a difference in two means, 437, 449, 454
for a difference in two proportions, 508, 519
for a single mean, 349, 353, 356–357, 411, 426
for a single proportion, 486, 500
Marginal means, 705, 714
Matched pairs design, 182–183, 185
inference for, 419–420, 426, 472–473, 15–17
McNemar’s test, 554
Mean, 28, 46
of binomial distribution, 318, 332
of density curve, 55, 69
of difference of sample means, 434
of difference of sample proportions, 506
of Normal distribution, 56
of random variable, 246–248, 261
rules for, 253–254, 261
of sample mean, 296–297, 307
of sample proportion, 320, 332, 500, 584
trimmed, 51
versus median, 31
Mean square
in one-
in two-
in multiple linear regression, 613
in simple linear regression, 584–586
Median, 30, 46
inference for, 472–473, 15–9, 15–23, 16-28–16–29
of density curve, 55, 69
Mendel, Gregor, 230
Meta-
Minitab, 315, 395, 417, 422, 441, 463, 466, 493, 499, 509, 514, 517, 529, 548, 563, 595, 627, 630, 665, 684, 690, 713, 14–11, 14–14, 14–16, 14–17, 14–20, 14–21, 15–8, 15–20, 15–24
Minnesota Multiphasic Personality Inventory (MMPI), 5
Mode, 18, 23
Model selection, 629
Modified Levene’s test, 665–666
Mosaic plot, 143, 531, 534
Motorola, 17–2
Multiple comparisons, 650, 677–681, 15–51
National AIDS Behavioral Surveys, 335
National Assessment of Educational Progress (NAEP), 70–71, 381
National Association of Colleges and Employers (NACE), 354, 358, 464
National Center for Education Statistics, 119–120, 166
National Collegiate Athletic Association (NCAA), 16–11
National Endowment for the Humanities, 432
National Enquirer, 458
National Football League, 601
National Health and Nutrition Examination Survey (NHANES), 372, 434, 704
National Hockey League (NHL), 617–618
National Longitudinal Survey of Youth (NLSY), 574
National Oceanic and Atmospheric Administration (NOAA), 581
National Public Radio (NPR), 360
National Science Foundation (NSF), 597
Neyman, Jerzy, 399–400
Nielsen Company, 294, 411, 428
Noncentrality parameter
for t, 468, 474
for F, 682
Nonparametric procedure, 470, 472–473. See also Chapter 15
Nonresponse, 196, 200
Normal distribution. See Distribution, Normal
Normal distribution calculations, 61–66
Normal probability plot. See Normal quantile plot
Normal quantile plot, 66–67, 70
Normal scores, 66
Null hypothesis. See Hypothesis, null
Observational study, 168
Odds, 633, 14–2, 14–18
Odds ratio, 633, 14–7, 14–10, 14–18
Outcomes, 171, 185
Out-
Outliers, 19, 23, 15–1
1.5 × IQR criterion, 35–36
regression, 127–129, 133, 574–575
Parameter, 282, 290
Pareto chart, 17–18, 17-53–17–54, 17–57
Pearson, Egon, 399
Pearson, Karl, 218
Percent, 9
Percentile, 32
Permutation tests, 15–2, 16-41–16–50
Pew Research Center survey, 198, 308, 428, 484, 485, 501, 521, 522, 527, 14–19, 15–15, 16–55
Pie chart, 11, 23
Placebo effect, 174
Plug-
Pooled estimator
of population proportion, 512, 519
of ANOVA variance, 654, 660, 703
of variance in two samples, 448
Population, 189, 200
Population distribution. See Distribution, population
Power, 392, 400
and Type II error, 399
increasing, 395
of one-
of t test
one-
two-
of z test, 391–395
of z test for a single proportion, 498–499
of z test for comparing two proportions, 516–517
Prediction, 107, 110, 118
Prediction interval, 572–573, 578, 613
Probability, 216–217, 219
conditional, 267–268, 269
equally likely outcomes, 227
finite sample space, 225–226
Probability distribution, 236, 241
mean of, 246–248, 261
standard deviation of, 255–256, 261
variance of, 255–256, 261
Probability histogram, 237, 243
Probability model, 221, 232
Probability rules, 223–224, 232
addition, 224, 232, 264, 266, 275
complement, 224, 232, 264, 275
general, 264–275
multiplication, 228–229, 232, 264–265, 268, 275
Probability sample. See Sample, probability
Process capability indices, 17-40–17–47
Proportion, 9
distribution of, 319–321, 322–323
inference for a single proportion, 483–501
inference for comparing two proportions, 505–517
population, 283
sample, 283, 319, 484, 500
P-value, 366, 379
Quartiles, 32–33, 46
of a density curve, 55, 69
R, 315, 329, 330, 331, 332, 16–9, 16–11, 16–14, 16–18, 16–34, 16–38, 16–45
Randomization
consequences of, 177
experimental, 175–176, 185
how to, 177–180
Random digits, 180–181, 192–193, 200, 284, Table B
Random number generator, 375
Random phenomenon, 217, 219
Random variable, 235–236, 243
continuous, 239–242, 243
discrete, 236, 243
mean of, 248, 261
standard deviation of, 256, 261
variance of, 256, 261
Randomized comparative experiment, 177, 185
Randomized response survey, 279–280
Ranks, 15–4, 15–14
Rate, 6
Regression, 107–117
and correlation, 115, 118
cautions about, 123–133
deviations, 88, 560, 577, 610
interpretation, 113
least-
logistic, 632–633, Chapter 14
model conditions, 567
model selection, 627–631
multiple, 608–615
multiple logistic, 632–633, 14-16–14–18
nonlinear, 576–577
simple linear, 556–576
Regression equation, population, 608, 615
Regression line, 107, 117
population, 557, 577
Relative risk, 518, 519
Reliability, 313
Resample, 424. See also Chapter 16
Residual, 123–124, 133, 561, 577, 612, 616, 653
plots, 125, 133, 566, 577–578, 599, 690
Resistant measure, 30, 46
Response bias, 198, 200
Response rate, 189
Response variable. See Variable, response
Robust, 30, 423–424, 426, 442, 15–1
Roundoff error, 125, 138, 139
Row variable. See Variable, row, and Variable, column
Rugby sevens, 455
Sallie Mae, 350
Sample, 189, 200
cautions about, 196–199, 200
design of, 189, 200
multistage, 195–196, 200
probability, 194, 200
simple random (SRS), 191–193, 200
stratified, 193–194, 200
systematic, 202
Sample size, choosing
confidence interval for a difference in means, 462–463
confidence interval for a difference in proportions, 514–515, 519
confidence interval for a mean, 353, 461–463
confidence interval for a proportion, 494–495, 500
one-
power for a proportion, 498–499
power for a difference in proportions, 516–517
t test, one-
t test, two-
Sample space, 221, 232
finite, 225
Sample survey, 167–168, 170, 188–200
Sampling distribution, 281, 284–287, 290
of difference of means, 434
of regression estimators, 567
of sample count, 314, 322, 332
of sample mean, 298, 307
of sample proportion, 285, 319–321, 322, 332
Sampling variability, 287–288
SAS, 445, 587, 619, 626, 628, 631, 664, 710, 711
SAT college entrance examination, 73, 344, 604–605, 618–619, 14–26
Scatterplot, 86, 96
adding categorical variables to, 93
smoothing, 94, 96
Shape of a distribution, 11, 23
Shewhart, Walter, 17–7, 17–32
Sign test, 472–473, 491–492, 549–550
Significance level, 367, 383–385
Significance, statistical, 367–369, 379
and Type I error, 398
Significance test, 361–370
chi-
relation to z test, 540–541
chi-
chi-
F test in one-
F test in regression, 585–586, 596, 614
F test for a collection of regression coefficients, 631–632, 635
F test for standard deviations, 665–666
F tests in two-
Kruskal-
Mann-
relationship to confidence intervals, 375–377
sign test for matched pairs, 472–473, 491–492
t test for a contrast, 674
t test for correlation, 594, 596
t test for one mean, 413, 425
t test for matched pairs, 419–420
t test for two means, 440, 454
pooled, 449
t test for regression coefficients, 568, 578
t tests for multiple comparisons, 678
use and abuse, 384–389
Wilcoxon rank sum test, 15-3–15–14
Wilcoxon signed rank test, 15-17–15–24
z test for one mean, 372, 379
z test for one proportion, 491, 500
z test for logistic regression slope, 14–10, 14–19
z test for two proportions, 511–512, 519
Simple random sample. See Sample, simple random
Simpson’s paradox, 143, 145, 160
Simulation, 284
Simultaneous confidence intervals, 680
68–95–99.7 rule, 57–58, 70
Skewed distribution. See Distribution, skewed
Slope of a line, 108
of least-
Small numbers, law of, 252
Spread of a distribution, 32, 38, 46, 54
Spreadsheet, 3. See also Excel
SPSS, 417, 446, 530, 549, 562, 621, 657, 658, 671, 676, 680, 14–14, 14–17, 15–9, 15–20
Standard & Poor’s 500-
Standard deviation, 38, 46. See also Variance
of binomial distribution, 318, 332
of density curve, 55, 69
of deviations in ANOVA, 652, 702
of deviations in regression, 560, 577, 611, 615
of difference between sample means, 434
pooled, 448
of difference between sample proportions, 515, 519
of Normal distribution, 57
of Poisson distribution, 329, 333
of random variable, 256, 261
of regression intercept and slope, 590–591
of sample mean, 297, 307
of sample proportion, 485
properties, 40
rules for, 257–258, 261
Standard error, 408
bootstrap, 16–5, 16–8
for regression prediction, 592, 596
of a contrast, 674
of a difference in sample means, 436–437
pooled, 448
of a difference in sample proportions, 508, 519
of a sample mean, 408, 425
of a sample proportion, 486
of mean regression response, 592, 596
of regression intercept and slope, 591, 596
Standard Normal distribution. See Distribution, standard Normal
Standardized observation, 59, 70
Statistic, 282, 290
Statistical inference, 282, 290, 341–343
for non-
for small samples, 444–447
Statistical process control, Chapter 17
Statistical significance. See Significance, statistical
Stem-
Stemplot, 11, 23
back-
splitting stems, 13
trimming, 13
Strata, 195, 200. See also Sample, stratified
Strength of a relationship, 88, 96. See also Correlation
StubHub!, 69, 16-11–16–12, 16–22
Student Monitor, 283, 291
Subjects, experimental, 171, 185
Subpopulation, 557, 608
Sums of squares
in one-
in two-
in multiple linear regression, 613–614
in simple linear regression, 583–584
Survey of Study Habits and Attitudes (SSHA), 382
Systematically larger, 15–9
Symmetric distribution. See Distribution, symmetric
t distribution. See Distribution, t
t inference procedures
for contrasts, 674
for correlation, 594, 596
for matched pairs, 419–420
for multiple comparisons, 678
for one mean, 411, 413
for regression coefficients, 568, 578, 612–613, 616
for regression mean response, 570, 578
for regression prediction, 572, 578
for two means, 437, 440
for two means, pooled, 449
robustness of, 423–424, 442–443
Tails of a distribution. See Distribution, tails
Test of significance. See Significance test
Test statistic, 364–365
Testing hypotheses. See Significance test
The Times Higher Education Supplement, 637–638
Three-
Ties, 15–10, 15–22
Time plot, 21, 23
Titanic, 24, 52, 146, 161, 16–12, 16–22
Transformation
linear, 44–45, 46, 254
logarithm, 91, 96, 470–471, 574–575
rank, 15–4
square root, 671–672
Treatment, experimental, 171, 174, 185
Tree diagram, 271–273, 275
Tuskegee study, 208
Twitter, 75, 244, 522
Two-
Two-
data analysis for, 136–145
inference for, 525–543
models for, 543
relationships in, 81, 528
Type I and II errors, 396–397
Uber, 428
Unbiased estimator, 287
Undercoverage, 196, 200
Unimodal distribution. See Distribution, unimodal
Union of events, 264–265
Unit of measurement, 3, 43
Unit, experimental, 171, 185
U.S. Agency for International Development, 15–26
U.S. Department of Education, 336
Value of a variable, 2, 7
Variability, 32, 287–288
Variable, 2, 7
categorical, 3, 7, 10, 11
column, 137, 145
dependent, 83
explanatory, 82, 84
independent, 83
indicator 14-
lurking, 129–130, 133, 172
quantitative, 3, 7, 11
response, 82, 84
row, 137, 145
Variance, 38, 46
of a difference between two sample means, 434
pooled, 448
of a difference between two sample proportions, 507, 519
of a random variable, 255–256, 261
a pooled estimator, 448, 454
rules for, 257–258, 261
of a sample mean, 297
Variation
among groups, 658, 667
between groups, 647, 658, 667
common cause, 17–7
special cause, 17–7
within group, 647, 658, 667
Venn diagram, 224
Voluntary response, 190–191
Wald statistic, 14–10, 14–19
Wall Street Journal, 458
Whiskers, 35
Wilcoxon rank sum test, 15-3–15–16
Wilcoxon signed rank test, 15-17–15–24
Wording questions, 198, 200
World Bank, 28
World Database of Happiness, 638
z-score, 59, 70
z statistic
for one proportion, 491, 500
for two proportions, 512, 519
one-
two-
pooled, 449