Index

Acceptance sampling, 396

ACT college entrance examination, 73, 380, 604–605

Adequate Calcium Today (ACT) study, 545–546

Adjusted R², 631

Aggregation, 145

Alternative hypothesis. See Hypothesis, alternative

American Community Survey (ACS), 201

Analysis of variance (ANOVA)

one-way, 643–685

regression, 583–587, 596, 613–614, 616

two-way, 697–713

Analysis of variance table

one-way, 658–663

regression, 586, 596, 613–614, 616

two-way, 708–709

Anonymity, 206

Applet

Central Limit Theorem, 300–302, 311

Confidence Interval, 347–348, 360, 403

Correlation and Regression, 104, 106, 135

Law of Large Numbers, 220, 251

Mean and Median, 31, 50

Normal Approximation to Binomial, 322

Normal Curve, 62, 72

One-variable statistical calculator, 16

One-Way ANOVA, 661, 687

Probability, 217, 220, 291–292, 335

Simple Random Sample, 187, 192, 202, 292

Statistical Power, 400, 401

Statistical Significance, 383

t Statistic, 423

Two-variable statistical calculator, 123

Association, 80–81, 84, 532–533

and causation, 131, 133, 149–150

negative, 89, 96

positive, 89, 96

Attention deficit hyperactivity disorder (ADHD), 429–430

Available data, 165, 170

Bar graph, 10, 23, 530

Bayes’s rule, 273–274

Behavioral and social science experiments, 209–211

Behavioral Risk Factor Surveillance System (BRFSS), 604

Benchmarking, 89

Benford’s law, 226, 249, 338

Bias. See also Unbiased estimator

in a sample, 190, 198–201

in an experiment, 174–175, 185

of a statistic, 287–289, 290

Binomial coefficient, 327, 333

Binomial distribution. See Distribution, binomial

Binomial setting, 312, 332, 14-1

Block, 183–184, 185

Bonferroni procedure, 391, 678–680

Bootstrap, 424–425, 15-2. See also Chapter 16

Boston Marathon, 27, 17-40

Boxplot, 34, 46

modified, 37, 46

side-by-side, 37, 46, 646, 649

Brown-Forsythe test. See Modified Levene’s test

Buffon, Count, 218

Canadian Internet Use Survey (CIUS), 14-24

Capability, 17-34, 17-36

Capture-recapture sampling, 199

Case, 2, 7, 609

Categorical data. See Variable, categorical

Causation, 131, 133, 148–152

Cause-and-effect diagram, 174

Cell, 137, 145, 699

Census, 167, 170, 636

Census Bureau, 8, 338, 380

Center of a distribution, 28–31, 46, 54

Centers for Disease Control and Prevention, 214, 336, 604

Central limit theorem, 298–301, 313, 325, 328, 335

Chi-square distribution. See Distribution, chi-square

Chi-square statistic, 534, 543

and the z statistic, 540–541

goodness of fit test, 547

Classes in a histogram, 15

Clinical trials, 207

Coefficient of determination, 662. See also Correlation, squared multiple

Coin tossing, 217, 221, 238–239, 291, 312–313, 331, 335, 339

College Alcohol Study (CAS), 604

Column variable. See Variable, row, and Variable, column

Common response, 149–150, 152

Complement of an event. See Event, complement

Conditional distribution. See Distribution, conditional

Conditional probability. See Probability, conditional

Confidence interval, 346–348, 356

behavior, 352–353

bootstrap, 16-13–16-16, 16-31–16-35

cautions, 355–356

for multiple comparisons, 680

for odds ratio, 14-10, 14-19

for slope in a logistic regression, 14-9, 14-18

relation to two-sided tests, 375–377

simultaneous, 680

t for a contrast, 674

t for difference of means, 437–439, 454

pooled, 449

t for matched pairs, 421

t for mean response in regression, 570–571, 578

t for one mean, 410–412, 425–426

t for regression parameters, 568, 578, 612–613, 616

z for one mean, 348–352

z for one proportion

large sample, 486, 500

plus four, 489

z for difference of proportions

large sample, 506–507, 519

plus four, 509–510

Confidence level, 347, 356

Confidentiality, 204, 206–207, 211

Confounding, 149–150, 152, 169, 170, 388, 419

Consumer Report of Eating Share Trends (CREST), 632, 14-23

Consumer Reports National Research Center, 319

Consumers Union, 85, 16-35

Continuity correction, 325–326, 333, 15-7

Contrast, 650, 670–677, 685

Control chart, 17-7, 17-17

individuals chart, 17-40

p chart, 17-51–17-56

R chart, 17-23, 17-35

s chart, 17-12–17-17

x̄ chart, 17-8–17-12, 17-14, 17-18

Control group, 174, 310

Correlation, 100–104

and regression, 115, 118

based on averaged data, 131, 133

between random variables, 257, 261

bootstrap confidence interval, 16-35–16-37

cautions about, 123–133

nonsense, 131

inference for, 593–596

population, 594

properties, 102–103, 104

squared, 116, 118, 585, 596

squared multiple, 615. See also Coefficient of determination

test for, 594, 596

Count, 9. See also Frequency

distribution of, 310–314, 321–322, 328–333

Critical value, 378, 379

of chi-square distribution, 535, Table F

of F distribution, 585–586, Table E

of standard Normal distribution, 349, 410, Table A

of t distribution, 409–410, Table D

Current Population Survey, 289

Cumulative proportion, 61, 70

standard Normal, 63–64, Table A

Data, 2

Anecdotal, 164, 170

Available, 156, 170

Data mining, 132–133

Decision analysis, 396–400

Degree of Reading Power, 437, 16-42

Degrees of freedom, 40

approximation for, 436, 447, 453

of chi-square distribution, 535

of chi-square test, 535

of F distribution, 585

of one-way ANOVA, 659

of t distribution, 409, 425

of two-way ANOVA, 704, 708–709

of regression ANOVA, 584–586, 613–614

of regression t, 568, 570, 572, 594, 612–613

of regression s², 562, 612

of two-sample t, 436, 447, 449

Deming, W. Edwards, 17-40

Density curve, 51–54, 69, 240, 243

Density estimation, 68–69

Design, 171–185. See also Experiment

block, 183–184, 185

repeated-measures, 707

sampling, 188–200

Direction of a relationship, 88, 96

Disjoint events. See Event, disjoint

Distribution, 23, 46

bimodal, 69

binomial, 312–318, 332, 14-2, Table C

formula, 326–328, 333

Normal approximation, 321–324, 332

use in the sign test, 472–473

bootstrap, 16-24–16-29

of categorical variable, 9

chi-square, 535, Table F

conditional, 140, 145, 528, 537

describing, 20, 23

examining, 18

exponential, 300

geometric, 340

F, 585–586, Table E

joint, 138, 145

jointly Normal, 594

marginal, 139, 145

noncentral F, 682

noncentral t, 467, 474

Normal, 56–57, 69

for probabilities, 242–243

standard, 60, 63, 70, Table A

Poisson, 328–332, 333, 551

population, 291, 294

probability. See Probability distribution

of quantitative variable, 11–16

sampling. See Sampling distribution

skewed, 18, 23

symmetric, 18, 23

t, 409–410, Table D

trimodal, 69

tails, 18

uniform, 71, 240, 243, 554

unimodal, 18

Weibull, 305–307

Distribution-free procedure, 470. See also Chapter 15

Double-blind experiment, 181–182, 185

Dual X-ray absorptiometry scanner, 364, 431–432, 475, 17-38–17-39

Equivalence testing, 420–422

Estimation, 250–251

Ethics, 163, 203–211

Excel, 3, 178–179, 191–192, 417, 445, 487, 508, 563, 609, 629, 664, 713, 17-22

Expected value, 248. See also Mean of a random variable

Expected cell count, 533, 543, 547, 550

Experiment, 167–168, 170

block design, 183–184, 185

cautions about, 181–182

comparative, 173–174, 185

completely randomized, 180

matched pairs, 182–183, 185

principles, 177

units, 171, 185

Explanatory variable. See Variable, explanatory

Exploratory data analysis, 9, 16, 23, 163

Extrapolation, 110, 118

Event, 223, 232

complement of, 224, 232

disjoint, 224, 232

empty, 266

independent, 229, 232

intersection, 271, 274

union, 264, 274

F distribution. See Distribution, F

F test

one-way ANOVA, 661, 667

regression ANOVA, 586, 614

for collection of regression coefficients, 631–632, 635

for standard deviations, 665–666

two-way ANOVA, 709

Facebook, 23, 308–309, 428, 456, 522, 648–650, 661–663, 670–673, 686, 687–688, 694, 696, 15-31, 16-3–16-5

Factor, experimental, 172, 185, 644, 698–702

Federal Aviation Administration (FAA), 309

Fisher, Sir R. A., 385, 400, 585

Fitting a line, 108–109

Five-number summary, 34, 46

Flowchart, 17-4–17-5

Form of a relationship, 88, 96

Frequency, 15, 23

Frequency table, 15

Gallup-Healthways Well-Being Index, 359–360

Gallup Poll, 335–336

Genetic counseling, 277

Genomics, 388

General Social Survey (GSS), 167, 197, 211

Goodness of fit, 545–550

Google, 9, 485

Gosset, William, 48, 409, 16-10

Histogram, 14, 23

Hypothesis

alternative, 363–364, 370, 379

one-sided, 364, 379

two-sided, 364, 379

null, 363, 379

Hypothesis testing, 399–400. See also Significance test

Independence, 218–219

in two-way tables, 532–533, 543–544

of events, 228–229, 232

of random variables, 257–258, 261, 274

Indicator variable. See Variable, indicator

Inference, statistical. See Statistical inference

Influential observation, 127–129, 133, 566, 624

Informed consent, 204, 205–206, 211

Institutional review board (IRB), 204–205, 211

Instrument, 5

Interaction, 701, 703–707

Intercept of a line, 108

of least-squares line, 112, 118, 557

Internet Movie Database (IMDb), 637

Intervention, 169, 170

Intersection of events. See Event, intersection

Interquartile range (IQR), 36, 46

iPod, 422, 470–471

Jitter, 87

JMP, 416, 441, 446, 469, 493, 499, 509, 513, 517, 528, 532, 545, 549, 552, 564, 580, 622, 623, 666, 683, 689, 14-4, 14-14, 15-8, 15-12, 15-15, 15-20, 15-30, 15-31

Karaoke Channel, 358

Kerrich, John, 218

Key characteristics of a data set, 4, 7

Key characteristics of data for relationships, 83

Kruskal-Wallis test, 15-26–15-31

Label, 2, 7

Law of large numbers, 250–252, 253, 261

Law School Admission Test (LSAT), 390, 476

Leaf, in a stemplot, 11, 23

Leaning Tower of Pisa, 604

Least significant difference, 678

Least squares, 111, 611–612

Least squares regression line, 112, 118, 555, 577

Level of a factor, 172, 185, 698–701

Line, equation of, 108

least-squares, 112, 118, 611–612

Linear relationship, 88, 96

Linear transformation. See Transformation, linear

Logarithm transformation. See Transformation, logarithm

Logistic regression, 632–633. See also Chapter 14

Logit, 14-5

Lurking variable. See Variable, lurking

Main effect, 701, 703–707, 714

Major League Baseball (MLB), 153

Mann-Whitney test, 15-5, 15-8

Margin of error, 287, 289, 291, 352

for a difference in two means, 437, 449, 454

for a difference in two proportions, 508, 519

for a single mean, 349, 353, 356–357, 411, 426

for a single proportion, 486, 500

Marginal means, 705, 714

Matched pairs design, 182–183, 185

inference for, 419–420, 426, 472–473, 15-17

McNemar’s test, 554

Mean, 28, 46

of binomial distribution, 318, 332

of density curve, 55, 69

of difference of sample means, 434

of difference of sample proportions, 506

of Normal distribution, 56

of random variable, 246–248, 261

rules for, 253–254, 261

of sample mean, 296–297, 307

of sample proportion, 320, 332, 500, 584

trimmed, 51

versus median, 31

Mean square

in one-way ANOVA, 660, 667

in two-way ANOVA, 709

in multiple linear regression, 613

in simple linear regression, 584–586

Median, 30, 46

inference for, 472–473, 15-9, 15-23, 16-28–16-29

of density curve, 55, 69

Mendel, Gregor, 230

Meta-analysis, 542

Minitab, 315, 395, 417, 422, 441, 463, 466, 493, 499, 509, 514, 517, 529, 548, 563, 595, 627, 630, 665, 684, 690, 713, 14-11, 14-14, 14-16, 14-17, 14-20, 14-21, 15-8, 15-20, 15-24

Minnesota Multiphasic Personality Inventory (MMPI), 5

Mode, 18, 23

Model selection, 629

Modified Levene’s test, 665–666

Mosaic plot, 143, 531, 534

Motorola, 172

Multiple comparisons, 650, 677–681, 15-51

National AIDS Behavioral Surveys, 335

National Assessment of Educational Progress (NAEP), 70–71, 381

National Association of Colleges and Employers (NACE), 354, 358, 464

National Center for Education Statistics, 119–120, 166

National Collegiate Athletic Association (NCAA), 16-11

National Endowment for the Humanities, 432

National Enquirer, 458

National Football League, 601

National Health and Nutrition Examination Survey (NHANES), 372, 434, 704

National Hockey League (NHL), 617–618

National Longitudinal Survey of Youth (NLSY), 574

National Oceanic and Atmospheric Administration (NOAA), 581

National Public Radio (NPR), 360

National Science Foundation (NSF), 597

Neyman, Jerzy, 399–400

Nielsen Company, 294, 411, 428

Noncentrality parameter

for t, 468, 474

for F, 682

Nonparametric procedure, 470, 472–473. See also Chapter 15

Nonresponse, 196, 200

Normal distribution. See Distribution, Normal

Normal distribution calculations, 61–66

Normal probability plot. See Normal quantile plot

Normal quantile plot, 66–67, 70

Normal scores, 66

Null hypothesis. See Hypothesis, null

Observational study, 168

Odds, 633, 14-2, 14-18

Odds ratio, 633, 14-7, 14-10, 14-18

Outcomes, 171, 185

Out-of-control rules, 17-23–17-26

Outliers, 19, 23, 15-1

1.5 × IQR criterion, 35–36

regression, 127–129, 133, 574–575

Parameter, 282, 290

Pareto chart, 17-18, 17-53–17-54, 17-57

Pearson, Egon, 399

Pearson, Karl, 218

Percent, 9

Percentile, 32

Permutation tests, 15-2, 16-41–16-50

Pew Research Center survey, 198, 308, 428, 484, 485, 501, 521, 522, 527, 14-19, 15-15, 16-55

Pie chart, 11, 23

Placebo effect, 174

Plug-in principle, 16-9, 16-10

Pooled estimator

of population proportion, 512, 519

of ANOVA variance, 654, 660, 703

of variance in two samples, 448

Population, 189, 200

Population distribution. See Distribution, population

Power, 392, 400

and Type II error, 399

increasing, 395

of one-way ANOVA, 681–685

of t test

one-sample, 465–467

two-sample, 467–469

of z test, 391–395

of z test for a single proportion, 498–499

of z test for comparing two proportions, 516–517

Prediction, 107, 110, 118

Prediction interval, 572–573, 578, 613

Probability, 216–217, 219

conditional, 267–268, 269

equally likely outcomes, 227

finite sample space, 225–226

Probability distribution, 236, 241

mean of, 246–248, 261

standard deviation of, 255–256, 261

variance of, 255–256, 261

Probability histogram, 237, 243

Probability model, 221, 232

Probability rules, 223–224, 232

addition, 224, 232, 264, 266, 275

complement, 224, 232, 264, 275

general, 264–275

multiplication, 228–229, 232, 264–265, 268, 275

Probability sample. See Sample, probability

Process capability indices, 17-40–17-47

Proportion, 9

distribution of, 319–321, 322–323

inference for a single proportion, 483–501

inference for comparing two proportions, 505–517

population, 283

sample, 283, 319, 484, 500

P-value, 366, 379

Quartiles, 32–33, 46

of a density curve, 55, 69

R, 315, 329, 330, 331, 332, 16-9, 16-11, 16-14, 16-18, 16-34, 16-38, 16-45

Randomization

consequences of, 177

experimental, 175–176, 185

how to, 177–180

Random digits, 180–181, 192–193, 200, 284, Table B

Random number generator, 375

Random phenomenon, 217, 219

Random variable, 235–236, 243

continuous, 239–242, 243

discrete, 236, 243

mean of, 248, 261

standard deviation of, 256, 261

variance of, 256, 261

Randomized comparative experiment, 177, 185

Randomized response survey, 279–280

Ranks, 15-4, 15-14

Rate, 6

Regression, 107–117

and correlation, 115, 118

cautions about, 123–133

deviations, 88, 560, 577, 610

interpretation, 113

least-squares, 111, 611–612

logistic, 632–633, Chapter 14

model conditions, 567

model selection, 627–631

multiple, 608–615

multiple logistic, 632–633, 14-16–14-18

nonlinear, 576–577

simple linear, 556–576

Regression equation, population, 608, 615

Regression line, 107, 117

population, 557, 577

Relative risk, 518, 519

Reliability, 313

Resample, 424. See also Chapter 16

Residual, 123–124, 133, 561, 577, 612, 616, 653

plots, 125, 133, 566, 577–578, 599, 690

Resistant measure, 30, 46

Response bias, 198, 200

Response rate, 189

Response variable. See Variable, response

Robust, 30, 423–424, 426, 442, 15-1

Roundoff error, 125, 138, 139

Row variable. See Variable, row, and Variable, column

Rugby sevens, 455

Sallie Mae, 350

Sample, 189, 200

cautions about, 196–199, 200

design of, 189, 200

multistage, 195–196, 200

probability, 194, 200

simple random (SRS), 191–193, 200

stratified, 193–194, 200

systematic, 202

Sample size, choosing

confidence interval for a difference in means, 462–463

confidence interval for a difference in proportions, 514–515, 519

confidence interval for a mean, 353, 461–463

confidence interval for a proportion, 494–495, 500

one-way ANOVA, 681–684

power for a proportion, 498–499

power for a difference in proportions, 516–517

t test, one-sample, 465–467

t test, two-sample, 467–469

Sample space, 221, 232

finite, 225

Sample survey, 167–168, 170, 188–200

Sampling distribution, 281, 284–287, 290

of difference of means, 434

of regression estimators, 567

of sample count, 314, 322, 332

of sample mean, 298, 307

of sample proportion, 285, 319–321, 322, 332

Sampling variability, 287–288

SAS, 445, 587, 619, 626, 628, 631, 664, 710, 711

SAT college entrance examination, 73, 344, 604–605, 618–619, 14-26

Scatterplot, 86, 96

adding categorical variables to, 93

smoothing, 94, 96

Shape of a distribution, 11, 23

Shewhart, Walter, 17-7, 17-32

Sign test, 472–473, 491–492, 549–550

Significance level, 367, 383–385

Significance, statistical, 367–369, 379

and Type I error, 398

Significance test, 361–370

chi-square for two-way table, 534–535, 543

relation to z test, 540–541

chi-square test for goodness of fit, 547, 550

chi-square test for logistic regression slope, 14-10, 14-19

F test in one-way ANOVA, 660–662

F test in regression, 585–586, 596, 614

F test for a collection of regression coefficients, 631–632, 635

F test for standard deviations, 665–666

F tests in two-way ANOVA, 709

Kruskal-Wallis test, 15-26–15-31

Mann-Whitney test, 15-5

relationship to confidence intervals, 375–377

sign test for matched pairs, 472–473, 491–492

t test for a contrast, 674

t test for correlation, 594, 596

t test for one mean, 413, 425

t test for matched pairs, 419–420

t test for two means, 440, 454

pooled, 449

t test for regression coefficients, 568, 578

t tests for multiple comparisons, 678

use and abuse, 384–389

Wilcoxon rank sum test, 15-3–15-14

Wilcoxon signed rank test, 15-17–15-24

z test for one mean, 372, 379

z test for one proportion, 491, 500

z test for logistic regression slope, 14-10, 14-19

z test for two proportions, 511–512, 519

Simple random sample. See Sample, simple random

Simpson’s paradox, 143, 145, 160

Simulation, 284

Simultaneous confidence intervals, 680

68–95–99.7 rule, 57–58, 70

Skewed distribution. See Distribution, skewed

Slope of a line, 108

of least-squares line, 112, 118, 561

Small numbers, law of, 252

Spread of a distribution, 32, 38, 46, 54

Spreadsheet, 3. See also Excel

SPSS, 417, 446, 530, 549, 562, 621, 657, 658, 671, 676, 680, 14-14, 14-17, 15-9, 15-20

Standard & Poor’s 500-Stock Index, 416, 598

Standard deviation, 38, 46. See also Variance

of binomial distribution, 318, 332

of density curve, 55, 69

of deviations in ANOVA, 652, 702

of deviations in regression, 560, 577, 611, 615

of difference between sample means, 434

pooled, 448

of difference between sample proportions, 515, 519

of Normal distribution, 57

of Poisson distribution, 329, 333

of random variable, 256, 261

of regression intercept and slope, 590–591

of sample mean, 297, 307

of sample proportion, 485

properties, 40

rules for, 257–258, 261

Standard error, 408

bootstrap, 16-5, 16-8

for regression prediction, 592, 596

of a contrast, 674

of a difference in sample means, 436–437

pooled, 448

of a difference in sample proportions, 508, 519

of a sample mean, 408, 425

of a sample proportion, 486

of mean regression response, 592, 596

of regression intercept and slope, 591, 596

Standard Normal distribution. See Distribution, standard Normal

Standardized observation, 59, 70

Statistic, 282, 290

Statistical inference, 282, 290, 341–343

for non-Normal populations, 470–473. See also Chapter 15

for small samples, 444–447

Statistical process control, Chapter 17

Statistical significance. See Significance, statistical

Stem-and-leaf plot. See Stemplot

Stemplot, 11, 23

back-to-back, 12

splitting stems, 13

trimming, 13

Strata, 195, 200. See also Sample, stratified

Strength of a relationship, 88, 96. See also Correlation

StubHub!, 69, 16-11–16-12, 16-22

Student Monitor, 283, 291

Subjects, experimental, 171, 185

Subpopulation, 557, 608

Sums of squares

in one-way ANOVA, 659–660

in two-way ANOVA, 708–709

in multiple linear regression, 613–614

in simple linear regression, 583–584

Survey of Study Habits and Attitudes (SSHA), 382

Systematically larger, 15-9

Symmetric distribution. See Distribution, symmetric

t distribution. See Distribution, t

t inference procedures

for contrasts, 674

for correlation, 594, 586

for matched pairs, 419–420

for multiple comparisons, 678

for one mean, 411, 413

for regression coefficients, 568, 578, 612–613, 616

for regression mean response, 570, 578

for regression prediction, 572, 578

for two means, 437, 440

for two means, pooled, 449

robustness of, 423–424, 442–443

Tails of a distribution. See Distribution, tails

Test of significance. See Significance test

Test statistic, 364–365

Testing hypotheses. See Significance test

The Times Higher Education Supplement, 637–638

Three-way table, 145

Ties, 15-10, 15-22

Time plot, 21, 23

Titanic, 24, 52, 146, 161, 16-12, 16-22

Transformation

linear, 44–45, 46, 254

logarithm, 91, 96, 470–471, 574–575

rank, 15-4

square root, 671–672

Treatment, experimental, 171, 174, 185

Tree diagram, 271–273, 275

Tuskegee study, 208

Twitter, 75, 244, 522

Two-sample problems, 433

Two-way table, 136, 145

data analysis for, 136–145

inference for, 525–543

models for, 543

relationships in, 81, 528

Type I and II errors, 396–397

Uber, 428

Unbiased estimator, 287

Undercoverage, 196, 200

Unimodal distribution. See Distribution, unimodal

Union of events, 264–265

Unit of measurement, 3, 43

Unit, experimental, 171, 185

U.S. Agency for International Development, 15-26

U.S. Department of Education, 336

Value of a variable, 2, 7

Variability, 32, 287–288

Variable, 2, 7

categorical, 3, 7, 10, 11

column, 137, 145

dependent, 83

explanatory, 82, 84

independent, 83

indicator, 14-3

lurking, 129–130, 133, 172

quantitative, 3, 7, 11

response, 82, 84

row, 137, 145

Variance, 38, 46

of a difference between two sample means, 434

pooled, 448

of a difference between two sample proportions, 507, 519

of a random variable, 255–256, 261

a pooled estimator, 448, 454

rules for, 257–258, 261

of a sample mean, 297

Variation

among groups, 658, 667

between groups, 647, 658, 667

common cause, 17-7

special cause, 17-7

within group, 647, 658, 667

Venn diagram, 224

Voluntary response, 190–191

Wald statistic, 14-10, 14-19

Wall Street Journal, 458

Whiskers, 35

Wilcoxon rank sum test, 15-3–15-16

Wilcoxon signed rank test, 15-17–15-24

Wording questions, 198, 200

World Bank, 28

World Database of Happiness, 638

z-score, 59, 70

z statistic

for one proportion, 491, 500

for two proportions, 512, 519

one-sample for mean, 371

two-sample for means, 435

pooled, 449