Contents

To Teachers: About This Book xi

To Students: What Is Statistics? xix

About the Authors xxii

Data Table Index xxiii

Beyond the Basics Index xxiv

PART I Looking at Data

CHAPTER 1 Looking at Data—Distributions 1

Introduction 1

1.1 Data 2

Key characteristics of a data set 4

Section 1.1 Summary 7

Section 1.1 Exercises 7

1.2 Displaying Distributions with Graphs 8

Categorical variables: Bar graphs and pie charts 9

Quantitative variables: Stemplots and histograms 11

Histograms 14

Data analysis in action: Don’t hang up on me 16

Examining distributions 18

Dealing with outliers 19

Time plots 21

Section 1.2 Summary 23

Section 1.2 Exercises 23

1.3 Describing Distributions with Numbers 27

Measuring center: The mean 28

Measuring center: The median 30

Mean versus median 31

Measuring spread: The quartiles 32

The five-number summary and boxplots 34

The 1.5 × IQR rule for suspected outliers 35

Measuring spread: The standard deviation 38

Properties of the standard deviation 40

Choosing measures of center and spread 40

Changing the unit of measurement 43

Section 1.3 Summary 46

Section 1.3 Exercises 47

1.4 Density Curves and Normal Distributions 51

Density curves 53

Measuring center and spread for density curves 54

Normal distributions 56

The 68–95–99.7 rule 57

Standardizing observations 59

Normal distribution calculations 61

Using the standard Normal table 63

Inverse Normal calculations 64

Normal quantile plots 66

Beyond the Basics: Density estimation 68

Section 1.4 Summary 69

Section 1.4 Exercises 70

Chapter 1 Exercises 74

CHAPTER 2 Looking at Data—Relationships 79

Introduction 79

2.1 Relationships 79

Examining relationships 81

Section 2.1 Summary 84

Section 2.1 Exercises 84

2.2 Scatterplots 85

Interpreting scatterplots 88

The log transformation 91

Adding categorical variables to scatterplots 93

Scatterplot smoothers 94

Categorical explanatory variables 96

Section 2.2 Summary 96

Section 2.2 Exercises 96

2.3 Correlation 100

The correlation r 101

Properties of correlation 102

Section 2.3 Summary 104

Section 2.3 Exercises 105

2.4 Least-Squares Regression 107

Fitting a line to data 108

Prediction 110

Least-squares regression 111

Interpreting the regression line 113

Facts about least-squares regression 114

Correlation and regression 115

Another view of r² 117

Section 2.4 Summary 117

Section 2.4 Exercises 118

2.5 Cautions about Correlation and Regression 123

Residuals 123

Outliers and influential observations 127

Beware of the lurking variable 129

Beware of correlations based on averaged data 131

Beware of restricted ranges 132

Beyond the Basics: Data mining 132

Section 2.5 Summary 133

Section 2.5 Exercises 133

2.6 Data Analysis for Two-Way Tables 136

The two-way table 136

Joint distribution 138

Marginal distributions 139

Describing relations in two-way tables 140

Conditional distributions 140

Simpson’s paradox 143

Section 2.6 Summary 145

Section 2.6 Exercises 146

2.7 The Question of Causation 148

Explaining association 149

Establishing causation 150

Section 2.7 Summary 152

Section 2.7 Exercises 153

Chapter 2 Exercises 154

CHAPTER 3 Producing Data 163

Introduction 163

3.1 Sources of Data 164

Anecdotal data 164

Available data 165

Sample surveys and experiments 167

Section 3.1 Summary 170

Section 3.1 Exercises 170

3.2 Design of Experiments 171

Comparative experiments 173

Randomization 175

Randomized comparative experiments 177

How to randomize 177

Randomization using software 178

Randomization using random digits 179

Cautions about experimentation 181

Matched pairs designs 182

Block designs 183

Section 3.2 Summary 185

Section 3.2 Exercises 186

3.3 Sampling Design 188

Simple random samples 191

How to select a simple random sample 191

Stratified random samples 194

Multistage random samples 195

Cautions about sample surveys 196

Beyond the Basics: Capture-recapture sampling 199

Section 3.3 Summary 200

Section 3.3 Exercises 200

3.4 Ethics 203

Institutional review boards 204

Informed consent 205

Confidentiality 206

Clinical trials 207

Behavioral and social science experiments 209

Section 3.4 Summary 211

Section 3.4 Exercises 211

Chapter 3 Exercises 212

PART II Probability and Inference

CHAPTER 4 Probability: The Study of Randomness 215

Introduction 215

4.1 Randomness 215

The language of probability 217

Thinking about randomness 218

The uses of probability 219

Section 4.1 Summary 219

Section 4.1 Exercises 220

4.2 Probability Models 220

Sample spaces 221

Probability rules 223

Assigning probabilities: Finite number of outcomes 225

Assigning probabilities: Equally likely outcomes 227

Independence and the multiplication rule 228

Applying the probability rules 231

Section 4.2 Summary 232

Section 4.2 Exercises 232

4.3 Random Variables 235

Discrete random variables 236

Continuous random variables 239

Normal distributions as probability distributions 242

Section 4.3 Summary 243

Section 4.3 Exercises 244

4.4 Means and Variances of Random Variables 246

The mean of a random variable 246

Statistical estimation and the law of large numbers 250

Thinking about the law of large numbers 251

Beyond the Basics: More laws of large numbers 253

Rules for means 253

The variance of a random variable 255

Rules for variances and standard deviations 257

Section 4.4 Summary 261

Section 4.4 Exercises 262

4.5 General Probability Rules 264

General addition rules 264

Conditional probability 267

General multiplication rules 270

Tree diagrams 271

Bayes’s rule 273

Independence again 274

Section 4.5 Summary 274

Section 4.5 Exercises 275

Chapter 4 Exercises 278

CHAPTER 5 Sampling Distributions 281

Introduction 281

5.1 Toward Statistical Inference 282

Sampling variability 283

Sampling distributions 284

Bias and variability 287

Sampling from large populations 289

Why randomize? 290

Section 5.1 Summary 290

Section 5.1 Exercises 291

5.2 The Sampling Distribution of a Sample Mean 293

The mean and standard deviation of x̄ 296

The central limit theorem 298

A few more facts 304

Beyond the Basics: Weibull distributions 305

Section 5.2 Summary 307

Section 5.2 Exercises 307

5.3 Sampling Distributions for Counts and Proportions 310

The binomial distributions for sample counts 312

Binomial distributions in statistical sampling 314

Finding binomial probabilities 315

Binomial mean and standard deviation 317

Sample proportions 319

Normal approximation for counts and proportions 321

The continuity correction 325

Binomial formula 326

The Poisson distributions 328

Section 5.3 Summary 332

Section 5.3 Exercises 333

Chapter 5 Exercises 338

CHAPTER 6 Introduction to Inference 341

Introduction 341

Overview of inference 342

6.1 Estimating with Confidence 343

Statistical confidence 344

Confidence intervals 346

Confidence interval for a population mean 348

How confidence intervals behave 352

Choosing the sample size 353

Some cautions 355

Section 6.1 Summary 356

Section 6.1 Exercises 357

6.2 Tests of Significance 361

The reasoning of significance tests 361

Stating hypotheses 363

Test statistics 364

P-values 365

Statistical significance 367

Tests for a population mean 371

Two-sided significance tests and confidence intervals 375

The P-value versus a statement of significance 377

Section 6.2 Summary 379

Section 6.2 Exercises 379

6.3 Use and Abuse of Tests 384

Choosing a level of significance 384

What statistical significance does not mean 385

Don’t ignore lack of significance 386

Statistical inference is not valid for all sets of data 387

Beware of searching for significance 388

Section 6.3 Summary 389

Section 6.3 Exercises 389

6.4 Power and Inference as a Decision 391

Power 391

Increasing the power 395

Inference as decision 396

Two types of error 396

Error probabilities 397

The common practice of testing hypotheses 399

Section 6.4 Summary 400

Section 6.4 Exercises 400

Chapter 6 Exercises 402

CHAPTER 7 Inference for Means 407

Introduction 407

7.1 Inference for the Mean of a Population 408

The t distributions 408

The one-sample t confidence interval 410

The one-sample t test 412

Matched pairs t procedures 419

Robustness of the t procedures 423

Beyond the Basics: The bootstrap 424

Section 7.1 Summary 425

Section 7.1 Exercises 426

7.2 Comparing Two Means 432

The two-sample z statistic 434

The two-sample t procedures 436

The two-sample t confidence interval 436

The two-sample t significance test 439

Robustness of the two-sample procedures 442

Inference for small samples 444

Software approximation for the degrees of freedom 447

The pooled two-sample t procedures 448

Section 7.2 Summary 453

Section 7.2 Exercises 454

7.3 Additional Topics on Inference 460

Choosing the sample size 461

Inference for non-Normal populations 470

Section 7.3 Summary 474

Section 7.3 Exercises 474

Chapter 7 Exercises 476

CHAPTER 8 Inference for Proportions 483

Introduction 483

8.1 Inference for a Single Proportion 484

Large-sample confidence interval for a single proportion 485

Beyond the Basics: The plus four confidence interval for a single proportion 489

Significance test for a single proportion 491

Choosing a sample size for a confidence interval 494

Choosing a sample size for a significance test 498

Section 8.1 Summary 500

Section 8.1 Exercises 501

8.2 Comparing Two Proportions 505

Large-sample confidence interval for a difference in proportions 506

Beyond the Basics: The plus four confidence interval for a difference in proportions 509

Significance test for a difference in proportions 511

Choosing a sample size for two sample proportions 514

Beyond the Basics: Relative risk 518

Section 8.2 Summary 519

Section 8.2 Exercises 520

Chapter 8 Exercises 522

PART III Topics in Inference

CHAPTER 9 Inference for Categorical Data 525

Introduction 525

9.1 Inference for Two-Way Tables 526

The hypothesis: No association 532

Expected cell counts 533

The chi-square test 534

Computations 536

Computing conditional distributions 537

The chi-square test and the z test 540

Beyond the Basics: Meta-analysis 542

Section 9.1 Summary 543

Section 9.1 Exercises 544

9.2 Goodness of Fit 545

Section 9.2 Summary 550

Section 9.2 Exercises 550

Chapter 9 Exercises 551

CHAPTER 10 Inference for Regression 555

Introduction 555

10.1 Simple Linear Regression 556

Statistical model for linear regression 556

Preliminary data analysis and inference considerations 558

Estimating the regression parameters 561

Checking model assumptions 565

Confidence intervals and significance tests 567

Confidence intervals for mean response 570

Prediction intervals 572

Transforming variables 574

Beyond the Basics: Nonlinear regression 576

Section 10.1 Summary 577

Section 10.1 Exercises 578

10.2 More Detail about Simple Linear Regression 582

Analysis of variance for regression 583

The ANOVA F test 585

Calculations for regression inference 588

Inference for correlation 593

Section 10.2 Summary 596

Section 10.2 Exercises 597

Chapter 10 Exercises 598

CHAPTER 11 Multiple Regression 607

Introduction 607

11.1 Inference for Multiple Regression 608

Population multiple regression equation 608

Data for multiple regression 609

Multiple linear regression model 610

Estimation of the multiple regression parameters 611

Confidence intervals and significance tests for regression coefficients 612

ANOVA table for multiple regression 613

Squared multiple correlation R² 615

Section 11.1 Summary 615

Section 11.1 Exercises 616

11.2 A Case Study 618

Preliminary analysis 619

Relationships between pairs of variables 620

Regression on high school grades 622

Interpretation of results 624

Examining the residuals 624

Refining the model 625

Regression on SAT scores 626

Regression using all variables 627

Test for a collection of regression coefficients 631

Beyond the Basics: Multiple logistic regression 632

Section 11.2 Summary 633

Section 11.2 Exercises 634

Chapter 11 Exercises 636

CHAPTER 12 One-Way Analysis of Variance 643

Introduction 643

12.1 Inference for One-Way Analysis of Variance 644

Data for one-way ANOVA 644

Comparing means 645

The two-sample t statistic 647

An overview of ANOVA 647

The ANOVA model 651

Estimates of population parameters 653

Testing hypotheses in one-way ANOVA 656

The ANOVA table 658

The F test 660

Software 663

Beyond the Basics: Testing the equality of spread 665

Section 12.1 Summary 666

Section 12.1 Exercises 667

12.2 Comparing the Means 670

Contrasts 670

Multiple comparisons 677

Power 681

Section 12.2 Summary 685

Section 12.2 Exercises 685

Chapter 12 Exercises 687

CHAPTER 13 Two-Way Analysis of Variance 697

Introduction 697

13.1 The Two-Way ANOVA Model 698

Advantages of two-way ANOVA 698

The two-way ANOVA model 702

Main effects and interactions 703

13.2 Inference for Two-Way ANOVA 708

The ANOVA table for two-way ANOVA 708

Chapter 13 Summary 713

Chapter 13 Exercises 714

Companion Chapters

(on the IPS website www.macmillanhighered.com/ips9e and in LaunchPad)

CHAPTER 14 Logistic Regression 14-1

Introduction 14-1

14.1 The Logistic Regression Model 14-2

Binomial distributions and odds 14-2

Odds for two groups 14-3

Model for logistic regression 14-5

Fitting and interpreting the logistic regression model 14-6

14.2 Inference for Logistic Regression 14-9

Confidence intervals and significance tests 14-9

Multiple logistic regression 14-16

Chapter 14 Summary 14-18

Chapter 14 Exercises 14-19

Chapter 14 Notes and Data Sources 14-26

CHAPTER 15 Nonparametric Tests 15-1

Introduction 15-1

15.1 The Wilcoxon Rank Sum Test 15-3

The rank transformation 15-4

The Wilcoxon rank sum test 15-5

The Normal approximation 15-7

What hypotheses does Wilcoxon test? 15-9

Ties 15-10

Rank, t, and permutation tests 15-13

Section 15.1 Summary 15-14

Section 15.1 Exercises 15-15

15.2 The Wilcoxon Signed Rank Test 15-17

The Normal approximation 15-21

Ties 15-22

Testing a hypothesis about the median of a distribution 15-23

Section 15.2 Summary 15-24

Section 15.2 Exercises 15-24

15.3 The Kruskal-Wallis Test 15-26

Hypotheses and assumptions 15-27

The Kruskal-Wallis test 15-28

Section 15.3 Summary 15-30

Section 15.3 Exercises 15-31

Chapter 15 Exercises 15-33

Chapter 15 Notes and Data Sources 15-34

CHAPTER 16 Bootstrap Methods and Permutation Tests 16-1

Introduction 16-1

Software 16-2

16.1 The Bootstrap Idea 16-3

The big idea: Resampling and the bootstrap distribution 16-3

Thinking about the bootstrap idea 16-8

Using software 16-9

Section 16.1 Summary 16-10

Section 16.1 Exercises 16-10

16.2 First Steps in Using the Bootstrap 16-12

Bootstrap t confidence intervals 16-13

Bootstrapping to compare two groups 16-16

Beyond the Basics: The bootstrap for a scatterplot smoother 16-19

Section 16.2 Summary 16-21

Section 16.2 Exercises 16-22

16.3 How Accurate Is a Bootstrap Distribution? 16-24

Bootstrapping small samples 16-26

Bootstrapping a sample median 16-28

Section 16.3 Summary 16-30

Section 16.3 Exercises 16-30

16.4 Bootstrap Confidence Intervals 16-31

Bootstrap percentile confidence intervals 16-31

A more accurate bootstrap confidence interval: BCa 16-32

Confidence intervals for the correlation 16-35

Section 16.4 Summary 16-37

Section 16.4 Exercises 16-37

16.5 Significance Testing Using Permutation Tests 16-41

Using software 16-44

Permutation tests in practice 16-45

Permutation tests in other settings 16-47

Section 16.5 Summary 16-50

Section 16.5 Exercises 16-51

Chapter 16 Exercises 16-54

Chapter 16 Notes and Data Sources 16-56

CHAPTER 17 Statistics for Quality: Control and Capability 17-1

Introduction 17-1

Use of data to assess quality 17-2

17.1 Processes and Statistical Process Control 17-3

Describing processes 17-3

Statistical process control 17-6

x̄ charts for process monitoring 17-8

s charts for process monitoring 17-12

Section 17.1 Summary 17-17

Section 17.1 Exercises 17-18

17.2 Using Control Charts 17-22

x̄ and R charts 17-23

Additional out-of-control rules 17-24

Setting up control charts 17-26

Comments on statistical control 17-31

Don’t confuse control with capability! 17-34

Section 17.2 Summary 17-35

Section 17.2 Exercises 17-36

17.3 Process Capability Indexes 17-40

The capability indexes Cp and Cpk 17-43

Cautions about capability indexes 17-46

Section 17.3 Summary 17-48

Section 17.3 Exercises 17-48

17.4 Control Charts for Sample Proportions 17-51

Control limits for p charts 17-52

Section 17.4 Summary 17-56

Section 17.4 Exercises 17-56

Chapter 17 Exercises 17-57

Chapter 17 Notes and Data Sources 17-59

Tables T-1

Notes and Data Sources N-1

Index I-1