EXAMPLE 16 Strategy for building a multiple regression model
baseball2013
The author of this book first became interested in the field of statistics through the enjoyment of sports, especially baseball, a game packed with interesting statistics. Today, professional sports teams seek a competitive advantage through the analysis of data and statistics, an approach known in baseball as sabermetrics (after the Society for American Baseball Research, www.sabr.org) and depicted in the motion picture Moneyball.
Suppose a baseball researcher is interested in predicting y=runs scored, using the data set Baseball 2013 and the following predictor variables:
Use the Strategy for Building a Multiple Regression Model to build the best multiple regression model for predicting the number of runs scored using these predictor variables, at level of significance α=0.05.
Solution
The data set Baseball 2013 contains the batting statistics of the n=448 players in Major League Baseball who had at least 100 at-bats during the 2013 season (Source: www.seanlahman.com/baseball-archive/statistics).
Step 2 The t test (the first time). In Figure 30, the p-value for Batting Average is greater than the level of significance α=0.05. We therefore eliminate Batting Average from the model. Perhaps surprisingly, a player's batting average is evidently not helpful in predicting the number of runs that player will score when all other predictors are held constant.
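The elimination decision in Step 2 can be sketched in a few lines of Python. This is only an illustration: it assumes the rule is to drop the predictor with the largest p-value whenever that p-value exceeds α. The function name and the p-values are hypothetical placeholders, not the values reported in Figure 30.

```python
ALPHA = 0.05  # level of significance used in this example


def predictor_to_eliminate(p_values, alpha=ALPHA):
    """Return the predictor with the largest p-value if it exceeds alpha.

    Returns None when every remaining predictor is significant
    (or when no predictors are supplied).
    """
    if not p_values:
        return None
    name, p = max(p_values.items(), key=lambda item: item[1])
    return name if p > alpha else None


# Placeholder p-values for illustration only (NOT taken from Figure 30).
placeholder_p_values = {
    "Hits": 0.001,
    "Batting Average": 0.60,
    "Walks": 0.002,
}

dropped = predictor_to_eliminate(placeholder_p_values)
print("Eliminate:", dropped)

# After removing that predictor, the check is repeated on the refitted model.
remaining = {k: v for k, v in placeholder_p_values.items() if k != dropped}
print("Eliminate next:", predictor_to_eliminate(remaining))
```

In practice the model would be refitted after each elimination, because the remaining p-values change once a predictor is removed.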
The multiple regression equation for the final model is shown here.
ŷ = −1.283 + 0.3182(Hits) + 0.2105(Doubles) + 1.429(Triples) + 0.6833(Home Runs) − 0.1059(RBIs) + 0.2319(Walks) + 5.08(Red Sox)
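To see how the final equation produces a prediction, the following minimal sketch evaluates it for one player. The coefficients are copied from the equation above. The player's statistics are made up for illustration, and treating Red Sox as a 0/1 indicator (1 if the player is on the Red Sox) is an assumption.

```python
# Coefficients copied from the final regression equation.
INTERCEPT = -1.283
COEFFICIENTS = {
    "hits": 0.3182,
    "doubles": 0.2105,
    "triples": 1.429,
    "home_runs": 0.6833,
    "rbis": -0.1059,
    "walks": 0.2319,
    "red_sox": 5.08,  # assumed 0/1 indicator variable
}


def predicted_runs(player):
    """Evaluate the regression equation for one player's statistics."""
    return INTERCEPT + sum(
        coef * player[name] for name, coef in COEFFICIENTS.items()
    )


# Hypothetical player (values invented for illustration, not from the data set).
hypothetical_player = {
    "hits": 150,
    "doubles": 30,
    "triples": 3,
    "home_runs": 20,
    "rbis": 80,
    "walks": 50,
    "red_sox": 0,
}

print(round(predicted_runs(hypothetical_player), 3))
```

Setting red_sox to 1 for the same player raises the predicted runs by 5.08, the coefficient of that variable.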
NOW YOU CAN DO
Exercises 24–26 and 31–33.