EXAMPLE 16 Strategy for building a multiple regression model
baseball2013
The author of this book first became interested in the field of statistics through the enjoyment of sports statistics, especially baseball, which is packed with interesting statistics. Today, professional sports teams seek a competitive advantage through the analysis of data and statistics, such as sabermetrics (named for the Society for American Baseball Research, www.sabr.org), as shown in the motion picture Moneyball.
Suppose a baseball researcher is interested in predicting the number of runs scored, using the data set Baseball 2013 and the following predictor variables:
Use the Strategy for Building a Multiple Regression Model to build the best multiple regression model for predicting the number of runs scored using these predictor variables, at level of significance α.
Solution
The data set Baseball 2013 contains the batting statistics of the players in Major League Baseball who had at least 100 at-bats during the 2013 season (Source: www.seanlahman.com/baseball-archive/statistics).
Step 2 The t test (the first time). In Figure 30, the p-value for Batting Average is greater than the level of significance α. We therefore eliminate Batting Average from the model. Perhaps surprisingly, a player's batting average is evidently not helpful in predicting the number of runs that player will score when all other predictors are held constant.
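The elimination step can be sketched in code. The following is a minimal illustration only, not the software used to produce the figures in this example. It assumes that one predictor is removed at a time (the one with the largest p-value above α) and that the model is refit after each removal. The data are synthetic, and the names `signal`, `noise`, `ols`, `invert`, and `backward_eliminate` are hypothetical. To stay within the Python standard library, the p-values use a normal approximation to the t distribution, which is reasonable only when the sample is large.

```python
import math
import random
from statistics import NormalDist


def invert(M):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, standard errors, df)."""
    n = len(X)
    A = [[1.0] + list(row) for row in X]          # prepend the intercept column
    p = len(A[0])
    ata = [[sum(A[i][r] * A[i][c] for i in range(n)) for c in range(p)]
           for r in range(p)]
    aty = [sum(A[i][r] * y[i] for i in range(n)) for r in range(p)]
    inv = invert(ata)
    b = [sum(inv[r][c] * aty[c] for c in range(p)) for r in range(p)]
    resid = [y[i] - sum(A[i][c] * b[c] for c in range(p)) for i in range(n)]
    df = n - p
    s2 = sum(e * e for e in resid) / df           # estimate of the error variance
    se = [math.sqrt(s2 * inv[r][r]) for r in range(p)]
    return b, se, df


def backward_eliminate(data, response, predictors, alpha):
    """Drop the predictor with the largest p-value until every p-value <= alpha."""
    current = list(predictors)
    while current:
        X = [[row[name] for name in current] for row in data]
        y = [row[response] for row in data]
        b, se, _ = ols(X, y)
        # Two-sided p-values; normal approximation to the t distribution (assumption).
        pvals = {name: 2 * (1 - NormalDist().cdf(abs(b[j + 1] / se[j + 1])))
                 for j, name in enumerate(current)}
        worst = max(pvals, key=pvals.get)
        if pvals[worst] <= alpha:
            break                                 # every remaining predictor is significant
        current.remove(worst)                     # eliminate it, then refit
    return current


# Synthetic demonstration data (NOT the Baseball 2013 data set):
# y depends on "signal" but not on "noise".
random.seed(0)
rows = []
for _ in range(200):
    s, z = random.uniform(0, 10), random.uniform(0, 10)
    rows.append({"signal": s, "noise": z, "y": 3 + 2 * s + random.gauss(0, 1)})

kept = backward_eliminate(rows, "y", ["signal", "noise"], alpha=0.05)
print(kept)
```

Refitting after every removal matters because each p-value is computed with all other predictors held constant, so removing one predictor can change the p-values of those that remain.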
The multiple regression equation for the final model is shown here.
NOW YOU CAN DO
Exercises 24–26 and 31–33.