11.84 Alternate movie revenue model.
CASE 11.2 Refer to the data set on movie revenue in Case 11.2 (page 550). The variables Budget, Opening, and USRevenue all have distributions with long tails. For this problem, let’s consider building a model using the logarithm transformation of these variables.
movies
11.85 Education and income.
CASE 10.1 Recall Case 10.1 (pages 485–486), which looked at the relationship between an entrepreneur’s log income and level of education. In addition to the level of education, the entrepreneur’s age and a measure of his or her perceived control of the environment (locus of control) were also obtained. The larger the locus of control, the more in control one feels.
entre1
11.85
(a) .
(b) The parameters are: , and .
(c) .
(d) The Normal quantile plot shows the residuals are Normally distributed.
(e) All three residual plots look good (random). Both linearity and constant variance are valid.
11.86 Education and income, continued.
CASE 10.1 Refer to the previous exercise. Provided the data meet the requirements of the multiple regression model, we can now perform inference.
entre1
11.87 Compare regression coefficients.
Again refer to Exercise 11.85.
entre1
11.87
(a) For the model with EDUC: . For the model with all three: . With just Education, the intercept was larger and the effect size of each year of education was larger, 0.1126, on LogIncome. Once we account for both Locus of control and Age, the intercept isn’t quite as large, and the effect size of each year of education goes down to 0.08542.
(b) For the full model: . For the EDUC model: . The predictions don’t seem too different unless we undo the log transformation; the predicted incomes are $17,940.21 and $14,849.18, which seems like a substantial difference.
(c) are 2 and . Locus of control and Age are helpful predictors in explaining LogIncome when Education is already in the model.
11.88 Business-to-business (B2B) marketing.
A group of researchers were interested in determining the likelihood that a business currently purchasing office supplies via a catalog would switch to purchasing from the website of the same supplier. To do this, they performed an online survey using the business clients of a large Australian-based stationery provider with both a catalog and a Web-based business.19 Results from 1809 firms, all currently purchasing via the catalog, were obtained. The following table summarizes the regression model.
Variable | b | t
---|---|---
Staff interpersonal contact with catalog | −0.08 | 3.34 |
Trust of supplier | 0.11 | 4.66 |
Web benefits (access and accuracy) | 0.08 | 3.92 |
Previous Web purchases | 0.18 | 8.20 |
Previous Web information search | 0.08 | 3.47 |
Key catalog benefits (staff, speed, security) | −0.08 | 3.96 |
Web benefits (speed and ease of use) | 0.36 | 3.97 |
Problems with Web ordering and delivery | −0.06 | 2.65 |
This t statistic can be expressed in terms of the estimated coefficient b and its standard error as t = b/SE(b).
Use this relationship to determine SE(b) for each coefficient.
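A quick sketch of this computation in Python, using the (b, t) pairs from the table above (the column labels are assumed to be the coefficient and its t statistic, with |t| reported):

```python
# Recover each coefficient's standard error via SE(b) = b / t.
# Since the table reports |t|, use |b| in the numerator as well.
rows = [
    ("Staff interpersonal contact with catalog", -0.08, 3.34),
    ("Trust of supplier", 0.11, 4.66),
    ("Web benefits (access and accuracy)", 0.08, 3.92),
    ("Previous Web purchases", 0.18, 8.20),
    ("Previous Web information search", 0.08, 3.47),
    ("Key catalog benefits (staff, speed, security)", -0.08, 3.96),
    ("Web benefits (speed and ease of use)", 0.36, 3.97),
    ("Problems with Web ordering and delivery", -0.06, 2.65),
]
se_values = {name: abs(b) / t for name, b, t in rows}
for name, se in se_values.items():
    print(f"{name}: SE(b) = {se:.4f}")
```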
Exercises 11.89 through 11.92 use the PPROMO data set shown in Table 11.7.
11.89 Discount promotions at a supermarket.
How does the frequency that a supermarket product is promoted at a discount affect the price that customers expect to pay for the product? Does the percent reduction also affect this expectation? These questions were examined by researchers in a study that used 160 subjects. The treatment conditions corresponded to the number of promotions (one, three, five, or seven) that were described during a 10-week period and the percent that the product was discounted (10%, 20%, 30%, and 40%). Ten students were randomly assigned to each of the treatments.20
ppromo
Number of promotions | Percent discount | Expected price ($), 10 responses per treatment
---|---|---|---|---|---|---|---|---|---|---|---
1 | 40 | 4.10 | 4.50 | 4.47 | 4.42 | 4.56 | 4.69 | 4.42 | 4.17 | 4.31 | 4.59 |
1 | 30 | 3.57 | 3.77 | 3.90 | 4.49 | 4.00 | 4.66 | 4.48 | 4.64 | 4.31 | 4.43 |
1 | 20 | 4.94 | 4.59 | 4.58 | 4.48 | 4.55 | 4.53 | 4.59 | 4.66 | 4.73 | 5.24 |
1 | 10 | 5.19 | 4.88 | 4.78 | 4.89 | 4.69 | 4.96 | 5.00 | 4.93 | 5.10 | 4.78 |
3 | 40 | 4.07 | 4.13 | 4.25 | 4.23 | 4.57 | 4.33 | 4.17 | 4.47 | 4.60 | 4.02 |
3 | 30 | 4.20 | 3.94 | 4.20 | 3.88 | 4.35 | 3.99 | 4.01 | 4.22 | 3.70 | 4.48 |
3 | 20 | 4.88 | 4.80 | 4.46 | 4.73 | 3.96 | 4.42 | 4.30 | 4.68 | 4.45 | 4.56 |
3 | 10 | 4.90 | 5.15 | 4.68 | 4.98 | 4.66 | 4.46 | 4.70 | 4.37 | 4.69 | 4.97 |
5 | 40 | 3.89 | 4.18 | 3.82 | 4.09 | 3.94 | 4.41 | 4.14 | 4.15 | 4.06 | 3.90 |
5 | 30 | 3.90 | 3.77 | 3.86 | 4.10 | 4.10 | 3.81 | 3.97 | 3.67 | 4.05 | 3.67 |
5 | 20 | 4.11 | 4.35 | 4.17 | 4.11 | 4.02 | 4.41 | 4.48 | 3.76 | 4.66 | 4.44 |
5 | 10 | 4.31 | 4.36 | 4.75 | 4.62 | 3.74 | 4.34 | 4.52 | 4.37 | 4.40 | 4.52 |
7 | 40 | 3.56 | 3.91 | 4.05 | 3.91 | 4.11 | 3.61 | 3.72 | 3.69 | 3.79 | 3.45 |
7 | 30 | 3.45 | 4.06 | 3.35 | 3.67 | 3.74 | 3.80 | 3.90 | 4.08 | 3.52 | 4.03 |
7 | 20 | 3.89 | 4.45 | 3.80 | 4.15 | 4.41 | 3.75 | 3.98 | 4.07 | 4.21 | 4.23 |
7 | 10 | 4.04 | 4.22 | 4.39 | 3.89 | 4.26 | 4.41 | 4.39 | 4.52 | 3.87 | 4.70 |
11.89
(a) As the number of promotions increases, the expected price goes down. For discount, the expected prices for 10% and 20% seem similar, as do those for 30% and 40%, which are lower than the expected prices for 10% and 20%.
(b) and (c) The drop in expected price is fairly consistent as the number of promotions increases. Similarly, the drop in price is fairly consistent as the percent discount increases; however, the 40% discount consistently yields higher expected prices than the 30% discount.
Promotions | Discount | Mean | Std Dev
---|---|---|---
1 | 10 | 4.92 | 0.1520234
1 | 20 | 4.689 | 0.2330689
1 | 30 | 4.225 | 0.3856092
1 | 40 | 4.423 | 0.1847551
3 | 10 | 4.756 | 0.2429083
3 | 20 | 4.524 | 0.2707274
3 | 30 | 4.097 | 0.2346179
3 | 40 | 4.284 | 0.2040261
5 | 10 | 4.393 | 0.2685372
5 | 20 | 4.251 | 0.2648459
5 | 30 | 3.89 | 0.1628906
5 | 40 | 4.058 | 0.1759924
7 | 10 | 4.269 | 0.2699156
7 | 20 | 4.094 | 0.2407488
7 | 30 | 3.76 | 0.2617887
7 | 40 | 3.78 | 0.2143725
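As a spot check, the mean and standard deviation for one treatment cell can be reproduced with Python's statistics module (a sketch using the promotions = 1, discount = 40% row of Table 11.7):

```python
import statistics

# Expected-price responses for the promotions = 1, discount = 40% cell of Table 11.7.
cell_1_40 = [4.10, 4.50, 4.47, 4.42, 4.56, 4.69, 4.42, 4.17, 4.31, 4.59]

mean = statistics.mean(cell_1_40)
sd = statistics.stdev(cell_1_40)  # sample standard deviation (divides by n - 1)
print(f"mean = {mean:.3f}, sd = {sd:.7f}")
```

The same two calls, applied cell by cell, reproduce the entire summary table above.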
11.90 Run the multiple regression.
Refer to the previous exercise. Run a multiple regression using promotions and discount to predict expected price. Write a summary of your results.
ppromo
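As a sketch of what statistical software would report for this exercise, the model can be fit in pure Python by solving the normal equations (X'X)β = X'y; the data below are the 160 responses from Table 11.7, and the small Gaussian-elimination helper is a minimal stand-in for a linear algebra library:

```python
# Fit: expected price = b0 + b1*(promotions) + b2*(discount), by normal equations.
rows = [  # (promotions, discount, [10 expected-price responses])
    (1, 40, [4.10, 4.50, 4.47, 4.42, 4.56, 4.69, 4.42, 4.17, 4.31, 4.59]),
    (1, 30, [3.57, 3.77, 3.90, 4.49, 4.00, 4.66, 4.48, 4.64, 4.31, 4.43]),
    (1, 20, [4.94, 4.59, 4.58, 4.48, 4.55, 4.53, 4.59, 4.66, 4.73, 5.24]),
    (1, 10, [5.19, 4.88, 4.78, 4.89, 4.69, 4.96, 5.00, 4.93, 5.10, 4.78]),
    (3, 40, [4.07, 4.13, 4.25, 4.23, 4.57, 4.33, 4.17, 4.47, 4.60, 4.02]),
    (3, 30, [4.20, 3.94, 4.20, 3.88, 4.35, 3.99, 4.01, 4.22, 3.70, 4.48]),
    (3, 20, [4.88, 4.80, 4.46, 4.73, 3.96, 4.42, 4.30, 4.68, 4.45, 4.56]),
    (3, 10, [4.90, 5.15, 4.68, 4.98, 4.66, 4.46, 4.70, 4.37, 4.69, 4.97]),
    (5, 40, [3.89, 4.18, 3.82, 4.09, 3.94, 4.41, 4.14, 4.15, 4.06, 3.90]),
    (5, 30, [3.90, 3.77, 3.86, 4.10, 4.10, 3.81, 3.97, 3.67, 4.05, 3.67]),
    (5, 20, [4.11, 4.35, 4.17, 4.11, 4.02, 4.41, 4.48, 3.76, 4.66, 4.44]),
    (5, 10, [4.31, 4.36, 4.75, 4.62, 3.74, 4.34, 4.52, 4.37, 4.40, 4.52]),
    (7, 40, [3.56, 3.91, 4.05, 3.91, 4.11, 3.61, 3.72, 3.69, 3.79, 3.45]),
    (7, 30, [3.45, 4.06, 3.35, 3.67, 3.74, 3.80, 3.90, 4.08, 3.52, 4.03]),
    (7, 20, [3.89, 4.45, 3.80, 4.15, 4.41, 3.75, 3.98, 4.07, 4.21, 4.23]),
    (7, 10, [4.04, 4.22, 4.39, 3.89, 4.26, 4.41, 4.39, 4.52, 3.87, 4.70]),
]
X = [(1.0, p, d) for p, d, ys in rows for _ in ys]   # intercept, promotions, discount
y = [v for _, _, ys in rows for v in ys]

k = 3
A = [[sum(X[i][r] * X[i][c] for i in range(len(X))) for c in range(k)]
     for r in range(k)]                               # X'X
rhs = [sum(X[i][r] * y[i] for i in range(len(X))) for r in range(k)]  # X'y
for col in range(k):                                  # forward elimination
    piv = A[col][col]
    for r in range(col + 1, k):
        f = A[r][col] / piv
        for c in range(col, k):
            A[r][c] -= f * A[col][c]
        rhs[r] -= f * rhs[col]
beta = [0.0] * k
for r in range(k - 1, -1, -1):                        # back substitution
    beta[r] = (rhs[r] - sum(A[r][c] * beta[c] for c in range(r + 1, k))) / A[r][r]

ybar = sum(y) / len(y)
ss_tot = sum((v - ybar) ** 2 for v in y)
ss_res = sum((v - sum(bb * xx for bb, xx in zip(beta, X[i]))) ** 2
             for i, v in enumerate(y))
r2 = 1 - ss_res / ss_tot
print(beta, r2)  # R^2 should be close to the 56.62% cited in Exercise 11.91
```

Both slopes come out negative: expected price falls as either the number of promotions or the percent discount rises.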
11.91 Residuals and other models.
Refer to the previous exercise. Analyze the residuals from your analysis, and investigate the possibility of using quadratic and interaction terms as predictors. Write a report recommending a final model for this problem with a justification for your recommendation.
ppromo
11.91
The Normal quantile plot shows a slight left-skew in the residuals. The residual plot for promotions looks good (random). The residual plot for Discount shows a slight curve and suggests a possible quadratic model. Investigating a quadratic term for discount and possible interaction terms shows that none of the interaction terms test significant. After removing these, the quadratic term for discount is significant . The equation becomes: . This model has an , which is somewhat better than the 56.62% for the model without the quadratic term. It is possible to leave out the quadratic term to simplify interpretation; otherwise, the model with this term seems to be best in terms of prediction.
11.92 Can we generalize the results?
The subjects in this experiment were college students at a large Midwest university who were enrolled in an introductory management course. They received the information about the promotions during a 10-week period during their course. Do you think that these facts about the data would influence how you would interpret and generalize the results? Write a summary of your ideas regarding this issue.
ppromo
11.93 Determinants of innovation capability.
A study of 367 Australian small/medium enterprise (SME) firms looked at the relationship between perceived innovation marketing capability and two marketing support capabilities, market orientation and management capability. All three variables were measured on the same scale such that a higher score implies a more positive perception.21 Given the relatively large sample size, the researchers grouped the firms into three size categories (micro, small, and medium) and analyzed each separately. The following table summarizes the results.
Explanatory variable | Micro b | Micro SE(b) | Small b | Small SE(b) | Medium b | Medium SE(b)
---|---|---|---|---|---|---
Market orientation | 0.69 | 0.08 | 0.47 | 0.06 | 0.37 | 0.12 |
Management capability | 0.14 | 0.08 | 0.39 | 0.06 | 0.38 | 0.12 |
F statistic | 87.6 | | 117.7 | | 37.2 | |
11.93
(a) For Micro: are 2 and . For Small: are 2 and . For Medium: are 2 and . The two explanatory variables are helpful in predicting the perceived level of innovation capability for each firm size.
(b) For Micro: , and Market Orientation is significant. , and Management Capability is not significant. For Small: , and Market Orientation is significant. , and Management Capability is significant. For Medium: , and Market Orientation is significant. , and Management Capability is significant.
(c) For all three sizes, the overall model was very significant. However, for the Micro size, the Management Capability was not needed and was not significant given Market Orientation is in the model. For the other two sizes, Small and Medium, both variables tested significant at the 5% level, and both were useful in predicting perceived level of innovation capability.
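The t ratios behind part (b) can be reproduced directly from the table, assuming the second value of each pair is the coefficient's standard error and using the large-sample 5% cutoff of about 1.96:

```python
# t = b / SE(b) for each predictor in each firm-size group (Exercise 11.93).
groups = {
    "Micro":  {"Market orientation": (0.69, 0.08), "Management capability": (0.14, 0.08)},
    "Small":  {"Market orientation": (0.47, 0.06), "Management capability": (0.39, 0.06)},
    "Medium": {"Market orientation": (0.37, 0.12), "Management capability": (0.38, 0.12)},
}
tvals = {(size, name): b / se
         for size, preds in groups.items()
         for name, (b, se) in preds.items()}
for (size, name), t in tvals.items():
    verdict = "significant" if abs(t) > 1.96 else "not significant"
    print(f"{size:6s} {name}: t = {t:.2f}  ({verdict})")
```

Only Management capability in the Micro group falls below the cutoff, matching the answer above.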
11.94 Are separate analyses needed?
Refer to the previous exercise. Suppose you wanted to generate a similar table but have it based on results from only one multiple regression rather than on three.
11.95 Impact of word of mouth.
Word of mouth (WOM) is informal advice passed among consumers that may have a quick and powerful influence on consumer behavior. Word of mouth may be positive (PWOM), encouraging choice of a certain brand, or negative (NWOM), discouraging that choice. A study investigated the impact of WOM on brand purchase probability.22 Multiple regression was used to assess the effect of six variables on brand choice. These were pre-WOM probability of purchase (PPP), strength of expression of WOM, WOM about main brand, closeness of the communicator, whether advice was sought, and amount of WOM given. The following table summarizes the results for 903 participants who received NWOM.
Variable | b | SE(b)
---|---|---
PPP | −0.37 | 0.022 |
Strength of expression of WOM | −0.22 | 0.065 |
WOM about main brand | 0.21 | 0.164 |
Closeness of communicator | −0.06 | 0.121 |
Whether advice was sought | −0.04 | 0.140 |
Amount of WOM given | −0.08 | 0.022 |
In addition, it is reported that .
11.95
(a) 20%.
(b) To be significant at the 5% level, |t| = |b/SE(b)| must exceed about 1.96 (df = 903 − 6 − 1 = 896). PPP, Strength of expression of WOM, and Amount of WOM given are significant.
(c) The stronger the expression of WOM, the more negative the impact of NWOM.
(d) The regression coefficient for WOM about main brand is 0.21, meaning there is a 0.21 difference in the impact of NWOM between receivers given NWOM about their main brand and those given NWOM about another brand; in other words, the NWOM effect is much larger when the receiver’s main brand is involved.
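A small sketch reproducing the significance screen in part (b), assuming the second column of the table is SE(b) and using the large-sample 5% cutoff of about 1.96:

```python
# Which NWOM predictors are significant? Compare |t| = |b / SE(b)| with 1.96.
table = [
    ("PPP", -0.37, 0.022),
    ("Strength of expression of WOM", -0.22, 0.065),
    ("WOM about main brand", 0.21, 0.164),
    ("Closeness of communicator", -0.06, 0.121),
    ("Whether advice was sought", -0.04, 0.140),
    ("Amount of WOM given", -0.08, 0.022),
]
significant = [name for name, b, se in table if abs(b / se) > 1.96]
print(significant)
```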
11.96 Correlations may not be a good way to screen for multiple regression predictors.
We use a constructed data set in this problem to illustrate this point.
dseta
11.97 The multiple regression results do not tell the whole story.
We use a constructed data set in this problem to illustrate this point.
dsetb
11.97
(a) The multiple regression equation is: . Likewise, neither predictor tests significant when added last: . The data do not show a significant multiple linear regression between and the predictors and .
(b) For and : . For and . Both and are significant in predicting in a simple linear regression.
(c) A nonsignificant overall multiple regression F test doesn’t necessarily imply that none of the predictors is useful; we should explore other models and tests before concluding that no predictor is useful in any setting. In this case, the two predictors are highly correlated, so their individual tests are likely to be nonsignificant when both are used in the same model together.
Exercises 11.98 through 11.104 use the CROPS data file, which contains the U.S. yield (bushels/acre) of corn and soybeans from 1957 to 2013.23
11.98 Corn yield varies over time.
Run the simple linear regression using year to predict corn yield.
crops
11.99 Can soybean yield predict corn yield?
Run the simple linear regression using soybean yield to predict corn yield.
crops
11.99
.
(a) . There is a significant simple linear regression between corn yield and soybean yield; soybean yield can significantly predict corn yield. .
(b) The Normal quantile plot shows that the residuals are mostly Normal but have a slight right-skew.
(c) There is somewhat of a relationship between the residuals and year, suggesting that it might be useful in the model with soybean yield to predict corn yield.
11.100 Use both predictors.
From the previous two exercises, we conclude that year and soybean yield may be useful together in a model for predicting corn yield. Run this multiple regression.
crops
11.101 Try a quadratic.
We need a new variable to model the curved relation that we see between corn yield and year in the residual plot of the last exercise. Let . (When adding a squared term to a multiple regression model, we sometimes subtract the mean of the variable being squared before squaring. This eliminates the correlation between the linear and quadratic terms in the model and thereby reduces collinearity.)
crops
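The collinearity point in the parenthetical note can be illustrated numerically: for the symmetric run of years 1957–2013, centering before squaring drives the correlation between the linear and quadratic terms to essentially zero (a pure-Python sketch, no data file needed):

```python
# Correlation between a predictor and its square, before and after centering.
def corr(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

years = list(range(1957, 2014))          # the CROPS years
mean_year = sum(years) / len(years)      # 1985.0 for this symmetric range
raw_sq = [x * x for x in years]
centered = [x - mean_year for x in years]
centered_sq = [x * x for x in centered]

print(corr(years, raw_sq))       # near 1: severe collinearity
print(corr(years, centered_sq))  # near 0 after centering
```

Because the years are evenly spaced and symmetric about their mean, the centered linear and quadratic terms are exactly uncorrelated; for less symmetric predictors, centering still greatly reduces (though may not eliminate) the correlation.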
11.101
(a) .
(b) At least one βj ≠ 0; the F test degrees of freedom are 3 and 53. There is a significant multiple linear regression between corn yield and the predictors Year, Year2, and SoyBeanYield. Together, the predictors can significantly predict corn yield.
(c) , up from 93.82%.
(d) For Year: . Year is significant in predicting corn yield in a model already containing Year2 and SoyBeanYield. For Year2: . Year2 is significant in predicting corn yield in a model already containing Year and SoyBeanYield. For SoyBeanYield: . SoyBeanYield is significant in predicting corn yield in a model already containing Year and Year2.
(e) The Normal quantile plot shows a roughly Normal distribution; there is one observation with a fairly high residual. The residual plots all look good (random); the residual plot for Year is much better and doesn’t have the rising and falling that the previous plot had. Overall, the model fit is much better using the quadratic term for Year than without.
11.102 Compare models.
Run the model to predict corn yield using year and the squared term year2 defined in the previous exercise.
crops
11.103 Do a prediction.
Use the simple linear regression model with corn yield as the response variable and year as the explanatory variable to predict the corn yield for the year 2014, and give the 95% prediction interval. Also, use the multiple regression model where year and year2 are both explanatory variables to find another predicted value with the 95% interval. Explain why these two predicted values are so different. The actual yield for 2014 was 167.4 bushels per acre. How well did your models predict this value?
crops
11.103
For Year alone, , and the prediction interval is (138.8, 181.6901). For Year and Year2: , and the prediction interval is (134.3030, 179.0220). The two predicted values are different because we are near the edge of the data for Year, and as we saw in the previous exercise, this will cause the greatest differences using the quadratic term. The actual yield of 167.4 is not predicted very well by either model but is closer to the predicted value for the linear model than for the quadratic model.
11.104 Predict the yield for another year.
Repeat the previous exercise doing the prediction for 2020. Compare the results of this exercise with the previous one. Also explain why the predicted values are beginning to differ more substantially.
crops
11.105 Predicting U.S. movie revenue.
Refer to Case 11.2 (page 550). The data set MOVIES contains several other explanatory variables that are available at the time of release that we did not consider in the examples and exercises. These include
Using these explanatory variables and Opening, Budget, and Theaters, determine the best model for predicting U.S. revenue.
movies
11.106 Price-fixing litigation.
Multiple regression is sometimes used in litigation. In the case of Cargill, Inc. v. Hardin, the prosecution charged that the cash price of wheat was manipulated in violation of the Commodity Exchange Act. In a statistical study conducted for this case, a multiple regression model was constructed to predict the price of wheat using three supply-and-demand explanatory variables.24 Data for 14 years were used to construct the regression equation, and a prediction for the suspect period was computed from this equation. The value of was 0.989.
11.107 Predicting CO2 emissions.
The data set CO2MPG contains an SRS of 200 passenger vehicles sold in Canada in 2014. There appears to be a quadratic relationship between CO2 emissions and miles per gallon highway (MPGHWY).
co2mpg
11.107
(a)
Regression coefficients

Type | Intercept | mpg | mpg2
---|---|---|---
D | 267.3823 | −5.42585 | 0.04619 |
E | 160.84557 | −3.89582 | 0.30631 |
X | 235.16637 | −7.18033 | 0.12751 |
Z | 243.75987 | −7.88188 | 0.13832 |
Types X and Z are very similar, showing very few differences in any of the coefficients. Types D and E are very different. Type E has a much smaller slope for MPG than all the other types, and the MPG2 effect is quite large: more than double all the rest. Type D also has a slightly smaller slope for MPG than X and Z, but it has an extremely small slope for MPG2.
(b)
Parameter | Estimate
---|---
Intercept | 243.75987 |
X1 | 23.62243 |
X2 | −82.91430 |
X3 | −8.59350 |
mpg | −7.88188 |
MPGX1 | 2.45603 |
MPGX2 | 3.98607 |
MPGX3 | 0.70155 |
mpg2 | 0.13832 |
MPG2X1 | −0.09214 |
MPG2X2 | 0.16798 |
MPG2X3 | −0.01081 |
Answers will vary depending on how the indicator variables were created. Setting Z as the default type (X1 = X2 = X3 = 0), the parameter estimates are in the table shown, so the estimates for the Intercept, MPG, and MPG2 match Type Z’s estimates exactly. To recover the others, we just set X1 = 1 and X2 = X3 = 0 for Type D, etc., yielding an intercept of 243.75987 + 23.62243 = 267.3823, a slope for MPG of −7.88188 + 2.45603 = −5.42585, and a slope for MPG2 of 0.13832 − 0.09214 ≈ 0.0462, etc. Up to rounding, this yields the same equations as part (a).
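The recombination described above can be sketched in Python using the two tables; up to rounding, each type's recovered equation matches the separate fits in part (a):

```python
# Recover each vehicle type's quadratic equation from the single model with
# indicators X1 (type D), X2 (type E), X3 (type X); type Z is the baseline.
base = {"Intercept": 243.75987, "mpg": -7.88188, "mpg2": 0.13832}
offsets = {  # indicator and interaction estimates from the part (b) table
    "D": {"Intercept": 23.62243, "mpg": 2.45603, "mpg2": -0.09214},
    "E": {"Intercept": -82.91430, "mpg": 3.98607, "mpg2": 0.16798},
    "X": {"Intercept": -8.59350, "mpg": 0.70155, "mpg2": -0.01081},
    "Z": {"Intercept": 0.0, "mpg": 0.0, "mpg2": 0.0},  # baseline: no offsets
}
equations = {t: {k: base[k] + off[k] for k in base} for t, off in offsets.items()}
for t, eq in equations.items():
    print(t, eq)  # matches the separate per-type fits in part (a) up to rounding
```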
11.108 Prices of homes.
Consider the data set used for Case 11.3 (page 566). This data set includes information for several other zip codes. Pick a different zip code and analyze the data. Compare your results with what we found for zip code 47904 in Section 11.3.
homes