EXAMPLE 11.2

© Radius Images / Alamy

Tipping behavior in Canada. The Consumer Report on Eating Share Trends (CREST) contains data spanning all provinces of Canada and details away-from-home food purchases by roughly 4000 households per quarter. Some researchers used these data but restricted their attention to restaurants at which tips would normally be given.⁴ From a total of 73,822 observations, they created “high” and “low” tipping variables according to whether the observed tip rate was above 20% or below 10%, respectively. They then used logistic regression to identify explanatory variables associated with either “high” or “low” tips.
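The two response variables can be sketched as follows. This is a minimal illustration, not the researchers' code: the function name `tip_indicators` is hypothetical, and it assumes the tip rate is the tip divided by the bill. A tip rate between 10% and 20% gives 0 for both indicators.

```python
def tip_indicators(tip: float, bill: float) -> tuple[int, int]:
    """Return (high, low) 0/1 indicators for one restaurant visit.

    Assumes (hypothetically) that the tip rate is tip / bill.
    high = 1 if the rate is above 20%; low = 1 if the rate is below 10%.
    """
    if bill <= 0:
        raise ValueError("bill must be positive")
    rate = tip / bill
    high = 1 if rate > 0.20 else 0
    low = 1 if rate < 0.10 else 0
    return high, low


# Three hypothetical visits: a generous, a moderate, and a small tip.
for tip, bill in [(12.0, 50.0), (7.5, 50.0), (3.0, 50.0)]:
    print(tip, bill, tip_indicators(tip, bill))
```

Because the two indicators are defined separately, each can serve as the response in its own logistic regression (the high-tip model and the low-tip model).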

The model consisted of more than 25 explanatory variables, grouped as “control” variables and “stereotype-related” variables. The stereotype-related explanatory variables were x1, coded 1 if the diner was more than 65 years old; x2, coded 1 if the meal was on a Sunday; x3, coded 1 if English was the diner's second language; x4, coded 1 if the diner was a French-speaking Canadian; x5, coded 1 if alcoholic drinks were served with the meal; and x6, coded 1 if the meal involved a lone male. Each of these variables was coded 0 otherwise.
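The 0/1 coding of the six stereotype-related variables can be illustrated with a small sketch. The `Meal` record and all of its field names are hypothetical; the CREST data are not necessarily stored this way.

```python
from dataclasses import dataclass


@dataclass
class Meal:
    """Hypothetical record of one restaurant meal; field names are illustrative."""
    diner_age: int
    day_of_week: str               # e.g. "Sunday"
    english_second_language: bool
    french_speaking_canadian: bool
    alcohol_served: bool
    lone_male: bool


def stereotype_indicators(meal: Meal) -> list[int]:
    """Return [x1, ..., x6], each coded 1 if the condition holds and 0 otherwise."""
    return [
        int(meal.diner_age > 65),            # x1: senior adult (older than 65)
        int(meal.day_of_week == "Sunday"),   # x2: meal on a Sunday
        int(meal.english_second_language),   # x3: English as a second language
        int(meal.french_speaking_canadian),  # x4: French-speaking Canadian
        int(meal.alcohol_served),            # x5: alcoholic drinks served
        int(meal.lone_male),                 # x6: lone male
    ]


example = Meal(70, "Sunday", False, True, True, False)
print(stereotype_indicators(example))  # [1, 1, 0, 1, 1, 0]
```

Note that a diner who is exactly 65 gets x1 = 0, because the condition is “greater than 65 years.”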


chi-square distribution, p. 535

Similar to the F test in multiple regression, there is a chi-square test for multiple logistic regression that tests the null hypothesis that all coefficients of the explanatory variables are zero. The article did not report this test because its focus was on comparing the high-tip and low-tip models. In place of the t tests for individual coefficients in multiple regression, chi-square tests, each with 1 degree of freedom, are used to test whether individual coefficients are zero. The article does report these tests, and a majority of the variables considered in the models have P-values less than 0.01.
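One common form of the 1-degree-of-freedom test for a single coefficient is the Wald test, which squares the ratio of the estimate to its standard error. The article's exact test statistic is not stated here, so treat this as an assumed illustration; the estimate and standard error below are hypothetical. The P-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard Normal variable.

```python
import math


def wald_chi_square(estimate: float, std_error: float) -> tuple[float, float]:
    """Wald chi-square statistic (1 df) and its P-value for H0: beta = 0.

    X^2 = (estimate / std_error)^2. Under H0 it is approximately chi-square
    with 1 degree of freedom, so P(chi2_1 >= X^2) = P(|Z| >= sqrt(X^2))
    = erfc(sqrt(X^2 / 2)).
    """
    x2 = (estimate / std_error) ** 2
    p_value = math.erfc(math.sqrt(x2 / 2))
    return x2, p_value


# Hypothetical estimate and standard error (not taken from the article).
x2, p = wald_chi_square(estimate=-0.30, std_error=0.08)
print(f"X^2 = {x2:.2f}, P = {p:.5f}")
```

As a check on the formula, an estimate 1.96 standard errors from zero gives a P-value of about 0.05, the familiar two-sided Normal result.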

Interpretation of the coefficients is a little more difficult in multiple logistic regression because of the form of the model. For example, the high-tip model (using only the stereotype-related variables) is

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_6 x_6$$
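Solving the log-odds equation for p gives p = 1/(1 + e^(−log odds)), which is how a fitted model turns values of x1 through x6 into a probability of a high tip. The sketch below assumes hypothetical coefficient values chosen only for illustration; the article reports odds ratios, not these numbers, and the function name is also hypothetical.

```python
import math


def high_tip_probability(beta: list[float], x: list[int]) -> float:
    """Probability of a high tip under the log-odds model.

    beta = [b0, b1, ..., b6] and x = [x1, ..., x6]. The log odds is
    b0 + b1*x1 + ... + b6*x6; inverting log(p / (1 - p)) gives
    p = 1 / (1 + exp(-log_odds)).
    """
    if len(beta) != len(x) + 1:
        raise ValueError("need one intercept plus one coefficient per variable")
    log_odds = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients for illustration only (not from the article).
beta = [-1.0, -0.3, 0.0, -0.3, -0.25, 0.12, 0.02]
x = [1, 0, 0, 0, 1, 0]  # a senior adult whose meal included alcohol
p = high_tip_probability(beta, x)
print(round(p, 4), round(math.log(p / (1 - p)), 4))
```

The second value printed is the log odds recomputed from p; it equals b0 + b1 + b5 for this diner, confirming that the two forms of the model are equivalent.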

The expression p/(1 − p) is the odds that the tip was above 20%, where p is the probability of a high tip. Logistic regression models the “log odds” as a linear combination of the explanatory variables. Positive coefficients are associated with a higher probability that the tip is high. These coefficients are often transformed back to the odds scale by exponentiating them ($e^{\beta_i}$), giving us an odds ratio: the factor by which the odds of a high tip are multiplied when x_i changes from 0 to 1 and the other variables are held fixed. An odds ratio greater than 1 is associated with a higher probability that the tip is high. Here is the table of odds ratios reported in the article for the high-tip model:

Explanatory variable            Odds ratio
Senior adult                    0.7420*
Sunday                          0.9970
English as second language      0.7360*
French-speaking Canadian        0.7840*
Alcoholic drinks                1.1250*
Lone male                       1.0220*

The starred values were significant at the 0.01 level. We see that, with the other variables held fixed, the probability of a high tip is reduced (odds ratio less than 1) when the diner is over 65 years old, speaks English as a second language, or is a French-speaking Canadian. The probability of a high tip is increased (odds ratio greater than 1) if alcohol is served with the meal and, only slightly, if the meal involves a lone male.
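To connect the table back to the model, each odds ratio can be converted to its coefficient on the log-odds scale (β = log of the odds ratio), and its effect on the probability of a high tip can be shown for a chosen baseline. The baseline probability of 0.30 is a hypothetical value, not one reported in the article, and the helper name `shift_probability` is likewise hypothetical; the odds ratios are those in the table.

```python
import math

# Odds ratios for the high-tip model, as reported in the table
# (True marks the values starred as significant at the 0.01 level).
odds_ratios = {
    "Senior adult": (0.7420, True),
    "Sunday": (0.9970, False),
    "English as second language": (0.7360, True),
    "French-speaking Canadian": (0.7840, True),
    "Alcoholic drinks": (1.1250, True),
    "Lone male": (1.0220, True),
}


def shift_probability(p0: float, odds_ratio: float) -> float:
    """Multiply the odds p0/(1 - p0) by odds_ratio and convert back to a probability."""
    odds = p0 / (1 - p0) * odds_ratio
    return odds / (1 + odds)


P0 = 0.30  # hypothetical baseline probability of a high tip (not from the article)

for name, (ratio, significant) in odds_ratios.items():
    coef = math.log(ratio)  # coefficient on the log-odds scale: beta = log(odds ratio)
    direction = "raises" if ratio > 1 else "lowers"
    flag = "*" if significant else " "
    print(f"{name:28s}{flag} beta = {coef:+.4f}, {direction} P(high) "
          f"from {P0:.2f} to {shift_probability(P0, ratio):.3f}")
```

An odds ratio below 1 gives a negative coefficient and lowers the probability, while one above 1 does the opposite. The size of the change in probability depends on the baseline, which is why odds ratios rather than probability differences are reported.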