The linear regression methods we studied in Chapters 10 and 11 model the relationship between a quantitative response variable and one or more explanatory variables. In this chapter, we describe similar methods for use when the response variable has only two possible outcomes. For example, a customer either purchases a flash sale item or does not, and a job candidate either accepts an offer or declines it.
In general, we call the two outcomes of the response variable “success” and “failure” and represent them by 1 (for a success) and 0 (for a failure). The mean is then the proportion of 1s, $p = P(\text{success})$.
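A quick arithmetic check may help here: when outcomes are coded as 1 and 0, adding them up counts the successes, so the mean and the proportion of 1s are the same number. The sketch below uses a small, made-up list of outcomes (the values are hypothetical, not data from the text).

```python
# Hypothetical outcomes for ten customers: 1 = success (a purchase), 0 = failure.
outcomes = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]

# Summing 0/1 values counts the 1s, so the mean is (number of 1s) / n.
mean = sum(outcomes) / len(outcomes)

# Computed directly as a proportion, for comparison.
proportion_of_ones = outcomes.count(1) / len(outcomes)

print("mean of the 0/1 outcomes:", mean)
print("proportion of 1s:        ", proportion_of_ones)
```

Both printed values agree for any list of 0/1 outcomes, not just this one.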
Reminder
binomial setting, p. 245
If our data are $n$ independent observations with the same $p$, this is the binomial setting. What is new in this chapter is that the data now include at least one explanatory variable $x$ and the probability $p$ of a success depends on the value of $x$. The explanatory variables can be either categorical or quantitative. For example, the probability a customer purchases the flash sale item could depend on the age and gender of the customer, as well as the type of clothing item on sale and the percent discount. The probability a candidate accepts a job offer from JP Morgan Chase & Co. could depend on the salary amount, the level of guaranteed bonuses, and whether or not the offer includes a non-compete clause.
Because $p$ is now a probability that depends on explanatory variables, we need inference methods that ensure $0 \le p \le 1$. Logistic regression is a statistical method for describing these kinds of relationships.1
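To see why the range restriction matters, the sketch below compares a straight-line formula for a probability with the standard logistic function, which always returns a value strictly between 0 and 1. The text has not yet specified its model at this point, so treat this as an illustration under assumptions: the explanatory variable is taken to be a percent discount, and the function names and all coefficient values are hypothetical.

```python
import math

def linear_prob(x, b0=-0.5, b1=0.02):
    """Straight-line 'probability' (hypothetical coefficients); not range-restricted."""
    return b0 + b1 * x

def logistic_prob(x, b0=-5.0, b1=0.1):
    """Standard logistic function of b0 + b1*x (hypothetical coefficients)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# x is assumed to be a percent discount between 0 and 100.
for x in [0, 25, 50, 75, 100]:
    line = linear_prob(x)
    curve = logistic_prob(x)
    in_range = 0.0 <= line <= 1.0
    print(f"x={x:3d}  line={line:6.2f} (valid probability: {in_range})  logistic={curve:.4f}")
```

With these made-up coefficients the straight line falls below 0 at small discounts and rises above 1 at large ones, while the logistic values stay inside the valid range for every $x$.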