17 Logistic Regression




Introduction

The linear regression methods we studied in Chapters 10 and 11 are used to model the relationship between a quantitative response variable and one or more explanatory variables. In this chapter, we describe similar methods for use when the response variable has only two possible outcomes. For example,

HauteLook.com is an online destination offering limited-time flash sale events. A response variable of interest to its sales division is whether a member buys or does not buy the daily flash sale item.
For the recruiting leadership at JP Morgan Chase & Co., a response variable of interest is whether a candidate accepts or declines a job offer.

In general, we call the two outcomes of the response variable “success” and “failure” and represent them by 1 (for a success) and 0 (for a failure). The mean is then the proportion of 1s, $p = P(\text{success})$.
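A quick numerical check that, once outcomes are coded as 1 and 0, the mean of the data is the same number as the proportion of successes. The ten outcomes below are made-up values used only for illustration; in a sample, this proportion is the estimate of $p$.

```python
# Hypothetical outcomes for 10 members (made-up data):
# 1 = buys the flash sale item (success), 0 = does not buy (failure)
outcomes = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]

# Sample mean of the 0/1 coded data
mean = sum(outcomes) / len(outcomes)

# Proportion of outcomes that are successes (1s)
proportion_success = outcomes.count(1) / len(outcomes)

print(mean, proportion_success)  # 0.4 0.4
```

Because every success adds 1 to the sum and every failure adds 0, the sum counts the successes, so dividing by $n$ gives the proportion of successes.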

Reminder

binomial setting, p. 245

If our data are $n$ independent observations with the same $p$, this is the binomial setting. What is new in this chapter is that the data now include at least one explanatory variable $x$, and the probability $p$ of a success depends on the value of $x$. The explanatory variables can be either categorical or quantitative. For example, the probability that a customer purchases the flash sale item could depend on the age and gender of the customer, as well as on the type of clothing item on sale and the percent discount. The probability that a candidate accepts a job offer from JP Morgan Chase & Co. could depend on the salary amount, the level of guaranteed bonuses, and whether or not the offer includes a non-compete clause.
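To make "the probability of a success depends on $x$" concrete, the sketch below groups observations by the value of a single quantitative explanatory variable (percent discount) and computes the proportion of successes at each value. The observations are hypothetical, invented only to show the calculation; they are not HauteLook.com data.

```python
from collections import defaultdict

# Hypothetical (made-up) observations: (percent discount x, outcome y),
# where y = 1 if the member bought the flash sale item and 0 otherwise.
data = [
    (10, 0), (10, 0), (10, 1), (10, 0),
    (30, 1), (30, 0), (30, 1), (30, 0),
    (50, 1), (50, 1), (50, 0), (50, 1),
]

# Tally successes and trials separately for each value of x
successes = defaultdict(int)
trials = defaultdict(int)
for x, y in data:
    successes[x] += y
    trials[x] += 1

# Proportion of successes at each x: an estimate of p at that value of x
p_hat = {x: successes[x] / trials[x] for x in sorted(trials)}
print(p_hat)  # {10: 0.25, 30: 0.5, 50: 0.75}
```

Within any one discount level the observations fit the binomial setting with a single $p$; across levels the estimated $p$ changes, which is the kind of relationship this chapter sets out to model.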

Because we are now modeling a probability that depends on explanatory variables, we need methods that keep the modeled probability within its allowable range, $0 \leq p \leq 1$, for every value of the explanatory variables. Logistic regression is a statistical method for describing these kinds of relationships.¹
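The sketch below shows why a straight line is not a safe model for $p$. It compares $p = b_0 + b_1 x$ with the logistic function of the same linear expression, $p = 1/(1 + e^{-(b_0 + b_1 x)})$, which is equivalent to modeling the log odds $\log\big(p/(1-p)\big)$ as linear in $x$, the form used in logistic regression. The coefficients `b0` and `b1` are hypothetical values chosen only for illustration, not estimates from any data.

```python
import math

# Hypothetical intercept and slope, chosen only for illustration
b0, b1 = -2.0, 0.1

def linear_p(x):
    """Straight-line model for p: nothing keeps its values inside [0, 1]."""
    return b0 + b1 * x

def logistic_p(x):
    """Logistic function of the same linear expression.

    Mathematically always strictly between 0 and 1 (in floating point,
    extreme inputs can round to exactly 0.0 or 1.0).
    """
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

for x in [0, 20, 40, 60]:
    print(x, round(linear_p(x), 3), round(logistic_p(x), 3))
```

With these coefficients the straight line gives a legitimate probability only for $20 \leq x \leq 30$; outside that interval it falls below 0 or rises above 1. The logistic curve instead flattens toward 0 and 1 without crossing them.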