Examining Relationships

CHAPTER OUTLINE

  • 2.1 Scatterplots
  • 2.2 Correlation
  • 2.3 Least-Squares Regression
  • 2.4 Cautions about Correlation and Regression
  • 2.5 Relations in Categorical Data

Introduction

Our topic in this chapter is relationships between two variables. We measure both variables on the same cases. Often, we take the view that one of the variables explains or influences the other.

Statistical summaries of relationships are used to inform decisions in business and economics in many different settings.

  • United Airlines wants to know how well numbers of customers flying different segments this year will predict the numbers for next year.
  • How can Visa use characteristics of potential customers to decide who should receive promotional material?
  • IKEA wants to know how its number of Facebook followers relates to the company's sales. Should it invest in increasing its Facebook presence?

Response Variable, Explanatory Variable

A response variable measures an outcome of a study. An explanatory variable explains or influences changes in a response variable.

You will often find explanatory variables called independent variables and response variables called dependent variables. The idea behind this language is that the response variable depends on the explanatory variable. Because the words “independent” and “dependent” have other meanings in statistics that are unrelated to the explanatory–response distinction, we prefer to avoid those words.

It is easiest to identify explanatory and response variables when we actually control the values of one variable to see how it affects another variable.

EXAMPLE 2.1 The Best Price?

Price is important to consumers and, therefore, to retailers. Sales of an item typically increase as its price falls, except for some luxury items, where a high price suggests exclusivity. The seller's profits for an item often increase as the price is reduced, due to increased sales, until the point at which lower profit per item cancels rising sales. To find that point, a retail chain introduces a new TV that can respond to voice commands at several different price points and monitors sales. The chain wants to discover the price at which its profits are greatest. Price is the explanatory variable, and total profit from sales of the TV is the response variable.
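The trade-off in Example 2.1 can be sketched in a few lines of Python. This is only an illustration: the unit cost, the price points, and the sales counts below are hypothetical values chosen for the sketch, and the names `UNIT_COST`, `observations`, and `total_profit` are assumptions that do not appear in the example.

```python
# Hypothetical numbers for illustration only; none come from the text.
UNIT_COST = 400  # assumed cost to the chain per TV, in dollars

# Each case is one price point: (price in dollars, units sold at that price).
observations = [(900, 100), (800, 150), (700, 220), (600, 300), (500, 390)]


def total_profit(price, units_sold, unit_cost=UNIT_COST):
    """Response variable: profit per item times the number of items sold."""
    return (price - unit_cost) * units_sold


# Explanatory variable (price) mapped to response variable (total profit).
profits = {price: total_profit(price, units) for price, units in observations}

# The price point, among those tested, with the largest total profit.
best_price = max(profits, key=profits.get)

for price, profit in profits.items():
    print(f"price ${price}: total profit ${profit}")
print(f"most profitable price tested: ${best_price}")
```

With these made-up numbers, total profit rises as the price drops from $900 to $700 and then falls, because the smaller profit per TV outweighs the extra sales.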

When we just observe the values of both variables, there may or may not be explanatory and response variables. Whether there are such variables depends on how we plan to use the data.

EXAMPLE 2.2 Inventory and Sales

Emily is a district manager for a retail chain. She wants to know how the average monthly inventory and monthly sales for the stores in her district are related to each other. Emily doesn't think that either inventory level or sales explains the other. She has two related variables, and neither is an explanatory variable.

Zachary manages another district for the same chain. He asks, “Can I predict a store's monthly sales if I know its inventory level?” Zachary is treating the inventory level as the explanatory variable and the monthly sales as the response variable.

In Example 2.1, price differences actually cause differences in profits from sales of TVs. There is no cause-and-effect relationship between inventory levels and sales in Example 2.2. Because inventory and sales are closely related, we can nonetheless use a store's inventory level to predict its monthly sales. We will learn how to do the prediction in Section 2.3. Prediction requires that we identify an explanatory variable and a response variable. Some other statistical techniques ignore this distinction. Remember that calling one variable “explanatory” and the other “response” doesn't necessarily mean that changes in one cause changes in the other.
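As a preview of what Zachary's prediction might look like, here is a minimal sketch of fitting a least-squares line in Python. The store data are hypothetical, invented only for this illustration, and the helper names `fit_line` and `predict_sales` are assumptions rather than anything defined in the text.

```python
# Hypothetical data for illustration only; none of these numbers come from the text.
# Each case is one store: (average monthly inventory, monthly sales), both in $1000s.
stores = [(10, 20), (20, 40), (30, 50), (40, 40), (50, 50)]


def fit_line(pairs):
    """Return (intercept, slope) of the least-squares line of y on x.

    x is the explanatory variable and y is the response variable.
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(stores)


def predict_sales(inventory):
    """Predict monthly sales (response) from inventory level (explanatory)."""
    return intercept + slope * inventory


print(f"predicted sales = {intercept:.1f} + {slope:.2f} * inventory")
print(f"store with inventory 35: predicted sales {predict_sales(35):.1f}")
```

Swapping the roles of the two variables would generally give a different line, which is one reason prediction requires deciding which variable is explanatory. The fitted line describes an association in the data; it does not show that inventory causes sales.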

Most statistical studies examine data on more than one variable. Fortunately, statistical analysis of several-variable data builds on the tools we used to examine individual variables. The principles that guide our work also remain the same:

Apply Your Knowledge

Question 2.1

2.1 Relationship between worker productivity and sleep

A study is designed to examine the relationship between how effectively employees work and how much sleep they get. Think about making a data set for this study.

  (a) What are the cases?
  (b) Would your data set have a label variable? If yes, describe it.
  (c) What are the variables? Are they quantitative or categorical?
  (d) Is there an explanatory variable and a response variable? Explain your answer.

2.1

(a) The cases are employees. (b) The label could be the employee's name or ID. (d) The explanatory variable is how much sleep they get; the response is how effectively they work.

Question 2.2

2.2 Price versus size

You visit a local Starbucks to buy a Mocha Frappuccino®. The barista explains that this blended coffee beverage comes in three sizes and asks if you want a Tall, a Grande, or a Venti. The prices are $3.75, $4.45, and $4.95, respectively.

  (a) What are the variables and cases?
  (b) Which variable is the explanatory variable? Which is the response variable? Explain your answers.
  (c) The Tall contains 12 ounces of beverage, the Grande contains 16 ounces, and the Venti contains 20 ounces. Answer parts (a) and (b) with ounces in place of the names for the sizes.
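One possible way to lay out the data from this exercise is sketched below in Python. The sizes, ounces, and prices are taken from the exercise; the record layout and the derived `price_per_ounce` field are assumptions added for illustration and are not part of the question.

```python
# Sizes, ounces, and prices are taken from the exercise text.
drinks = [
    {"size": "Tall", "ounces": 12, "price": 3.75},
    {"size": "Grande", "ounces": 16, "price": 4.45},
    {"size": "Venti", "ounces": 20, "price": 4.95},
]

# Derived quantity (an illustrative addition): price paid per ounce of beverage.
for drink in drinks:
    drink["price_per_ounce"] = drink["price"] / drink["ounces"]

for drink in drinks:
    print(
        f"{drink['size']:<6} {drink['ounces']:>2} oz  "
        f"${drink['price']:.2f}  (${drink['price_per_ounce']:.4f} per oz)"
    )
```

Recording size in ounces rather than by name turns it into a quantitative variable, which makes comparisons such as price per ounce possible.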