Our topic in this chapter is relationships between two variables. We measure both variables on the same cases. Often, we take the view that one of the variables explains or influences the other.
Statistical summaries of relationships are used to inform decisions in business and economics in many different settings.
A response variable measures an outcome of a study. An explanatory variable explains or influences changes in a response variable.
You will often find explanatory variables called independent variables and response variables called dependent variables. The idea behind this language is that the response variable depends on the explanatory variable. Because the words “independent” and “dependent” have other meanings in statistics that are unrelated to the explanatory–response distinction, we prefer to avoid those words.
It is easiest to identify explanatory and response variables when we actually control the values of one variable to see how it affects another variable.
Example 2.1. Price is important to consumers and, therefore, to retailers. Sales of an item typically increase as its price falls, except for some luxury items, where a high price suggests exclusivity. The seller’s profits for an item often increase as the price is reduced, because of increased sales, until the point at which the lower profit per item cancels the gain from rising sales. A retail chain therefore introduces a new TV that can respond to voice commands, offers it at several different price points, and monitors sales. The chain wants to discover the price at which its profits are greatest. Price is the explanatory variable, and total profit from sales of the TV is the response variable.
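The trade-off between profit per item and number of items sold can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the unit cost, the straight-line relationship between price and units sold, and the list of price points are hypothetical and are not data from the retail chain. The sketch treats price as the explanatory variable, computes total profit as the response variable, and reports the trial price with the greatest profit.

```python
# Hypothetical illustration only: the cost, the demand rule, and the
# price points are invented, not taken from any real retailer.

UNIT_COST = 400  # assumed cost to the chain of one TV, in dollars


def units_sold(price):
    """Assumed demand: sales rise steadily as the price falls."""
    return max(0, 1600 - 2 * price)


def total_profit(price):
    """Response variable: profit per TV times number of TVs sold."""
    return (price - UNIT_COST) * units_sold(price)


# Explanatory variable: the price points the chain decides to try.
trial_prices = [450, 500, 550, 600, 650, 700, 750]

# Record the response at each value of the explanatory variable.
profits = {price: total_profit(price) for price in trial_prices}

# The price whose total profit is greatest among those tried.
best_price = max(profits, key=profits.get)

for price in trial_prices:
    print(f"price ${price}: {units_sold(price)} sold, profit ${profits[price]}")
print(f"greatest profit at ${best_price}")
```

Under these assumed numbers, profit climbs as the price drops from $750 to $600 and then declines as the price drops further, because the smaller margin on each TV outweighs the extra sales. A real study would replace the assumed demand rule with the sales the chain actually observes at each price.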
When we just observe the values of both variables, there may or may not be explanatory and response variables. Whether there are such variables depends on how we plan to use the data.
Example 2.2. Emily is a district manager for a retail chain. She wants to know how the average monthly inventory and monthly sales for the stores in her district are related to each other. Emily doesn’t think that either inventory level or sales explains the other. She has two related variables, and neither is an explanatory variable.
Zachary manages another district for the same chain. He asks, “Can I predict a store’s monthly sales if I know its inventory level?” Zachary is treating the inventory level as the explanatory variable and the monthly sales as the response variable.
In Example 2.1, price differences actually cause differences in profits from sales of TVs. There is no cause-and-effect relationship between inventory levels and sales in Example 2.2. Because inventory and sales are closely related, we can nonetheless use a store’s inventory level to predict its monthly sales. We will learn how to do the prediction in Section 2.3. Prediction requires that we identify an explanatory variable and a response variable. Some other statistical techniques ignore this distinction. Remember that calling one variable “explanatory” and the other “response” doesn’t necessarily mean that changes in one cause changes in the other.
Most statistical studies examine data on more than one variable. Fortunately, statistical analysis of several-variable data builds on the tools we used to examine individual variables. The principles that guide our work also remain the same:
A study is designed to examine the relationship between how effectively employees work and how much sleep they get. Think about making a data set for this study.
You visit a local Starbucks to buy a Mocha Frappuccino®. The barista explains that this blended coffee beverage comes in three sizes and asks if you want a Tall, a Grande, or a Venti. The prices are $3.75, $4.45, and $4.95, respectively.