One of the most common uses of statistical methods in business and economics is to predict or forecast a response based on one or several explanatory variables.
Prediction is most straightforward when there is a straight-line relationship between a quantitative response variable and a single quantitative explanatory variable. This is simple linear regression, the topic of this chapter. In Chapter 11, we discuss regression when there is more than one explanatory variable.
As we saw in Chapter 2, when a scatterplot shows a linear relationship between a quantitative explanatory variable x and a quantitative response variable y, we can use the least-squares line to predict y for a given value of x. Now we want to carry out significance tests and construct confidence intervals in this setting.
To do this, we will think of the least-squares line, b0 + b1x, as an estimate of a regression line for the population, just as in Chapter 7 we viewed the sample mean x̄ as an estimate of the population mean μ. We write the population regression line as β0 + β1x. The numbers β0 and β1 are parameters that describe the population. The numbers b0 and b1 are statistics calculated from a sample. The fitted intercept b0 estimates the population intercept β0, and the fitted slope b1 estimates the population slope β1.
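The distinction between the parameters β0, β1 and the statistics b0, b1 can be made concrete with a small simulation. The sketch below is illustrative only: the population values BETA0 and BETA1, the normal scatter about the line, the range of x, and the sample size are all assumptions chosen for the example, not values from the text. It draws several samples from the same hypothetical population and computes the least-squares line for each.

```python
import random

def least_squares_line(xs, ys):
    """Return (b0, b1): intercept and slope of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx            # fitted slope
    b0 = y_bar - b1 * x_bar   # fitted intercept
    return b0, b1

# Hypothetical population regression line (assumed values for illustration).
BETA0, BETA1 = 2.0, 0.5

def draw_sample(n, rng):
    """Draw n (x, y) pairs scattered about the assumed population line."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    # Assumption: responses vary normally about the line with std. dev. 1.
    ys = [BETA0 + BETA1 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

rng = random.Random(1)
for _ in range(3):
    xs, ys = draw_sample(50, rng)
    b0, b1 = least_squares_line(xs, ys)
    print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

Each sample gives somewhat different values of b0 and b1, while β0 and β1 stay fixed. That sample-to-sample variation is what the tests and confidence intervals in this chapter have to account for.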
We can give confidence intervals and significance tests for inference about the slope β1 and the intercept β0. Because regression lines are most often used for prediction, we also consider inference about either the mean response or an individual future observation on y for a given value of the explanatory variable x. Finally, we discuss statistical inference about the correlation between two variables x and y.
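As a preview of inference for the slope, the following sketch computes the fitted slope, its standard error, and the t statistic for testing β1 = 0. It uses the standard formulas for simple linear regression: the regression standard error s is based on n − 2 degrees of freedom, and the standard error of b1 is s divided by the square root of the sum of squared deviations of x. The data values and the function name slope_inference are hypothetical, made up for illustration.

```python
import math

def slope_inference(xs, ys):
    """Return (b1, se_b1, t) for the least-squares slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residuals: observed y minus the value predicted by the fitted line.
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    # Regression standard error, with n - 2 degrees of freedom.
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    se_b1 = s / math.sqrt(sxx)
    t = b1 / se_b1            # test statistic for H0: beta1 = 0
    return b1, se_b1, t

# Hypothetical data, made up for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b1, se_b1, t = slope_inference(xs, ys)

# 95% confidence interval for beta1: b1 +/- t* x SE, where t* is the
# critical value from a t table with n - 2 = 3 degrees of freedom.
t_star = 3.182
ci = (b1 - t_star * se_b1, b1 + t_star * se_b1)
print(f"b1 = {b1:.3f}, SE = {se_b1:.4f}, t = {t:.2f}")
print(f"95% CI for the slope: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The critical value t* is entered by hand here because the Python standard library does not provide t distribution quantiles. In practice, statistical software reports the interval and the P-value directly.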