• The statistical model for simple linear regression assumes that the means of the response variable y fall on a line when plotted against x, with the observed y’s varying Normally about these means. For n observations, this model can be written
$$ y_i = \beta_0 + \beta_1 x_i + \epsilon_i $$
where i = 1, 2, . . . , n, and the ϵi are assumed to be independent and Normally distributed with mean 0 and standard deviation σ. Here β0 + β1xi is the mean response when x = xi. The parameters of the model are β0, β1, and σ.
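• The population regression line intercept and slope, β0 and β1, are estimated by the intercept and slope of the least-squares regression line, b0 and b1. The parameter σ is estimated by
$$ s = \sqrt{\frac{\sum e_i^2}{n - 2}} $$
where the ei are the residuals
$$ e_i = y_i - \hat{y}_i $$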
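The estimates described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `fit_line` and the five data points are made up for the example, and the residuals are taken to be observed minus fitted values.

```python
import math

def fit_line(x, y):
    """Least-squares intercept b0, slope b1, regression standard error s, residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residuals: observed response minus fitted value
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # s estimates sigma using n - 2 degrees of freedom
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return b0, b1, s, residuals

# Hypothetical data, chosen only so the arithmetic is easy to follow
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, s, residuals = fit_line(x, y)
print(b0, b1, s)
```

For these made-up data the fitted line has intercept 2.2 and slope 0.6, and s is about 0.894. The residuals sum to zero (up to rounding), as they always do for a least-squares line with an intercept.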
• Prior to inference, always examine the residuals for Normality, constant variance, and any other remaining patterns in the data. Plots of the residuals both against the case number and against the explanatory variable are commonly part of this examination. Scatterplot smoothers are helpful in detecting patterns in these plots.
• A level C confidence interval for β1 is
$$ b_1 \pm t^* \, \mathrm{SE}_{b_1} $$
where t* is the value for the t(n − 2) density curve with area C between −t* and t*.
• The test of the hypothesis H0: β1 = 0 is based on the t statistic
$$ t = \frac{b_1}{\mathrm{SE}_{b_1}} $$
and the t(n − 2) distribution. This tests whether there is a straight-line relationship between y and x.
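As a rough sketch of inference for the slope, the code below computes the standard error of b1, the t statistic for H0: β1 = 0, and a 95% confidence interval. Several things here are assumptions not stated in this summary: the standard-error formula s divided by the square root of Σ(xi − x̄)², the function name `slope_inference`, the made-up data, and the critical value t* = 3.182, which is read from a t table for 3 degrees of freedom.

```python
import math

def slope_inference(x, y, t_star):
    """Return slope b1, its standard error, the t statistic, and a CI for beta1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    # Assumed standard formula: SE of the slope is s / sqrt(Sxx)
    se_b1 = s / math.sqrt(sxx)
    t_stat = b1 / se_b1  # compare with the t(n - 2) distribution
    ci = (b1 - t_star * se_b1, b1 + t_star * se_b1)
    return b1, se_b1, t_stat, ci

# Hypothetical data; n = 5, so inference uses t(3)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
T_STAR_95_DF3 = 3.182  # 95% critical value for t(3), taken from a t table
b1, se_b1, t_stat, ci = slope_inference(x, y, T_STAR_95_DF3)
print(b1, se_b1, t_stat, ci)
```

With these data the t statistic is about 2.12, which is smaller than t* = 3.182, so the 95% interval for β1 contains 0. The two-sided test at the 5% level and the 95% interval always agree in this way.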
• The estimated mean response for the subpopulation corresponding to the value x* of the explanatory variable is
$$ \hat{\mu}_y = b_0 + b_1 x^* $$
• A level C confidence interval for the mean response is
$$ \hat{\mu}_y \pm t^* \, \mathrm{SE}_{\hat{\mu}} $$
where t* is the value for the t(n − 2) density curve with area C between −t* and t*.
• The estimated value of the response variable y for a future observation from the subpopulation corresponding to the value x* of the explanatory variable is
$$ \hat{y} = b_0 + b_1 x^* $$
• A level C prediction interval for the future observation is
$$ \hat{y} \pm t^* \, \mathrm{SE}_{\hat{y}} $$
where t* is the value for the t(n − 2) density curve with area C between −t* and t*. The standard error for the prediction interval is larger than that for the confidence interval because it also includes the variability of the future observation around its subpopulation mean.
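The difference between the two intervals can be made concrete with a short sketch. The standard-error formulas used here are the usual textbook ones and are an assumption, since this summary does not spell them out: SE for the mean response is s·sqrt(1/n + (x* − x̄)²/Σ(xi − x̄)²), and the prediction standard error adds 1 under the square root. The function name, the data, and the table value t* = 3.182 for t(3) are likewise illustrative.

```python
import math

def intervals_at(x, y, x_star, t_star):
    """Return the estimate at x_star plus (SE, interval) for mean response and prediction."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    estimate = b0 + b1 * x_star  # same point estimate for both intervals
    leverage = 1 / n + (x_star - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(leverage)      # assumed formula for SE of the mean response
    se_pred = s * math.sqrt(1 + leverage)  # extra 1 accounts for a new observation's scatter
    ci = (estimate - t_star * se_mean, estimate + t_star * se_mean)
    pi = (estimate - t_star * se_pred, estimate + t_star * se_pred)
    return estimate, se_mean, se_pred, ci, pi

# Hypothetical data; t* = 3.182 is the 95% value for t(3) from a t table
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
estimate, se_mean, se_pred, ci, pi = intervals_at(x, y, x_star=4, t_star=3.182)
print(estimate, se_mean, se_pred)
```

At x* = 4 both intervals are centered at 4.6, but the prediction standard error (about 1.02) is roughly twice the standard error for the mean response (about 0.49), so the prediction interval is correspondingly wider.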
• Sometimes, a transformation of one or both of the variables can make their relationship linear. However, these transformations can harm the assumptions of Normality and constant variance, so it is important to examine the residuals.
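A small sketch of the idea: fit a line to the raw response and to its logarithm, then compare the residuals. The data below are invented and follow an exact doubling pattern, so the log-scale fit is perfect; real data would leave scatter on both scales, and the residuals would still need to be checked for Normality and constant variance.

```python
import math

def fit_residuals(x, y):
    """Fit a least-squares line and return its residuals (observed minus fitted)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Hypothetical response that doubles with each unit increase in x
x = [0, 1, 2, 3, 4]
y = [1, 2, 4, 8, 16]

raw_resid = fit_residuals(x, y)
log_resid = fit_residuals(x, [math.log2(yi) for yi in y])

# Raw scale: positive at both ends, negative in the middle (a curved pattern)
print([round(e, 2) for e in raw_resid])
# Log scale: essentially zero, because log2(y) is exactly linear in x here
print([round(e, 2) for e in log_resid])
```

The sign pattern of the raw-scale residuals (positive, negative, negative, negative, positive) is the kind of curvature a residual plot would reveal; after the log transformation no pattern remains in this idealized case.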