SECTION 2.3 Summary
- A regression line is a straight line that describes how a response variable changes as an explanatory variable changes.
- The most common method of fitting a line to a scatterplot is least squares. The least-squares regression line is the straight line that minimizes the sum of the squares of the vertical distances of the observed points from the line.
- You can use a regression line to predict the value of $y$ for any value of $x$ by substituting this $x$ into the equation of the line.
- The slope $b_1$ of a regression line $\hat{y} = b_0 + b_1 x$ is the rate at which the predicted response $\hat{y}$ changes along the line as the explanatory variable $x$ changes. Specifically, $b_1$ is the change in $\hat{y}$ when $x$ increases by 1.
- The intercept $b_0$ of a regression line $\hat{y} = b_0 + b_1 x$ is the predicted response $\hat{y}$ when the explanatory variable $x = 0$. This prediction is of no statistical use unless $x$ can actually take values near 0.
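- The least-squares regression line of $y$ on $x$ is the line with slope $b_1 = r s_y / s_x$ and intercept $b_0 = \bar{y} - b_1 \bar{x}$, where $r$ is the correlation, $s_x$ and $s_y$ are the standard deviations, and $\bar{x}$ and $\bar{y}$ are the means of the two variables. This line always passes through the point $(\bar{x}, \bar{y})$.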
- Correlation and regression are closely connected. The correlation $r$ is the slope of the least-squares regression line when we measure both $x$ and $y$ in standardized units. The square of the correlation, $r^2$, is the fraction of the variability of the response variable that is explained by the explanatory variable using least-squares regression.
- You can examine the fit of a regression line by studying the residuals, which are the differences between the observed and predicted values of $y$. Be on the lookout for outlying points with unusually large residuals and also for nonlinear patterns and uneven variation about the line.
- Also look for influential observations, individual points that substantially change the regression line. Influential observations are often outliers in the $x$ direction, but they need not have large residuals.
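
The facts summarized above can be checked numerically. The following is a minimal sketch in Python, using only the standard library. The data values and the function names `correlation` and `least_squares_line` are made up for illustration and are not taken from the text. The sketch computes the slope and intercept from the correlation, standard deviations, and means, forms the residuals, compares $r^2$ with the fraction of variation explained, and then adds one hypothetical point far out in the $x$ direction to show how an influential observation can move the line.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation r: sum of products of standardized values over n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


def least_squares_line(xs, ys):
    """Return (b0, b1) for y-hat = b0 + b1 * x, with b1 = r * s_y / s_x."""
    b1 = correlation(xs, ys) * stdev(ys) / stdev(xs)
    b0 = mean(ys) - b1 * mean(xs)
    return b0, b1


# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares_line(xs, ys)
r = correlation(xs, ys)

# Predictions and residuals (observed y minus predicted y).
predicted = [b0 + b1 * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

# Fraction of the variation in y explained by the line.
ss_total = sum((y - mean(ys)) ** 2 for y in ys)
ss_resid = sum(e ** 2 for e in residuals)
explained = 1 - ss_resid / ss_total

# In standardized units the slope of the least-squares line is r itself.
zx = [(x - mean(xs)) / stdev(xs) for x in xs]
zy = [(y - mean(ys)) / stdev(ys) for y in ys]
z_b0, z_b1 = least_squares_line(zx, zy)

# Add one hypothetical point far out in the x direction and refit.
xs_out, ys_out = xs + [15.0], ys + [5.0]
b0_out, b1_out = least_squares_line(xs_out, ys_out)
residuals_out = [y - (b0_out + b1_out * x) for x, y in zip(xs_out, ys_out)]

print(f"line: y-hat = {b0:.3f} + {b1:.3f} x, r^2 = {r ** 2:.4f}")
print(f"after adding (15, 5): y-hat = {b0_out:.3f} + {b1_out:.3f} x")
```

In this sketch the added point pulls the slope sharply toward zero, yet its own residual is smaller in size than the residuals of some of the original points. That is why influential observations cannot be found by looking at residuals alone.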