SECTION 2.3 Summary
- A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes.
- The most common method of fitting a line to a scatterplot is least squares. The least-squares regression line is the straight line that minimizes the sum of the squares of the vertical distances of the observed points from the line.
- You can use a regression line to predict the value of y for any value of x by substituting this x into the equation of the line.
- The slope b1 of a regression line is the rate at which the predicted response ŷ changes along the line as the explanatory variable x changes. Specifically, b1 is the change in ŷ when x increases by 1.
- The intercept b0 of a regression line is the predicted response ŷ when the explanatory variable x = 0. This prediction is of no statistical use unless x can actually take values near 0.
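- The least-squares regression line of y on x is the line with slope b1 = r · s_y / s_x and intercept b0 = ȳ − b1 · x̄, where x̄ and ȳ are the means and s_x and s_y the standard deviations of x and y. This line always passes through the point (x̄, ȳ).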
- Correlation and regression are closely connected. The correlation r is the slope of the least-squares regression line when we measure both x and y in standardized units. The square of the correlation, r², is the fraction of the variability of the response variable that is explained by the explanatory variable using least-squares regression.
- You can examine the fit of a regression line by studying the residuals, which are the differences between the observed and predicted values of y (residual = y − ŷ). Be on the lookout for outlying points with unusually large residuals and also for nonlinear patterns and uneven variation about the line.
- Also look for influential observations, individual points that substantially change the regression line. Influential observations are often outliers in the x direction, but they need not have large residuals.
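
The formulas summarized above can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation: the data values are made up for illustration, and the function names (least_squares_line, predict, residuals) are hypothetical helpers rather than part of any library. It computes the slope and intercept from r, the means, and the standard deviations, then confirms that the line passes through (x̄, ȳ) and that the fraction of variation explained equals r².

```python
import math


def least_squares_line(x, y):
    """Return (b0, b1, r) for the least-squares regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample standard deviations (divisor n - 1).
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    # Correlation: sum of products of standardized values, divided by n - 1.
    r = sum(
        ((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)
    ) / (n - 1)
    b1 = r * s_y / s_x        # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1, r


def predict(b0, b1, x_new):
    """Predicted response y-hat for a given value of x."""
    return b0 + b1 * x_new


def residuals(x, y, b0, b1):
    """Observed y minus predicted y-hat for each data point."""
    return [yi - predict(b0, b1, xi) for xi, yi in zip(x, y)]


# Illustrative data only (assumed values, not from the text).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1, r = least_squares_line(x, y)
res = residuals(x, y, b0, b1)

# Fraction of the variation in y explained by the regression on x.
y_bar = sum(y) / len(y)
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_resid = sum(e ** 2 for e in res)
r_squared = 1 - ss_resid / ss_total

print("slope b1:", b1)
print("intercept b0:", b0)
print("r squared from r:", r ** 2)
print("r squared from residuals:", r_squared)
print("prediction at x-bar:", predict(b0, b1, sum(x) / len(x)), "vs y-bar:", y_bar)
```

Plotting the values in res against x is the usual next step when looking for large residuals, curved patterns, or uneven spread. To probe for an influential observation, one option is to refit the line with a single point removed and compare the two slopes.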