A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes.
The most common method of fitting a line to a scatterplot is least squares. The least-squares regression line is the straight line ŷ=b0+b1x that minimizes the sum of the squares of the vertical distances of the observed points from the line.
You can use a regression line to predict the value of y for any value of x by substituting this x into the equation of the line.
The slope b1 of a regression line ŷ=b0+b1x is the rate at which the predicted response ŷ changes along the line as the explanatory variable x changes. Specifically, b1 is the change in ŷ when x increases by 1.
The intercept b0 of a regression line ŷ=b0+b1x is the predicted response ŷ when the explanatory variable x=0. This prediction is of no statistical use unless x can actually take values near 0.
The least-squares regression line of y on x is the line with slope b1=rsy/sx and intercept b0=ȳ−b1x̄. This line always passes through the point (x̄, ȳ).
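As a minimal sketch of these formulas (the dataset is made up purely for illustration), the slope and intercept computed from r, the standard deviations, and the means agree with a direct least-squares fit, and the fitted line passes through (x̄, ȳ):

```python
import numpy as np

# Small illustrative dataset (made up for this sketch).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Slope and intercept from summary statistics: b1 = r*sy/sx, b0 = ybar - b1*xbar.
r = np.corrcoef(x, y)[0, 1]
b1 = r * y.std(ddof=1) / x.std(ddof=1)
b0 = y.mean() - b1 * x.mean()

# The least-squares line always passes through the point (xbar, ybar).
assert np.isclose(b0 + b1 * x.mean(), y.mean())

# Cross-check against numpy's direct least-squares fit.
b1_np, b0_np = np.polyfit(x, y, 1)
assert np.isclose(b1, b1_np) and np.isclose(b0, b0_np)
print(b0, b1)
```

Either route gives the same line; the summary-statistic formulas simply make explicit how the slope is tied to the correlation.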
Correlation and regression are closely connected. The correlation r is the slope of the least-squares regression line when we measure both x and y in standardized units. The square of the correlation r² is the fraction of the variability of the response variable that is explained by the explanatory variable using least-squares regression.
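Both connections can be checked numerically; a brief sketch (again with made-up data) verifies that the slope in standardized units equals r and that the variance of the predicted values is an r² fraction of the variance of y:

```python
import numpy as np

# Illustrative data (made up).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.5, 3.2, 2.9, 5.1, 4.8, 6.3])

r = np.corrcoef(x, y)[0, 1]

# In standardized units the least-squares slope is exactly r.
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
slope_z, _ = np.polyfit(zx, zy, 1)
assert np.isclose(slope_z, r)

# r^2 is the fraction of the variance of y explained by the regression,
# i.e. the variance of the predicted values over the variance of y.
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x
explained_fraction = np.var(y_hat) / np.var(y)
assert np.isclose(explained_fraction, r**2)
```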
You can examine the fit of a regression line by studying the residuals, which are the differences between the observed and predicted values of y. Be on the lookout for outlying points with unusually large residuals and also for nonlinear patterns and uneven variation about the line.
Also look for influential observations, individual points that substantially change the regression line. Influential observations are often outliers in the x direction, but they need not have large residuals.
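A short sketch can make both diagnostics concrete (the data, including the added outlying point, are invented for illustration): residuals are observed minus predicted values, and adding a single point that is an outlier in the x direction pulls the fitted slope substantially even though that point need not have a large residual:

```python
import numpy as np

# Illustrative data lying close to a line of slope about 2 (made up).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.9])

b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)   # observed minus predicted

# For a least-squares fit with an intercept, the residuals sum to zero.
assert np.isclose(residuals.sum(), 0.0)

# Add a point far out in the x direction, well below where the original
# line predicts, and refit: the slope changes substantially, which is
# what makes this observation influential.
x2 = np.append(x, 15.0)
y2 = np.append(y, 12.0)
b1_new, b0_new = np.polyfit(x2, y2, 1)
print(b1, b1_new)  # the new slope is pulled down by the influential point
```

Plotting residuals against x (not shown here) is the usual way to spot the nonlinear patterns and uneven variation mentioned above.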