16.4 Review of Concepts

Simple Linear Regression

Regression is an expansion of correlation in that it allows us not only to quantify a relation between two variables but also to quantify one variable’s ability to predict another variable. We can predict a dependent variable’s z score from an independent variable’s z score, or we can do a bit more initial work and predict a dependent variable’s raw score from an independent variable’s raw score. The latter method uses the equation for a line with an intercept and a slope.
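
In symbols (a brief restatement of these two methods, writing r_XY for the Pearson correlation, a for the intercept, and b for the slope):

    \hat{z}_Y = r_{XY}\, z_X \qquad\qquad \hat{Y} = a + bX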

We use simple linear regression to predict scores on one dependent variable from scores on one independent variable when the two variables are linearly related. We can graph the regression line using the regression equation: we plug in low and high values of X, plot those values against their associated predicted values on Y, and connect the dots to form the regression line.
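
The sketch below illustrates this graphing procedure in Python (assuming matplotlib is available; the intercept and slope values are hypothetical, chosen only for illustration):

    # Graphing a regression line from two (X, predicted Y) points.
    # The intercept a = 12 and slope b = 0.5 are hypothetical values.
    import matplotlib.pyplot as plt

    a, b = 12.0, 0.5           # intercept and slope of the regression equation
    x_low, x_high = 2.0, 10.0  # a low and a high value of X

    y_low = a + b * x_low      # predicted Y for the low value of X
    y_high = a + b * x_high    # predicted Y for the high value of X

    # Plot the two points and connect the dots to form the regression line.
    plt.plot([x_low, x_high], [y_low, y_high], marker="o")
    plt.xlabel("X (independent variable)")
    plt.ylabel("Predicted Y")
    plt.show()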

Just as we can standardize a raw score by converting it to a z score, we can standardize a slope by converting it to a standardized regression coefficient. This number indicates the predicted change in the dependent variable, in standard deviation units, for every increase of 1 standard deviation in the independent variable. For simple linear regression, the standardized regression coefficient is identical to the Pearson correlation coefficient. Consequently, the hypothesis test that determines whether the correlation coefficient is statistically significantly different from 0 also tells us whether the standardized regression coefficient is statistically significantly different from 0.
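
The sketch below checks this equivalence numerically with NumPy (the data are simulated, not drawn from the chapter):

    # In simple linear regression, the standardized regression coefficient
    # (beta) equals the Pearson correlation coefficient (r).
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=100)
    y = 2.0 * x + rng.normal(size=100)   # simulated linearly related scores

    b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # raw-score slope
    beta = b * np.std(x, ddof=1) / np.std(y, ddof=1)    # standardized slope
    r = np.corrcoef(x, y)[0, 1]                         # Pearson correlation

    print(beta, r)  # the two values match, up to floating-point rounding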

When we use regression, we must also be aware of the phenomenon called regression to the mean, in which extreme values tend to become less extreme over time.
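
A quick numeric illustration (the correlation value is hypothetical): because the predicted z score is the correlation multiplied by the observed z score, and correlations are smaller than 1 in absolute value, an extreme score predicts a less extreme score.

    # Regression to the mean: with r = 0.6 (a hypothetical value), an
    # extreme z score on X predicts a less extreme z score on Y.
    r = 0.6
    for z_x in (2.5, -2.0):
        z_y_hat = r * z_x   # the prediction shrinks toward the mean of 0
        print(f"z_X = {z_x:+.1f} -> predicted z_Y = {z_y_hat:+.2f}")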

Interpretation and Prediction

A regression equation is rarely a perfect predictor of scores on the dependent variable. There is always some prediction error, which can be quantified by the standard error of the estimate, the number that describes the typical distance between the observed values and the regression line. In addition, regression suffers from the same drawbacks as correlation. For example, we cannot know whether the predictive relation is causal; the posited direction could be the reverse (with Y causally predicting X), or a third variable could be at work.
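
As a sketch of the calculation (the observed and predicted scores are hypothetical; one common formula divides the sum of squared errors by N − 2, because the line uses two estimates, a and b):

    # Standard error of the estimate: the typical distance of observed
    # Y values from the regression line (all numbers hypothetical).
    import numpy as np

    y = np.array([4.0, 7.0, 5.0, 9.0, 6.0])      # observed scores on Y
    y_hat = np.array([4.5, 6.5, 5.5, 8.0, 6.5])  # predicted scores from the line

    errors = y - y_hat
    se_estimate = np.sqrt(np.sum(errors**2) / (len(y) - 2))
    print(se_estimate)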


When we use regression, we must consider the degree to which an independent variable predicts a dependent variable. To do this, we can calculate the proportionate reduction in error, symbolized as r². The proportionate reduction in error tells us how much better our prediction is with the regression equation than with the mean as the only predictive tool.
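
The sketch below computes r² directly from this definition, using the same hypothetical scores as above: it compares the error made when predicting with the mean against the error made when predicting with the regression equation.

    # Proportionate reduction in error: how much better the regression
    # equation predicts Y than the mean of Y alone (hypothetical data).
    import numpy as np

    y = np.array([4.0, 7.0, 5.0, 9.0, 6.0])      # observed scores on Y
    y_hat = np.array([4.5, 6.5, 5.5, 8.0, 6.5])  # predicted scores from the line

    ss_total = np.sum((y - y.mean()) ** 2)  # error when predicting with the mean
    ss_error = np.sum((y - y_hat) ** 2)     # error when predicting with the line
    r_squared = (ss_total - ss_error) / ss_total
    print(r_squared)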

Multiple Regression

We use multiple regression when we have more than one independent variable, as is the case in most research in the behavioral sciences. Multiple regression is particularly useful when we have orthogonal variables, independent variables that make separate contributions to the prediction of a dependent variable. Multiple regression has led to the development of many Web-based prediction tools that allow us to make educated guesses about such outcomes as airplane ticket prices.
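
A minimal sketch of multiple regression with two simulated independent variables, fit by ordinary least squares with NumPy (the variables and coefficients are made up, loosely echoing the ticket-price example):

    # Multiple regression: predicting one dependent variable from two
    # independent variables (all data simulated for illustration).
    import numpy as np

    rng = np.random.default_rng(1)
    x1 = rng.normal(size=50)   # first independent variable
    x2 = rng.normal(size=50)   # second independent variable
    y = 3.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(size=50)

    X = np.column_stack([np.ones(50), x1, x2])   # column of 1s for the intercept
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(coefs)   # estimated intercept and the two slopes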

Researchers often use one of two types of multiple regression. In stepwise multiple regression, a computer program determines the manner in which the independent variables are entered, using the actual data. In hierarchical multiple regression, the researcher determines the manner in which the independent variables are entered, using the existing research literature as a guide.
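
The sketch below mimics hierarchical entry: the researcher enters x1 at step 1, adds x2 at step 2, and compares r² across the steps (the entry order, data, and variable names are hypothetical):

    # Hierarchical multiple regression: compare r-squared as predictors
    # are entered in a researcher-chosen order (simulated data).
    import numpy as np

    def r_squared(X, y):
        coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coefs
        return 1 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)

    rng = np.random.default_rng(2)
    x1 = rng.normal(size=60)
    x2 = rng.normal(size=60)
    y = 1.0 + 0.9 * x1 + 0.4 * x2 + rng.normal(size=60)

    ones = np.ones(60)
    step1 = r_squared(np.column_stack([ones, x1]), y)      # step 1: x1 only
    step2 = r_squared(np.column_stack([ones, x1, x2]), y)  # step 2: add x2
    print(step1, step2, step2 - step1)  # the change in r-squared at step 2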

A number of more sophisticated statistical analyses, such as path analysis and its more complex counterpart, structural equation modeling (SEM), have been developed in recent years. These techniques allow us to examine predictive relations among a number of variables as specified by a statistical (or theoretical) model. SEM diagrams can be "read" with a basic understanding of a few concepts. Latent variables, drawn as large circles, represent the constructs of interest that we cannot directly measure. We operationalize latent variables by measuring several manifest variables, drawn as squares, that we believe represent the latent variable. Finally, we look at the numbers above the paths, drawn as arrows, to determine the strength and direction of the relations between variables.