• Data for multiple linear regression consist of the values of a response variable y and p explanatory variables x1, x2, … , xp for n cases. We write the data and enter them into software in the form
Individual | y | x1 | x2 | … | xp |
---|---|---|---|---|---|
1 | y1 | x11 | x12 | … | x1p |
2 | y2 | x21 | x22 | … | x2p |
⋮ | ⋮ | ⋮ | ⋮ | | ⋮ |
n | yn | xn1 | xn2 | … | xnp |
• The statistical model for multiple linear regression with response variable y and p explanatory variables x1, x2, … , xp is

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \epsilon_i$$

where i = 1, 2, … , n. The ϵi are assumed to be independent and Normally distributed with mean 0 and standard deviation σ. The parameters of the model are β0, β1, β2, … , βp, and σ.
• The multiple regression equation predicts the response variable by a linear relationship with all the explanatory variables:

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$$

• The β’s are estimated by b0, b1, b2, … , bp, which are obtained by the method of least squares. The model standard deviation σ is estimated by

$$s = \sqrt{\text{MSE}} = \sqrt{\frac{\sum e_i^2}{n - p - 1}}$$

where the ei are the residuals,

$$e_i = y_i - \hat{y}_i$$

• A level C confidence interval for βj is

$$b_j \pm t^* \, \text{SE}_{b_j}$$

where t* is the value for the t(n − p − 1) density curve with area C between −t* and t*.
• The test of the hypothesis H0: βj = 0 is based on the t statistic

$$t = \frac{b_j}{\text{SE}_{b_j}}$$

and the t(n − p − 1) distribution.
• The ANOVA table for a multiple linear regression gives the degrees of freedom, sums of squares, and mean squares for the model, error, and total sources of variation. The ANOVA F statistic is the ratio F = MSM/MSE and is used to test the null hypothesis
H0: β1 = β2 = … = βp = 0
If H0 is true, this statistic has an F(p, n − p − 1) distribution.
• The squared multiple correlation is given by the expression

$$R^2 = \frac{\text{SSM}}{\text{SST}}$$

and is interpreted as the proportion of the variability in the response variable y that is explained by the explanatory variables x1, x2, … , xp in the multiple linear regression.
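
The quantities summarized above can be traced through a small numerical example. The following is a minimal sketch, not a substitute for statistical software: it uses only the Python standard library, the function names `fit_multiple_regression` and `confidence_interval` are hypothetical, and the six-case data set with p = 2 is invented purely for illustration. It assumes the model includes an intercept, so that SSM = SST − SSE, and it takes the standard errors from the diagonal of the inverse of X'X, which is the usual least-squares result. The critical value t* is not computed; it must be supplied from a t table.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(u * v for u, v in zip(row, col)) for col in bt] for row in a]

def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    # Augment m with the identity matrix: [m | I]
    aug = [list(map(float, row)) + [float(i == j) for j in range(k)]
           for i, row in enumerate(m)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("X'X is singular: explanatory variables are collinear")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pv = aug[c][c]
        aug[c] = [v / pv for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]

def fit_multiple_regression(x_rows, y):
    """Hypothetical helper: least-squares fit with an intercept.

    x_rows holds one tuple (x_i1, ..., x_ip) per case; y holds the responses.
    """
    n, p = len(y), len(x_rows[0])
    X = [[1.0] + list(row) for row in x_rows]        # leading 1 for the intercept b0
    Xt = transpose(X)
    xtx_inv = inverse(matmul(Xt, X))
    b = [row[0] for row in matmul(xtx_inv, matmul(Xt, [[v] for v in y]))]

    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]   # e_i = y_i - yhat_i
    y_bar = sum(y) / n
    sse = sum(e * e for e in resid)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssm = sst - sse                                  # valid because an intercept is fitted
    dfe = n - p - 1
    mse = sse / dfe
    s = math.sqrt(mse)                               # estimate of sigma
    se = [s * math.sqrt(xtx_inv[j][j]) for j in range(p + 1)]
    return {
        "b": b,
        "s": s,
        "se": se,
        "t": [bj / sej for bj, sej in zip(b, se)],   # t statistics for H0: beta_j = 0
        "F": (ssm / p) / mse,                        # MSM / MSE
        "R2": ssm / sst,
        "df": (p, dfe),
    }

def confidence_interval(bj, se_bj, t_star):
    """Level C interval b_j +/- t* SE_bj; t_star comes from a t(n - p - 1) table."""
    margin = t_star * se_bj
    return bj - margin, bj + margin

# Invented illustration data: n = 6 cases, p = 2 explanatory variables.
x_rows = [(-1, -1), (1, -1), (-1, 1), (1, 1), (0, 0), (0, 0)]
y = [1, 3, 4, 8, 4, 4]
result = fit_multiple_regression(x_rows, y)

# 95% interval for beta_1, using t* = 3.182 from a t table with 3 degrees of freedom.
ci_b1 = confidence_interval(result["b"][1], result["se"][1], 3.182)

print([round(v, 4) for v in result["b"]], round(result["s"], 4))
print(round(result["F"], 4), round(result["R2"], 4), result["df"])
print(tuple(round(v, 4) for v in ci_b1))
```

For these data the explanatory variables are centered and uncorrelated, so the fit can be checked by hand: b0 = 4, b1 = 1.5, b2 = 2, SSE = 1 on n − p − 1 = 3 degrees of freedom, s is about 0.577, F = 37.5 on (2, 3) degrees of freedom, and R² = 25/26, or about 0.96.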