The statistical model for multiple linear regression with response variable y and p explanatory variables x1, x2, …, xp is
\[ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i \]
where i = 1, 2, …, n. The deviations εi are independent Normal random variables with mean 0 and a common standard deviation σ. The parameters of the model are β0, β1, β2, …, βp, and σ.
The β’s are estimated by the coefficients b0, b1, b2, …, bp of the multiple regression equation fitted to the data by the method of least squares. The parameter σ is estimated by the regression standard error
\[ s = \sqrt{\text{MSE}} = \sqrt{\frac{\sum e_i^2}{n - p - 1}} \]
where the ei are the residuals,
\[ e_i = y_i - \hat{y}_i \]
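As a minimal sketch of these estimates, the following NumPy code fits a multiple regression by least squares and computes the regression standard error s from the residuals. The data set is hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical data: n = 6 observations, p = 2 explanatory variables
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0],
              [6.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

n, p = X.shape
# Design matrix with a leading column of 1s for the intercept b0
D = np.column_stack([np.ones(n), X])

# Least-squares estimates b0, b1, ..., bp
b, *_ = np.linalg.lstsq(D, y, rcond=None)

# Residuals e_i = y_i - yhat_i, and s = sqrt(SSE / (n - p - 1))
e = y - D @ b
s = np.sqrt(np.sum(e**2) / (n - p - 1))
print(b, s)
```

Because the design matrix includes an intercept column, the residuals sum to (numerically) zero, a useful sanity check on the fit.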
A level C confidence interval for the regression coefficient βj is
\[ b_j \pm t^* \, SE_{b_j} \]
where t* is the value for the t(n−p−1) density curve with area C between −t* and t*.
Tests of the hypothesis H0: βj = 0 are based on the individual t statistic
\[ t = \frac{b_j}{SE_{b_j}} \]
and the t(n−p−1) distribution.
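The confidence intervals and individual t statistics above can be sketched as follows, using the standard OLS result that SE(bj) is s times the square root of the j-th diagonal entry of (DᵀD)⁻¹, where D is the design matrix. The data and the 95% confidence level are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Hypothetical data (same form as before)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

n, p = X.shape
D = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(D, y, rcond=None)
e = y - D @ b
df = n - p - 1
s = np.sqrt(np.sum(e**2) / df)

# SE(b_j) = s * sqrt(j-th diagonal entry of (D'D)^-1)
cov_unscaled = np.linalg.inv(D.T @ D)
se = s * np.sqrt(np.diag(cov_unscaled))

# 95% confidence intervals: b_j +/- t* SE(b_j), t* from t(n - p - 1)
t_star = stats.t.ppf(0.975, df)
ci = np.column_stack([b - t_star * se, b + t_star * se])

# Individual t statistics for H0: beta_j = 0
t_stats = b / se
print(ci)
print(t_stats)
```

Each row of `ci` is the interval for one coefficient, intercept first; the corresponding entry of `t_stats` is compared with the t(n − p − 1) distribution.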
The ANOVA table for a multiple linear regression gives the degrees of freedom, sum of squares, and mean squares for the regression and residual sources of variation. The ANOVA F statistic is the ratio MSR/MSE and is used to test the null hypothesis
\[ H_0\colon \beta_1 = \beta_2 = \cdots = \beta_p = 0 \]
If H0 is true, this statistic has the F(p,n−p−1) distribution.
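A sketch of the ANOVA decomposition behind this F test, again on hypothetical data: SST splits into SSR + SSE, and F = MSR/MSE.

```python
import numpy as np

# Hypothetical data: n = 6 observations, p = 2 explanatory variables
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

n, p = X.shape
D = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(D, y, rcond=None)
yhat = D @ b

# Sums of squares: total, residual (error), and regression
SST = np.sum((y - y.mean())**2)
SSE = np.sum((y - yhat)**2)
SSR = np.sum((yhat - y.mean())**2)

# Mean squares and the ANOVA F statistic on (p, n - p - 1) df
MSR = SSR / p
MSE = SSE / (n - p - 1)
F = MSR / MSE
print(F)
```

The identity SST = SSR + SSE holds exactly for a least-squares fit with an intercept, which the assertions below verify numerically.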
The squared multiple correlation is given by the expression
\[ R^2 = \frac{SSR}{SST} \]
and is interpreted as the proportion of the variability in the response variable y that is explained by the explanatory variables x1, x2, …, xp in the multiple linear regression.
When a reduced model omitting q of the explanatory variables (with squared multiple correlation R2²) is compared with the full model containing all p explanatory variables (with squared multiple correlation R1²), the hypothesis that the q omitted coefficients are all 0 is tested by the statistic
\[ F = \left(\frac{n - p - 1}{q}\right)\left(\frac{R_1^2 - R_2^2}{1 - R_1^2}\right) \]
which has the F(q, n − p − 1) distribution when this hypothesis is true.
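The comparison of nested models via R² can be sketched as below. The data are hypothetical, and the reduced model drops the second explanatory variable (so q = 1).

```python
import numpy as np

def r_squared(X, y):
    """R^2 = SSR/SST for a least-squares fit with an intercept."""
    n = len(y)
    D = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(D, y, rcond=None)
    yhat = D @ b
    SST = np.sum((y - y.mean())**2)
    SSR = np.sum((yhat - y.mean())**2)
    return SSR / SST

# Hypothetical data: full model has p = 2 explanatory variables
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

n, p, q = len(y), 2, 1
R1 = r_squared(X, y)         # full model: both variables
R2 = r_squared(X[:, :1], y)  # reduced model: first variable only

# F statistic for H0: the q omitted coefficients are all 0
F = ((n - p - 1) / q) * (R1 - R2) / (1 - R1)
print(F)
```

Since the reduced model is a special case of the full model, its R² can never exceed the full model's, so the statistic is always nonnegative.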