Methods and formulas for fits and residuals in Fit Regression Model and Linear Regression

Fit

$$\hat{y} = b_0 + b_1 x_1 + \cdots + b_k x_k$$

Notation

Term         Description
$\hat{y}$    fitted value
$x_k$        kth term. Each term can be a single predictor, a polynomial term, or an interaction term.
$b_k$        estimate of the kth regression coefficient
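
To illustrate how a fitted value is assembled from coefficient estimates and terms, here is a minimal Python sketch. It assumes the usual linear form, a constant plus the sum of each coefficient times its term. The function name `fitted_value` and the coefficient values are hypothetical and are not taken from any software.

```python
def fitted_value(coefficients, terms):
    """Return b0 + b1*x1 + ... + bk*xk.

    coefficients: [b0, b1, ..., bk], with the constant first (assumed layout).
    terms:        [x1, ..., xk], already expanded, so a polynomial or
                  interaction term is passed as its computed value.
    """
    if len(coefficients) != len(terms) + 1:
        raise ValueError("expected one coefficient per term plus a constant")
    constant, slopes = coefficients[0], coefficients[1:]
    return constant + sum(b * x for b, x in zip(slopes, terms))


# Hypothetical model with two predictors and their interaction:
# y-hat = 2 + 0.5*x1 - 1.5*x2 + 0.25*x1*x2
x1, x2 = 4.0, 2.0
fit = fitted_value([2.0, 0.5, -1.5, 0.25], [x1, x2, x1 * x2])
print(fit)
```

The interaction term is supplied as the product `x1 * x2`, which mirrors the description of a term as a single predictor, a polynomial term, or an interaction term.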

Standard error of fitted value (SE Fit)

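The standard error of the fitted value in a regression model with one predictor is:

$$\text{SE Fit} = \sqrt{s^2\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right)}$$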

The standard error of the fitted value in a regression model with more than one predictor is:

$$\text{SE Fit} = \sqrt{s^2\,\mathbf{x}'_0 (X'X)^{-1}\mathbf{x}_0}$$

For weighted regression, include the weight matrix in the equation:

$$\text{SE Fit} = \sqrt{s^2\,\mathbf{x}'_0 (X'WX)^{-1}\mathbf{x}_0}$$

When you use a test data set or k-fold cross-validation, the formulas are the same. The value of $s^2$ is from the training data. The design matrix and the weight matrix are also from the training data.

Notation

Term               Description
$s^2$              mean square error
$n$                number of observations
$x_0$              new value of the predictor
$\bar{x}$          mean of the predictor
$x_i$              ith predictor value
$\mathbf{x}_0$     vector of values that produce the fitted values, one for each column in the design matrix, beginning with a 1 for the constant term
$\mathbf{x}'_0$    transpose of the new vector of predictor values
$X$                design matrix
$W$                weight matrix
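
The following Python sketch shows one way to compute the standard error of a fitted value for a model with one predictor and a constant. It assumes the textbook forms sqrt(s^2 (1/n + (x0 - x̄)^2 / Σ(xi - x̄)^2)) and sqrt(s^2 x0'(X'X)^-1 x0), and checks that the two agree. The function names and the data are hypothetical.

```python
import math


def fit_line(x, y):
    """Least-squares line for one predictor; returns (b0, b1, mse)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse / (n - 2)  # s^2 uses n - 2 degrees of freedom


def se_fit_scalar(x, y, x0):
    """SE Fit at x0 from the one-predictor formula."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    _, _, mse = fit_line(x, y)
    return math.sqrt(mse * (1 / n + (x0 - x_bar) ** 2 / sxx))


def se_fit_matrix(x, y, x0):
    """SE Fit at x0 from sqrt(s^2 * x0' (X'X)^-1 x0), with X = [1, x]."""
    n = len(x)
    sum_x = sum(x)
    sum_xx = sum(xi * xi for xi in x)
    det = n * sum_xx - sum_x ** 2
    # (X'X)^-1 = (1/det) * [[sum_xx, -sum_x], [-sum_x, n]], and x0 = [1, x0]
    quad_form = (sum_xx - 2 * x0 * sum_x + n * x0 ** 2) / det
    _, _, mse = fit_line(x, y)
    return math.sqrt(mse * quad_form)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(se_fit_scalar(x, y, 5.0), se_fit_matrix(x, y, 5.0))
```

The standard error is smallest at the mean of the predictor and grows as the new value moves away from it.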

Confidence interval for a fitted value (CI)

Formula

For regression, the following formula gives the confidence bounds for a fitted value:

$$\hat{y}_i \pm t_v\sqrt{s^2 h_i}$$

For weighted regression, the formula includes the weights:

where $t_v$ is the $1-\alpha/2$ quantile of the t distribution with $v$ degrees of freedom for a two-sided interval. For a one-sided bound, $t_v$ is the $1-\alpha$ quantile of the t distribution with $v$ degrees of freedom.

When you use a test data set or k-fold cross-validation, the degrees of freedom and the mean square error are from the training data set.

When you use a Box-Cox transformation, apply the inverse transformation to the bounds of the confidence interval to express them in the units of the original response. For example, if the Box-Cox transformation is the natural log, then the following formula gives the inverse transformation:

$$\exp\left(\hat{y}_i \pm t_v\sqrt{s^2 h_i}\right)$$

Notation

Term           Description
$\hat{y}_i$    fitted value
$t_v$          quantile from the t distribution
$v$            degrees of freedom
$s^2$          mean square error
$h_i$          leverage for the ith observation
$w_i$          weight for the ith observation
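
As a worked illustration, the sketch below computes a two-sided interval for the unweighted case, assuming the bounds take the form fit ± t_v * sqrt(s^2 * h_i). The t quantile is passed in as a number from a t table because the Python standard library has no t-distribution quantile function. The function name and all input values are hypothetical.

```python
import math


def fit_confidence_interval(fit, mse, leverage, t_quantile):
    """Return (lower, upper) as fit -/+ t_quantile * sqrt(mse * leverage)."""
    half_width = t_quantile * math.sqrt(mse * leverage)
    return fit - half_width, fit + half_width


# Hypothetical inputs: fitted value 4.0, s^2 = 0.8, leverage 0.2.
# 3.182446 is the 0.975 quantile of the t distribution with 3 degrees of
# freedom (alpha = 0.05, two-sided), taken from a t table.
T_975_DF3 = 3.182446
lower, upper = fit_confidence_interval(4.0, 0.8, 0.2, T_975_DF3)
print(lower, upper)

# If the response had been modeled on the natural-log scale, each bound is
# back-transformed with exp to return to the original units.
lower_original, upper_original = math.exp(lower), math.exp(upper)
print(lower_original, upper_original)
```

For a one-sided bound, the same function applies with the 1 - α quantile, and only the relevant end of the returned pair is used.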

Residuals

The residual is the difference between an observed value and the corresponding fitted value. This part of the observation is not explained by the model. The residual of an observation is:

$$e_i = y_i - \hat{y}_i$$

Notation

Term           Description
$y_i$          ith observed response value
$\hat{y}_i$    ith fitted value for the response
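
A short Python sketch of the residual calculation, observed value minus fitted value, follows. The function name and the numbers are hypothetical. The fitted values come from an assumed least-squares line, 2.2 + 0.6x, evaluated at x = 1 through 5.

```python
def residuals(observed, fitted):
    """Return the list of residuals e_i = y_i - y-hat_i."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    return [y - y_hat for y, y_hat in zip(observed, fitted)]


observed = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]  # hypothetical fits from 2.2 + 0.6x
e = residuals(observed, fitted)
print(e)
```

Because the assumed line is a least-squares fit that includes a constant, the residuals sum to zero apart from floating-point rounding.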

Standardized residual (Std Resid)

Standardized residuals are also called "internally Studentized residuals."

Formula

$$\text{Std Resid}_i = \frac{e_i}{\sqrt{s^2(1 - h_i)}}$$

Notation

Term     Description
$e_i$    ith residual
$h_i$    ith diagonal element of $X(X'X)^{-1}X'$
$s^2$    mean square error
$X$      design matrix
$X'$     transpose of the design matrix
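
The sketch below computes standardized residuals for a model with one predictor and a constant, assuming the usual form e_i / sqrt(s^2 (1 - h_i)). For this special case, the ith diagonal element of X(X'X)^-1 X' reduces to 1/n + (x_i - x̄)^2 / Σ(x_j - x̄)^2, which avoids matrix code. The function name and the data are hypothetical.

```python
import math


def standardized_residuals(x, y):
    """Internally Studentized residuals for y = b0 + b1*x."""
    n, p = len(x), 2  # p counts the constant and the slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)
    # Leverage: diagonal of the hat matrix for a one-predictor model.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e / math.sqrt(mse * (1 - h)) for e, h in zip(resid, leverage)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
std_resid = standardized_residuals(x, y)
print(std_resid)
```

Observations at the extremes of the predictor have higher leverage, so the same raw residual produces a larger standardized residual there than near the mean.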

Standardized residual (Std Resid) with validation

For validation data, the denominator of the formula for the standardized residual adds the leverage instead of subtracting the leverage.

Formula

$$\text{Std Resid}_i = \frac{e_i}{\sqrt{s^2(1 + h_i)}}$$

For weighted regression, the formula includes the weight:

Notation

Term     Description
$e_i$    ith residual in the validation data set
$h_i$    leverage for the ith validation row
$s^2$    mean square error for the training data set
$w_i$    weight for the ith observation in the validation data set
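
The following sketch illustrates the unweighted validation case for a model with one predictor: the model is fit on training rows, and each validation residual is divided by sqrt(s^2 (1 + h_i)). It assumes that the leverage of a validation row is x0'(X'X)^-1 x0 computed from the training design matrix, which for one predictor equals 1/n + (x0 - x̄)^2 / Σ(x_j - x̄)^2. The function name and the data are hypothetical.

```python
import math


def validation_standardized_residuals(x_train, y_train, x_valid, y_valid):
    """Standardized residuals for validation rows, using training estimates."""
    n, p = len(x_train), 2
    x_bar = sum(x_train) / n
    y_bar = sum(y_train) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x_train)
    b1 = sum((xi - x_bar) * (yi - y_bar)
             for xi, yi in zip(x_train, y_train)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x_train, y_train))
    mse = sse / (n - p)  # s^2 comes from the training data only

    result = []
    for xv, yv in zip(x_valid, y_valid):
        e = yv - (b0 + b1 * xv)                # validation residual
        h = 1 / n + (xv - x_bar) ** 2 / sxx    # leverage vs. training design
        result.append(e / math.sqrt(mse * (1 + h)))  # leverage is added
    return result


x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [2.0, 4.0, 5.0, 4.0, 5.0]
valid = validation_standardized_residuals(x_train, y_train, [6.0], [5.0])
print(valid)
```

The leverage is added because a validation row does not contribute to the coefficient estimates, so the variance of its residual is the error variance plus the variance of the fit.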

Deleted (Studentized) residuals

Deleted residuals are also called externally Studentized residuals. The formula is:

$$t_i = \frac{e_i}{\sqrt{s_{(i)}^2(1 - h_i)}}$$

Another presentation of this formula is:

$$t_i = e_i\sqrt{\frac{n - p - 1}{SSE(1 - h_i) - e_i^2}}$$

The model that estimates the ith observation omits the ith observation from the data set. Therefore, the ith observation cannot influence the estimate. Each deleted residual follows a Student's t distribution with $n - p - 1$ degrees of freedom.

Notation

Term           Description
$e_i$          ith residual
$s_{(i)}^2$    mean square error calculated without the ith observation
$h_i$          ith diagonal element of $X(X'X)^{-1}X'$
$n$            number of observations
$p$            number of terms, including the constant
$SSE$          sum of squares for error
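
The sketch below computes deleted residuals two ways for a model with one predictor and checks that they agree. The first way refits the model without row i to obtain the mean square error without that observation, then uses e_i / sqrt(s_(i)^2 (1 - h_i)). The second uses the closed form e_i * sqrt((n - p - 1) / (SSE (1 - h_i) - e_i^2)). Both forms are assumed here to be the standard externally Studentized residual. The function names and the data are hypothetical.

```python
import math


def fit_line(x, y):
    """Least-squares line; returns (b0, b1, sse, x_bar, sxx)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse, x_bar, sxx


def deleted_residuals(x, y):
    """Return a list of (t_refit, t_closed_form) pairs, one per row."""
    n, p = len(x), 2  # p counts the constant and the slope
    b0, b1, sse, x_bar, sxx = fit_line(x, y)
    pairs = []
    for i in range(n):
        e_i = y[i] - (b0 + b1 * x[i])           # residual from the full model
        h_i = 1 / n + (x[i] - x_bar) ** 2 / sxx  # leverage from the full model

        # Presentation 1: mean square error from a fit that omits row i.
        sse_without_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])[2]
        mse_without_i = sse_without_i / (n - 1 - p)
        t_refit = e_i / math.sqrt(mse_without_i * (1 - h_i))

        # Presentation 2: closed form that needs only the full-model fit.
        t_closed = e_i * math.sqrt((n - p - 1) / (sse * (1 - h_i) - e_i ** 2))
        pairs.append((t_refit, t_closed))
    return pairs


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
pairs = deleted_residuals(x, y)
for t_refit, t_closed in pairs:
    print(t_refit, t_closed)
```

The closed form is usually preferred in practice because it avoids refitting the model n times, while the refit version makes the definition explicit.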