Methods and formulas for the fits and residuals in Analyze Definitive Screening Design

In This Topic

Fit
Standard error of fitted value (SE Fit)
Residuals
Standardized residual (Std Resid)
Deleted (Studentized) residuals
Confidence interval
Prediction interval

Fit

$$\hat{y} = b_0 + b_1 x_1 + \dots + b_k x_k$$

Notation

Term	Description
ŷ	fitted value
x_k	k^th term. Each term can be a single predictor, a polynomial term, or an interaction term.
b_k	estimate of k^th regression coefficient
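As a minimal sketch of how a fitted value is assembled from coefficients and term values, the following example may help. The function name, the coefficients, and the term settings are all made up for illustration and do not come from any fitted design.

```python
def fitted_value(coefficients, terms):
    """Return b_0 + b_1*x_1 + ... + b_k*x_k for one run.

    coefficients: [b_0, b_1, ..., b_k], with b_0 the constant.
    terms: [x_1, ..., x_k], the term values for the run.
    """
    if len(coefficients) != len(terms) + 1:
        raise ValueError("need one coefficient per term plus the constant")
    b0, slopes = coefficients[0], coefficients[1:]
    return b0 + sum(b * x for b, x in zip(slopes, terms))


# Hypothetical model with the terms A, B, and A*A, evaluated at the
# coded levels A = 1 and B = -1. The square term is just another x_k.
a, b = 1.0, -1.0
y_hat = fitted_value([10.0, 2.0, -1.5, 0.5], [a, b, a * a])
print(y_hat)
```

Polynomial and interaction terms enter the sum the same way as single predictors, which is why the sketch passes `a * a` as an ordinary term value.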

Standard error of fitted value (SE Fit)

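The standard error of the fitted value in a regression model with one predictor is:

$$SE(\hat{y}) = \sqrt{s^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right)}$$

The standard error of the fitted value in a regression model with more than one predictor is:

$$SE(\hat{y}) = \sqrt{s^2 \, x_0' (X'X)^{-1} x_0}$$

For weighted regression, include the weight matrix in the equation:

$$SE(\hat{y}) = \sqrt{s^2 \, x_0' (X'WX)^{-1} x_0}$$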

When the analysis uses a test data set or K-fold cross-validation, the formulas are the same. The value of s², the design matrix, and the weight matrix all come from the training data.

Notation

Term	Description
s²	mean square error
n	number of observations
x₀	new value of the predictor
x̄	mean of the predictor
x_i	i^th predictor value
x₀	vector of values that produce the fitted values, one for each column in the design matrix, beginning with a 1 for the constant term
x'₀	transpose of the new vector of predictor values
X	design matrix
W	weight matrix
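The sketch below illustrates that the one-predictor expression and the matrix expression for SE Fit give the same number, and that unit weights reproduce the unweighted result. It assumes NumPy is available; the function names and the five data points are invented for illustration, and the weighted mean square error is taken to be the weighted residual sum of squares divided by n − p, which is an assumption not stated in this topic.

```python
import numpy as np


def se_fit_one_predictor(x, y, x_new):
    """SE Fit from the one-predictor formula."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    x_bar = x.mean()
    sxx = np.sum((x - x_bar) ** 2)
    b1 = np.sum((x - x_bar) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x_bar
    s2 = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)  # mean square error
    return float(np.sqrt(s2 * (1 / n + (x_new - x_bar) ** 2 / sxx)))


def se_fit(X, y, x0, w=None):
    """SE Fit = sqrt(s^2 * x0' (X'WX)^-1 x0); W is the identity if w is None."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    w = np.ones(len(y)) if w is None else np.asarray(w, dtype=float)
    n, p = X.shape
    XtWX = X.T @ (w[:, None] * X)
    b = np.linalg.solve(XtWX, X.T @ (w * y))
    resid = y - X @ b
    s2 = np.sum(w * resid ** 2) / (n - p)  # assumed weighted MSE
    return float(np.sqrt(s2 * (x0 @ np.linalg.solve(XtWX, x0))))


# Made-up data: one predictor, five runs.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
X = np.column_stack([np.ones(5), x])  # design matrix with a constant column

simple = se_fit_one_predictor(x, y, 4.0)
matrix = se_fit(X, y, [1.0, 4.0])  # x0 begins with 1 for the constant
print(simple, matrix)
```

Note that the vector x₀ passed to the matrix form starts with a 1 for the constant term, as described in the notation table.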

Residuals

The residual is the difference between an observed value and the corresponding fitted value. This part of the observation is not explained by the model. The residual of an observation is:

$$e_i = y_i - \hat{y}_i$$

Notation

Term	Description
y_i	i^th observed response value
ŷ_i	i^th fitted value for the response

Standardized residual (Std Resid)

Standardized residuals are also called "internally Studentized residuals."

Formula

$$\text{Std Resid}_i = \frac{e_i}{\sqrt{s^2 (1 - h_i)}}$$

Notation

Term	Description
e_i	i^th residual
h_i	i^th diagonal element of X(X'X)⁻¹X'
s²	mean square error
X	design matrix
X'	transpose of the design matrix
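To make the residual and standardized residual definitions concrete, here is a small sketch that computes e_i, the leverages h_i, and the standardized residuals for an unweighted least-squares fit. It assumes NumPy; the function name and the data are hypothetical.

```python
import numpy as np


def standardized_residuals(X, y):
    """Return (e, h, r): residuals, leverages, and standardized residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b                                # e_i = y_i - fitted value
    # h_i is the i-th diagonal element of X (X'X)^-1 X'.
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    s2 = e @ e / (n - p)                         # mean square error
    r = e / np.sqrt(s2 * (1 - h))
    return e, h, r


# Made-up data: one predictor plus a constant column.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])

e, h, r = standardized_residuals(X, y)
print(e)
print(h)
print(r)
```

The `einsum` call extracts only the diagonal of the hat matrix, which avoids forming the full n × n matrix when only the leverages are needed.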

Deleted (Studentized) residuals

Deleted residuals are also called externally Studentized residuals. The formula is:

$$t_i = \frac{e_i}{\sqrt{s_{(i)}^2 (1 - h_i)}}$$

Another presentation of this formula is:

$$t_i = e_i \sqrt{\frac{n - p - 1}{SSE\,(1 - h_i) - e_i^2}}$$

The model that estimates the i^th observation omits the i^th observation from the data set. Therefore, the i^th observation cannot influence the estimate. Each deleted residual follows a Student's t-distribution with n − p − 1 degrees of freedom.

Notation

Term	Description
e_i	i^th residual
s_(i)²	mean square error calculated without the i^th observation
h_i	i^th diagonal element of X(X'X)⁻¹X'
n	number of observations
p	number of terms, including the constant
SSE	sum of squares for error
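The two presentations of the deleted residual can be checked against each other numerically. The sketch below, which assumes NumPy and uses invented function names and data, computes s_(i)² the slow way by refitting the model without each observation, and then compares the result with the closed form that needs only SSE, e_i, and h_i.

```python
import numpy as np


def deleted_residuals_refit(X, y):
    """t_i = e_i / sqrt(s_(i)^2 (1 - h_i)), refitting without row i for s_(i)^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)
    e = y - H @ y
    t = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        sse_i = np.sum((y[keep] - X[keep] @ b_i) ** 2)
        s2_i = sse_i / (n - 1 - p)   # MSE without the i-th observation
        t[i] = e[i] / np.sqrt(s2_i * (1 - h[i]))
    return t


def deleted_residuals_closed_form(X, y):
    """t_i = e_i * sqrt((n - p - 1) / (SSE (1 - h_i) - e_i^2))."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)
    e = y - H @ y
    sse = e @ e
    return e * np.sqrt((n - p - 1) / (sse * (1 - h) - e ** 2))


# Made-up data: one predictor plus a constant column.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])

t_refit = deleted_residuals_refit(X, y)
t_closed = deleted_residuals_closed_form(X, y)
print(t_refit)
print(t_closed)
```

The closed form is the practical choice because it needs a single fit; the refit version is included only to show where s_(i)² comes from.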

Confidence interval

The confidence interval is the range in which the mean response for a given set of predictor values is expected to fall.

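Formula

$$\hat{y}_0 \pm t_{(1-\alpha/2;\, n-p)} \times s(\hat{y}_0)$$

where

$$s(\hat{y}_0) = \sqrt{s^2 \, X_0' (X'X)^{-1} X_0} = \sqrt{X_0' \, S^2(b) \, X_0}$$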

Notation

Term	Description
ŷ₀	fitted response value for a given set of predictor values
α	type I error rate
n	number of observations
p	number of model parameters
S²(b)	variance-covariance matrix of the coefficients
s²	mean square error
X	design matrix
X₀	vector of given predictor values with 1 column and p rows
X'₀	transpose of the new vector of predictor values with 1 row and p columns
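A minimal sketch of the confidence interval calculation follows, assuming NumPy and an unweighted fit. The function name and data are hypothetical. The t critical value is passed in by the caller rather than computed, so the example depends only on NumPy; one way to obtain it is `scipy.stats.t.ppf(1 - alpha / 2, n - p)`. The value 3.182446 used here is the 0.975 quantile of the t-distribution with 3 degrees of freedom.

```python
import numpy as np


def confidence_interval(X, y, x0, t_crit):
    """Return (lower, upper) for the mean response at x0.

    t_crit is the t quantile t(1 - alpha/2; n - p), supplied by the caller.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (n - p)               # mean square error
    cov_b = s2 * XtX_inv               # S^2(b), covariance of the coefficients
    y0 = float(x0 @ b)                 # fitted response at x0
    se = float(np.sqrt(x0 @ cov_b @ x0))
    return y0 - t_crit * se, y0 + t_crit * se


# Made-up data: one predictor plus a constant column, n = 5 and p = 2.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])

T_975_DF3 = 3.182446                   # t(0.975; 3), for a 95% interval
lower, upper = confidence_interval(X, y, [1.0, 4.0], T_975_DF3)
print(lower, upper)
```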

Prediction interval

The prediction interval is the range in which the response for a single new observation, at a given set of predictor values, is expected to fall.

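Formula

$$\hat{y}_0 \pm t_{(1-\alpha/2;\, n-p)} \times s(\text{Pred})$$

where

$$s(\text{Pred}) = \sqrt{s^2 \left( 1 + X_0' (X'X)^{-1} X_0 \right)}$$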

Notation

Term	Description
s(Pred)	standard error of the prediction
ŷ₀	fitted response value for a given set of predictor values
α	level of significance
n	number of observations
p	number of model parameters
s²	mean square error
X	design matrix
X₀	vector of given predictor values with 1 column and p rows
X'₀	transpose of the new vector of predictor values with 1 row and p columns
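The prediction interval differs from the confidence interval only by the extra 1 inside the square root, which accounts for the variation of a single new observation around the mean response. The sketch below assumes NumPy and an unweighted fit; the function name and data are made up, and the t critical value is again supplied by the caller (for example from `scipy.stats.t.ppf(1 - alpha / 2, n - p)`).

```python
import numpy as np


def prediction_interval(X, y, x0, t_crit):
    """Return (s_pred, lower, upper) for a new observation at x0.

    t_crit is the t quantile t(1 - alpha/2; n - p), supplied by the caller.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (n - p)                           # mean square error
    y0 = float(x0 @ b)                             # fitted response at x0
    # The leading 1 adds the variance of one new observation.
    s_pred = float(np.sqrt(s2 * (1 + x0 @ XtX_inv @ x0)))
    return s_pred, y0 - t_crit * s_pred, y0 + t_crit * s_pred


# Made-up data: one predictor plus a constant column, n = 5 and p = 2.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])

T_975_DF3 = 3.182446                               # t(0.975; 3)
s_pred, pi_lower, pi_upper = prediction_interval(X, y, [1.0, 4.0], T_975_DF3)
print(s_pred, pi_lower, pi_upper)
```

Because s(Pred) is never smaller than the standard error of the fit, a prediction interval at the same settings and the same α is always at least as wide as the confidence interval.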