Methods and formulas for fits and residuals in Partial Least Squares Regression

In This Topic

Fitted values
Cross-validated fitted values
Residuals
Cross-validated residuals
Standardized residual (Std Resid)
Standard error of fitted value (SE Fit)
Confidence interval
Prediction interval

Fitted values

The fitted value, also called the predicted Y or \hat{y}, is the mean response value for the given predictor values, calculated from the estimated regression equation.
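
As a minimal sketch of evaluating an estimated regression equation, the following assumes the coefficients have already been estimated. The function name and the numeric values are hypothetical and serve only as an illustration.

```python
def fitted_value(intercept, coefficients, predictors):
    """Return b0 + sum(b_j * x_j) for one set of predictor values."""
    if len(coefficients) != len(predictors):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * x for b, x in zip(coefficients, predictors))

# Hypothetical coefficients and predictor value, chosen only for illustration.
y_hat = fitted_value(2.2, [0.6], [3.0])
print(round(y_hat, 6))
```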

Cross-validated fitted values

Cross-validated fitted values indicate how well your model predicts new data. They are similar to ordinary fitted values, which indicate how well your model fits the data used to estimate it. To obtain the cross-validated fitted value for an observation, remove that observation from the data used to calculate the model, then calculate the fit with the resulting coefficient vector, which is independent of the observation. The formula for the cross-validated fitted values is as follows:

\hat{y}_{\i} = b_{0\i} + X B_{(\i)(j, k)}

Notation

Term	Description
\i	indicates that the i^th observation was left out of the model calculation
b_{0\i}	the intercept for the model that does not include observation i
X	the predictor values
B_{(\i)(j, k)}	the coefficients for the model that does not include observation i
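
The leave-one-out procedure described above can be sketched as follows. This is only an illustration: a one-predictor least squares line stands in for the PLS model, and the helper names and data are made up. The loop structure (refit without observation i, then predict observation i) is the part that carries over.

```python
def fit_line(x, y):
    """Least squares intercept and slope for one predictor (stand-in for a PLS fit)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def loo_fitted_values(x, y):
    """Cross-validated fit for each observation, refitting the model without it."""
    cv_fits = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        b0, b1 = fit_line(x_train, y_train)  # coefficients do not depend on observation i
        cv_fits.append(b0 + b1 * x[i])
    return cv_fits

# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
cv_fits = loo_fitted_values(x, y)
print([round(f, 4) for f in cv_fits])
```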

Residuals

The residual is the difference between an observed value and the corresponding fitted value. It is the part of the observation that the model does not explain. The residual of an observation is:

e_i = y_i - \hat{y}_i

Notation

Term	Description
y_i	i^th observed response value
\hat{y}_i	i^th fitted value for the response
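
A short sketch of the residual calculation, assuming the fitted values are already available. The observed and fitted values below are made up for illustration.

```python
def residuals(observed, fitted):
    """Return observed minus fitted for each observation."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    return [y - f for y, f in zip(observed, fitted)]

# Made-up responses and fitted values.
observed = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]
e = residuals(observed, fitted)
print([round(r, 4) for r in e])
```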

Cross-validated residuals

Cross-validated residuals measure the model's predictive ability and are used to calculate the PRESS statistic. Cross-validated residuals in PLS and least squares regression are conceptually similar, but their calculations differ.

Formula

In PLS, the cross-validated residuals are the differences between the actual responses and the cross-validated fitted values:

e_{(i)} = y_i - \hat{y}_{(i)}

The cross-validated residual value varies based on how many observations are omitted each time the model is recalculated during cross-validation.

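In least squares regression, the model does not need to be refit. The cross-validated residuals are calculated directly from the ordinary residuals:

e_{(i)} = \frac{e_i}{1 - h_i}

where e_i is the i^th ordinary residual and h_i is the i^th diagonal element of X(X'X)^{-1}X'.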

Notation

Term	Description
(i)	observation omitted from the model calculation
y_i	response value
\hat{y}_{(i)}	cross-validated fitted value
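
The least squares shortcut and the PRESS statistic can be sketched as follows. This example assumes a one-predictor least squares model with an intercept, for which the leverage is h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)². It does not apply to PLS, where the model is refit for each omitted observation or group. The function name and data are made up.

```python
def ordinary_residuals_and_leverages(x, y):
    """Residuals e_i and leverages h_i for a one-predictor least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    e = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    h = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return e, h

# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
e, h = ordinary_residuals_and_leverages(x, y)

# Leave-one-out residuals without refitting, then PRESS as their sum of squares.
cv_resid = [ei / (1.0 - hi) for ei, hi in zip(e, h)]
press = sum(r ** 2 for r in cv_resid)
print([round(r, 4) for r in cv_resid], round(press, 4))
```

Each value in `cv_resid` equals the residual that would be obtained by refitting the line without that observation, which is why no refit is needed in the least squares case.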

Standardized residual (Std Resid)

Standardized residuals are also called "internally Studentized residuals."

Formula

\text{Std Resid}_i = \frac{e_i}{\sqrt{s^2 (1 - h_i)}}

Notation

Term	Description
e_i	i^th residual
h_i	i^th diagonal element of X(X'X)^{-1}X'
s²	mean square error
X	design matrix
X'	transpose of the design matrix
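
A minimal sketch of the standardized residual calculation, again using a one-predictor least squares line as a stand-in so that h_i has a simple closed form. The assumption that s² is the sum of squared residuals divided by n - 2 (two estimated parameters) belongs to this sketch and is not stated in the text above.

```python
import math

def standardized_residuals(x, y):
    """e_i / sqrt(s^2 * (1 - h_i)) for a one-predictor least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    e = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    h = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    s2 = sum(ei ** 2 for ei in e) / (n - 2)  # assumed: intercept and slope estimated
    return [ei / math.sqrt(s2 * (1.0 - hi)) for ei, hi in zip(e, h)]

# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
std_resid = standardized_residuals(x, y)
print([round(r, 4) for r in std_resid])
```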

Standard error of fitted value (SE Fit)

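The standard error of the fitted value in a regression model with one predictor is:

\text{SE Fit} = \sqrt{s^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right)}

The standard error of the fitted value in a regression model with more than one predictor is:

\text{SE Fit} = \sqrt{s^2 \, x'_0 (X'X)^{-1} x_0}

For weighted regression, include the weight matrix in the equation:

\text{SE Fit} = \sqrt{s^2 \, x'_0 (X'WX)^{-1} x_0}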

When the analysis uses a test data set or K-fold cross-validation, the formulas are the same, but the value of s², the design matrix, and the weight matrix all come from the training data.

Notation

Term	Description
s²	mean square error
n	number of observations
x₀	new value of the predictor (one-predictor formula)
\bar{x}	mean of the predictor
x_i	i^th predictor value
x₀	vector of values that produce the fitted values, one for each column in the design matrix, beginning with a 1 for the constant term (formulas with more than one predictor)
x'₀	transpose of the new vector of predictor values
X	design matrix
W	weight matrix
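
The one-predictor form of the standard error of the fitted value can be sketched as follows. The data and the function name are made up, and s² is assumed to be the sum of squared residuals divided by n - 2.

```python
import math

def se_fit_one_predictor(x, y, x0):
    """sqrt(s^2 * (1/n + (x0 - x_bar)^2 / Sxx)) for a one-predictor least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)  # assumed: intercept and slope estimated
    return math.sqrt(s2 * (1.0 / n + (x0 - x_bar) ** 2 / sxx))

# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
se_at_mean = se_fit_one_predictor(x, y, 3.0)  # x0 at the predictor mean
se_at_edge = se_fit_one_predictor(x, y, 5.0)  # x0 at the edge of the data
print(round(se_at_mean, 4), round(se_at_edge, 4))
```

The standard error is smallest when x₀ equals the predictor mean and grows as x₀ moves away from it.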

Confidence interval

The confidence interval is the range in which the estimated mean response for a given set of predictor values is expected to fall. The interval is defined by lower and upper limits, which Minitab calculates from the confidence level and the standard error of the fits.

Formula

Notation

Term	Description
α	level of significance
n	number of observations
p	number of predictors

s²	mean square error
S²(b)	variance-covariance matrix of the coefficients
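
As a hedged sketch of how the limits follow from the confidence level and the standard error of the fit, the example below uses the common textbook form fit ± t × SE Fit. The t critical value is passed in by the caller because the Python standard library has no t-distribution quantile function. The value 3.182 is the two-sided 95% critical value for 3 degrees of freedom from a standard t table. How the degrees of freedom are determined for a PLS model is not stated above, so that choice is an assumption of this sketch.

```python
def confidence_interval(fit, se_fit, t_crit):
    """Return (lower, upper) limits as fit -/+ t_crit * se_fit."""
    if se_fit < 0 or t_crit < 0:
        raise ValueError("se_fit and t_crit must be non-negative")
    margin = t_crit * se_fit
    return fit - margin, fit + margin

# Made-up fit and standard error; t_crit taken from a t table (95%, 3 df).
ci_lower, ci_upper = confidence_interval(4.0, 0.4, 3.182)
print(round(ci_lower, 4), round(ci_upper, 4))
```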

Prediction interval

The prediction interval is the range in which the fitted response for a new observation is expected to fall.

Formula

\hat{y}_0 \pm t_{(1 - \alpha/2,\; n - p)} \cdot s(\text{Pred})

Notation

Term	Description
s(Pred)	standard error of the prediction, \sqrt{s^2 (1 + X'_0 (X'X)^{-1} X_0)}
\hat{y}_0	fitted response value for a given set of predictor values
α	level of significance
n	number of observations
p	number of model parameters
s²	mean square error
X	predictor matrix
X₀	vector of given predictor values with 1 column and p rows
X'₀	transpose of the new vector of predictor values with 1 row and p columns
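
A minimal sketch of a prediction interval, assuming the unweighted least squares form in which s²(1 + X'₀(X'X)^{-1}X₀) equals s² plus the squared standard error of the fit. The function name and inputs are hypothetical, and the t critical value (3.182 for 95% and 3 degrees of freedom, from a standard t table) is supplied by the caller.

```python
import math

def prediction_interval(fit, s2, se_fit, t_crit):
    """Return (lower, upper, s_pred) with s_pred = sqrt(s2 + se_fit^2)."""
    if s2 < 0 or se_fit < 0 or t_crit < 0:
        raise ValueError("s2, se_fit, and t_crit must be non-negative")
    s_pred = math.sqrt(s2 + se_fit ** 2)  # assumed identity for unweighted least squares
    margin = t_crit * s_pred
    return fit - margin, fit + margin, s_pred

# Made-up inputs: fit 4.0, mean square error 0.8, SE Fit 0.4.
pi_lower, pi_upper, s_pred = prediction_interval(4.0, 0.8, 0.4, 3.182)
print(round(pi_lower, 4), round(pi_upper, 4), round(s_pred, 4))
```

Because s(Pred) includes the mean square error in addition to the uncertainty of the fit, the prediction interval is always wider than the confidence interval at the same predictor values.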