Methods and formulas for model selection in Partial Least Squares Regression

R-sq

R² is also known as the coefficient of determination.

Formula

$$R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}$$

Notation

Term	Description
y_i	i-th observed response value
\bar{y}	mean response
\hat{y}_i	i-th fitted response
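As a rough illustration of how the three quantities in the notation table combine, the sketch below computes R² as one minus the ratio of the error sum of squares to the total sum of squares, which is the usual definition of the coefficient of determination. The function name `r_squared` and the sample values are hypothetical and are not taken from Minitab.

```python
def r_squared(observed, fitted):
    """Return 1 - SS Error / SS Total for paired observed and fitted responses."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    mean_response = sum(observed) / len(observed)
    # Unexplained variation: squared distances between observed and fitted values.
    ss_error = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    # Total variation: squared distances between observed values and their mean.
    ss_total = sum((y - mean_response) ** 2 for y in observed)
    return 1.0 - ss_error / ss_total


# Hypothetical illustration data, not taken from any Minitab output.
observed = [2.0, 4.0, 6.0, 8.0]
fitted = [2.5, 3.5, 6.5, 7.5]
r_sq = r_squared(observed, fitted)
```

With these values the error sum of squares is 1 and the total sum of squares is 20, so `r_sq` is 0.95.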

SS

SS is a sum of squared distances. SS Regression is the portion of the variation explained by the model. SS Error is the portion not explained by the model and is attributed to error. SS Total is the total variation in the data.

Formula

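SS Regression:

$$\sum_{i} (\hat{y}_i - \bar{y})^2$$

SS Error:

$$\sum_{i} (y_i - \hat{y}_i)^2$$

SS Total:

$$\sum_{i} (y_i - \bar{y})^2$$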

Notation

Term	Description
y_i	i-th observed response value
\hat{y}_i	i-th fitted response
\bar{y}	mean response
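The three sums of squares can be sketched in a few lines, assuming the standard definitions: fitted values against the mean for SS Regression, observed against fitted for SS Error, and observed against the mean for SS Total. The function name `sums_of_squares` and the data are hypothetical. The fitted values here come from a simple least-squares line through the points, used only as a stand-in for a PLS fit.

```python
def sums_of_squares(observed, fitted):
    """Return SS Regression, SS Error, and SS Total as a dictionary."""
    mean_response = sum(observed) / len(observed)
    return {
        # Variation explained by the model.
        "regression": sum((f - mean_response) ** 2 for f in fitted),
        # Variation left unexplained by the model.
        "error": sum((y - f) ** 2 for y, f in zip(observed, fitted)),
        # Total variation in the observed responses.
        "total": sum((y - mean_response) ** 2 for y in observed),
    }


# Hypothetical data: responses at x = 1, 2, 3, 4 and the least-squares
# fitted values 0.5 + 0.8 * x (an assumption standing in for a PLS fit).
observed = [1.0, 3.0, 2.0, 4.0]
fitted = [1.3, 2.1, 2.9, 3.7]
ss = sums_of_squares(observed, fitted)
```

For these least-squares fitted values, SS Regression (3.2) and SS Error (1.8) add up to SS Total (5.0).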

PRESS

The prediction sum of squares (PRESS) statistic assesses your model's predictive ability. Like the residual sum of squares, PRESS is a sum of squared errors, but the errors are prediction errors rather than residuals from the fit. In PLS, Minitab calculates PRESS only if you cross-validate the model.

Minitab calculates PRESS in the following steps:

1. Minitab refits the model as many times as there are observations, omitting a different observation each time. For each omitted observation, Minitab calculates the predicted response from the model that was fit without it.
2. Minitab subtracts the predicted value from the observed response value. This difference is a true prediction error because the omitted observation played no part in fitting the model.
3. After Minitab repeats this routine for all observations, it calculates PRESS using the formula:

$$\text{PRESS} = \sum_{i=1}^{n} (y_i - \hat{y}_{(i)})^2$$

In general, the smaller the PRESS value, the better the model's predictive ability. PRESS is used to calculate the predicted R².

Notation

Term	Description
y_i	the observed response
\hat{y}_{(i)}	the fitted response for the omitted observation
n	the number of observations
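The leave-one-out routine described in the steps above can be sketched as follows. This is not Minitab's implementation: to keep the example free of external libraries, a one-predictor least-squares line (the hypothetical helper `fit_line`) stands in for the PLS model, and the data are made up. Only the refit, predict, and accumulate loop is the point of the sketch.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor (stand-in for a PLS fit)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def press_leave_one_out(xs, ys):
    """Sum of squared prediction errors, each from a model fit without that observation."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model with observation i omitted.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        # Predict the omitted observation and accumulate its squared error.
        predicted = intercept + slope * xs[i]
        press += (ys[i] - predicted) ** 2
    return press


# Hypothetical illustration data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 4.0]
press = press_leave_one_out(xs, ys)
```

For this data, `press` is 260/49, about 5.31. That is larger than the ordinary error sum of squares from a fit to all four points (1.8), which shows how PRESS penalizes a model that predicts held-out observations poorly.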

R-sq (pred)

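Formula

$$R^2(\text{pred}) = 1 - \frac{\sum_{i=1}^{n} \left( \frac{e_i}{1 - h_i} \right)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

Although the calculation for R²(pred) can produce negative values, Minitab displays zero in these cases.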

Notation

Term	Description
y_i	i-th observed response value
\bar{y}	mean response
n	number of observations
e_i	i-th residual
h_i	i-th diagonal element of X(X'X)^{-1}X'
X	design matrix
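A minimal sketch of the predicted R² calculation follows, under two assumptions suggested by the notation table rather than confirmed by the source: that predicted R² is one minus PRESS divided by SS Total, and that PRESS is obtained from the residuals e_i and leverages h_i through the least-squares shortcut e_i / (1 - h_i). For a cross-validated PLS model, PRESS would instead come from the refitted models. The function name and data are hypothetical. The final step mirrors the rule that negative values are displayed as zero.

```python
def predicted_r_squared(observed, residuals, leverages):
    """Return (raw, displayed) predicted R-squared; displayed is floored at zero."""
    mean_response = sum(observed) / len(observed)
    # Assumed shortcut: e_i / (1 - h_i) is the leave-one-out prediction error
    # for an ordinary least-squares fit.
    press = sum((e / (1.0 - h)) ** 2 for e, h in zip(residuals, leverages))
    ss_total = sum((y - mean_response) ** 2 for y in observed)
    raw = 1.0 - press / ss_total
    # Negative values are reported as zero.
    return raw, max(0.0, raw)


# Hypothetical data: responses at x = 1, 2, 3, 4 with residuals and leverages
# from the least-squares line 0.5 + 0.8 * x.
observed = [1.0, 3.0, 2.0, 4.0]
residuals = [-0.3, 0.9, -0.9, 0.3]
leverages = [0.7, 0.3, 0.3, 0.7]
raw, displayed = predicted_r_squared(observed, residuals, leverages)
```

Here PRESS (about 5.31) exceeds SS Total (5.0), so the raw value is slightly negative (-3/49) and the displayed value is zero.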

Test R-sq

Test R² indicates how well the PLS model predicts your test data. It represents the proportion of variation in the responses that is explained by the predictors in your test data set. Test data is generally used to validate the fitted model and must include the same number of predictors as the original data set. Test R² can be calculated only if the test data includes a response value for each observation. Test R² is calculated in the same way as R², with this formula: