Methods and formulas for model information in Partial Least Squares Regression

Coefficients and standardized coefficients

Coefficients are the parameters in a regression equation. The estimated coefficients are used with the predictors to calculate the fitted value of the response variable and the predicted response of new observations. In contrast to least squares, the PLS coefficients are nonlinear estimators. Standardized coefficients indicate the importance of each predictor in the model and correspond to the standardized x- and y-variables. In PLS, the coefficient matrix (dimension p × r) is calculated from the weights and loadings.

The formula for standardized coefficients is:
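
The displayed equation is missing here. From the notation listed below and the standard PLS result, it can be reconstructed as follows. Treat this as an assumption about the original display rather than a quotation. B denotes the p × r matrix of standardized coefficients, a symbol that the notation table does not define.

```latex
\[
B = W \left( P^{\mathsf{T}} W \right)^{-1} C^{\mathsf{T}}
\]
```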

To calculate the nonstandardized coefficients and intercept, use these formulas:
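
The displayed equations are missing here. A reconstruction under stated assumptions: B_jk is the standardized coefficient of predictor j for response k, s_{x_j} and s_{y_k} are sample standard deviations, and \bar{x}_j and \bar{y}_k are sample means. None of these symbols appear in the notation table; they are labels chosen for illustration.

```latex
\[
b_{jk} = B_{jk} \, \frac{s_{y_k}}{s_{x_j}},
\qquad
b_{0k} = \bar{y}_k - \sum_{j=1}^{p} b_{jk} \, \bar{x}_j
\]
```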

Notation

Term    Description
W       the x-weight matrix
P       the x-loading matrix
C       the y-loading matrix
j       the predictors (1, ..., p)
k       the responses (1, ..., r)
p       the number of predictors
r       the number of responses
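
The step from standardized to nonstandardized coefficients can be sketched in plain Python. This is a minimal illustration, not the software's implementation. It assumes that standardizing means centering each variable and dividing by its sample standard deviation, and that the standardized coefficients are already available as a p × r nested list. The function name `unstandardize` and its arguments are hypothetical.

```python
from statistics import mean, stdev


def unstandardize(B, X, Y):
    """Rescale standardized coefficients to the original units.

    B: p x r nested list of standardized coefficients (assumed given).
    X: n x p predictor data, Y: n x r response data, as lists of rows.
    Returns (b, b0): p x r coefficients and a list of r intercepts.
    """
    p, r = len(B), len(B[0])
    x_cols = [[row[j] for row in X] for j in range(p)]
    y_cols = [[row[k] for row in Y] for k in range(r)]
    x_mean = [mean(c) for c in x_cols]
    y_mean = [mean(c) for c in y_cols]
    x_sd = [stdev(c) for c in x_cols]  # sample standard deviation (n - 1)
    y_sd = [stdev(c) for c in y_cols]

    # b_jk = B_jk * s_yk / s_xj
    b = [[B[j][k] * y_sd[k] / x_sd[j] for k in range(r)] for j in range(p)]
    # b_0k = mean(y_k) - sum_j b_jk * mean(x_j)
    b0 = [y_mean[k] - sum(b[j][k] * x_mean[j] for j in range(p))
          for k in range(r)]
    return b, b0
```

For example, with one predictor and one response that satisfy y = 2x + 1 exactly, the standardized coefficient is 1, and `unstandardize([[1.0]], [[1.0], [2.0], [3.0]], [[3.0], [5.0], [7.0]])` recovers a slope of 2 and an intercept of 1.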

Leverages

In least squares regression, leverages are values that indicate how far the corresponding observations are from the center of the x-space, which is described by the x-values. In PLS, the predictors are replaced by x-scores. Observations with high leverage have x-scores far from zero and have a significant influence on the regression coefficients. Points with high leverage are outliers in the x-space, but are not necessarily outliers in the y-space.

The leverage values in PLS are calculated from the x-score matrix T, which is used to calculate the hat matrix (H) as follows:
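
The displayed equation is missing here. Assuming H is the usual projection ("hat") matrix built from the n × m x-score matrix T, it can be reconstructed as:

```latex
\[
H = T \left( T^{\mathsf{T}} T \right)^{-1} T^{\mathsf{T}}
\]
```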

The leverage (h_ii) of the ith observation is the ith diagonal element of the H matrix.

An observation whose leverage value is greater than 2m / n is considered to have high leverage and should be examined.

Notation

Term    Description
n       the number of observations
m       the number of components
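
The leverage calculation and the 2m / n rule can be sketched in plain Python. This is a minimal illustration, assuming that the x-score matrix T is available as a list of n rows with m scores each and that H is the usual projection T (T'T)^-1 T'. The function names `leverages` and `high_leverage` are hypothetical.

```python
def leverages(T):
    """Return the diagonal of H = T (T'T)^-1 T' for an n x m score matrix T."""
    n, m = len(T), len(T[0])
    # Build T'T (m x m) and augment it with the identity for Gauss-Jordan.
    A = [[sum(T[i][a] * T[i][b] for i in range(n)) for b in range(m)]
         + [1.0 if a == c else 0.0 for c in range(m)]
         for a in range(m)]
    for col in range(m):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        piv = max(range(col, m), key=lambda rr: abs(A[rr][col]))
        if abs(A[piv][col]) < 1e-12:
            raise ValueError("T'T is singular; leverages are undefined")
        A[col], A[piv] = A[piv], A[col]
        d = A[col][col]
        A[col] = [v / d for v in A[col]]
        for rr in range(m):
            if rr != col:
                f = A[rr][col]
                A[rr] = [v - f * w for v, w in zip(A[rr], A[col])]
    G_inv = [row[m:] for row in A]  # (T'T)^-1
    # h_ii = t_i' (T'T)^-1 t_i, where t_i is the ith row of T.
    return [sum(T[i][a] * G_inv[a][b] * T[i][b]
                for a in range(m) for b in range(m))
            for i in range(n)]


def high_leverage(T):
    """Return the indices of observations whose leverage exceeds 2m / n."""
    n, m = len(T), len(T[0])
    cutoff = 2 * m / n
    return [i for i, h in enumerate(leverages(T)) if h > cutoff]
```

Because H is a projection onto an m-dimensional space, the leverages sum to m, which is why the cutoff is expressed as a multiple of the average leverage m / n.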

Distances from the x-model

The distance from the x-model measures how well an observation is fitted in the x-space, that is, how well the x-scores describe the observation. An observation with a large distance may also be a leverage point.

Formula

The formula for calculating the distance from the x-model for the ith observation is:

Notation

Term    Description
M       number of components
t       x-score
p       number of predictors
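
The exact formula is not reproduced on this page, so the following is only a sketch of one common way to measure an observation's distance from a component model, not the software's formula. It assumes the distance is the Euclidean norm of the observation's x-residual, where the residual is the (standardized) x-row minus its reconstruction from the M x-scores and the x-loadings. Any scaling by degrees of freedom that the original formula may apply is omitted. The function name and the `loadings` argument are hypothetical.

```python
from math import sqrt


def x_model_distances(X, T, loadings):
    """Row-wise norm of the residual X - T * loadings'.

    X: n x p (assumed already standardized), T: n x M x-scores,
    loadings: p x M x-loadings. All are lists of rows.
    """
    n, p, M = len(X), len(X[0]), len(T[0])
    distances = []
    for i in range(n):
        # Residual of observation i after reconstruction from M components.
        resid = [X[i][j] - sum(T[i][m] * loadings[j][m] for m in range(M))
                 for j in range(p)]
        distances.append(sqrt(sum(e * e for e in resid)))
    return distances
```

An observation that the components reproduce exactly gets a distance of zero, and larger values indicate rows that the x-scores describe poorly.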

Distances from the y-model

The distance from the y-model measures how well an observation is fitted in the y-space, that is, how well the y-scores describe the observation. An observation with a large distance may also be an outlier.

Formula

The formula for calculating the distance from the y-model for the ith observation follows:

Notation

Term    Description
M       the number of components
u       the y-score
r       the number of responses