What is partial least squares regression?

Partial least squares (PLS) regression is a technique that reduces the predictors to a smaller set of uncorrelated components and performs least squares regression on these components, instead of on the original data.

PLS regression is especially useful when your predictors are highly collinear, or when you have more predictors than observations and ordinary least-squares regression either produces coefficients with high standard errors or fails completely. PLS does not assume that the predictors are fixed, unlike multiple regression. This means that the predictors can be measured with error, making PLS more robust to measurement uncertainty.

PLS regression is primarily used in the chemical, drug, food, and plastic industries. A common application is to model the relationship between spectral measurements (NIR, IR, UV), which include many variables that are often correlated with each other, and chemical composition or other physio-chemical properties. In PLS regression, the emphasis is on developing predictive models. Therefore, it is not usually used to screen out variables that are not useful in explaining the response.

To perform PLS, Minitab uses the nonlinear iterative partial least squares (NIPALS) algorithm developed by Herman Wold. The algorithm reduces the number of predictors using a technique similar to principal components analysis to extract a set of components that describes maximum correlation between the predictors and response variables. PLS can calculate as many components as there are predictors; often, cross-validation is used to identify the smaller set of components that provide the greatest predictive ability. If you calculate all possible components, the resulting model is equivalent to the model you would obtain using least squares regression. In PLS, components are selected based on how much variance they explain in the predictors and between the predictors and the response(s). If the predictors are highly correlated, or if a smaller number of components perfectly model the response, then the number of components in the PLS model may be much less than the number of predictors. Minitab then performs least-squares regression on these uncorrelated components.

Unlike least squares regression, PLS can fit multiple response variables in a single model. PLS regression fits multiple response variables in a single model. Because PLS regression models the response variables in a multivariate way, the results can differ significantly from those calculated for the response variables individually. You should model multiple responses separately only if the responses are uncorrelated.