
# What is partial least squares regression?

Partial least squares (PLS) regression is a technique that reduces the predictors to a smaller set of uncorrelated components and performs least squares regression on these components, instead of on the original data. PLS regression is especially useful when your predictors are highly collinear or when you have more predictors than observations. In these cases, ordinary least squares regression either produces coefficients with high standard errors or fails completely.
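To make the two failure cases concrete, here is a minimal NumPy sketch. The data are simulated and every variable name is hypothetical. It checks the matrix X'X that ordinary least squares has to invert: with more predictors than observations the matrix is rank deficient, so no unique solution exists, and with nearly collinear predictors it is so ill-conditioned that the coefficient standard errors, which scale with the inverse of X'X, become very large.

```python
import numpy as np

rng = np.random.default_rng(1)

# Case 1: more predictors than observations (simulated, hypothetical data)
n_obs, n_pred = 5, 8
X = rng.normal(size=(n_obs, n_pred))
Xc = X - X.mean(axis=0)                  # center, as a model with an intercept would

gram = Xc.T @ Xc                         # the matrix ordinary least squares must invert
rank = np.linalg.matrix_rank(gram)
print(f"X'X is {n_pred}x{n_pred} but has rank {rank}, so it cannot be inverted")

# Case 2: enough observations, but two nearly identical predictors
x1 = rng.normal(size=50)
x2 = x1 + 1e-4 * rng.normal(size=50)     # almost a copy of x1
X2 = np.column_stack([x1, x2])
cond = np.linalg.cond(X2.T @ X2)         # a large value means an unstable inverse
print(f"Condition number of X'X with collinear predictors: {cond:.2e}")
```

PLS regression sidesteps both problems because it regresses on a few uncorrelated components rather than on the original predictors.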

PLS regression is primarily used in the chemical, drug, food, and plastic industries. A common application is to model the relationship between spectral measurements (NIR, IR, UV), which include many variables that are often correlated with each other, and chemical composition or other physicochemical properties. In PLS regression, the emphasis is on developing predictive models. Therefore, it is not usually used to screen out variables that are not useful in explaining the response.

Minitab uses the nonlinear iterative partial least squares (NIPALS) algorithm developed by Herman Wold. Using a technique similar to principal components analysis, the algorithm extracts a set of components that captures as much of the covariance between the predictors and the response variables as possible. If the predictors are highly correlated, or if a small number of components models the response perfectly, then the number of components in the model might be much smaller than the number of predictors. Minitab then performs least squares regression on these uncorrelated components. Cross-validation is often used to select the number of components that maximizes the model's predictive ability.
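The steps described above can be sketched in a few lines of NumPy. This is an illustrative single-response version of a NIPALS-style fit followed by leave-one-out cross-validation. It is not Minitab's implementation, and the function names `pls1_nipals` and `loo_press`, the simulated data, and the choice of leave-one-out as the cross-validation scheme are all assumptions made for the example.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Single-response PLS fit with a NIPALS-style deflation loop (illustrative).

    Returns (x_mean, y_mean, beta, T); predictions are y_mean + (X - x_mean) @ beta,
    and the columns of T are the component scores for the training data.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk = X - x_mean                      # centered predictors, deflated each pass
    yk = y - y_mean                      # centered response, deflated each pass
    weights, loadings, y_coefs, scores = [], [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                    # direction with maximum covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                 # no covariance left to extract
            break
        w = w / norm
        t = Xk @ w                       # component scores
        tt = t @ t
        p = Xk.T @ t / tt                # predictor loadings
        c = (yk @ t) / tt                # least squares regression of y on the scores
        Xk = Xk - np.outer(t, p)         # deflation keeps later scores uncorrelated
        yk = yk - c * t
        weights.append(w); loadings.append(p); y_coefs.append(c); scores.append(t)
    if not weights:                      # degenerate case: no component extracted
        return x_mean, y_mean, np.zeros(X.shape[1]), np.zeros((X.shape[0], 0))
    W, P, q = np.array(weights).T, np.array(loadings).T, np.array(y_coefs)
    beta = W @ np.linalg.solve(P.T @ W, q)   # coefficients on the original predictors
    return x_mean, y_mean, beta, np.array(scores).T

def loo_press(X, y, max_components):
    """Leave-one-out prediction sum of squares for 1..max_components components."""
    n = X.shape[0]
    press = np.zeros(max_components)
    for i in range(n):
        keep = np.arange(n) != i         # hold out observation i
        for a in range(1, max_components + 1):
            x_mean, y_mean, beta, _ = pls1_nipals(X[keep], y[keep], a)
            pred = y_mean + (X[i] - x_mean) @ beta
            press[a - 1] += (y[i] - pred) ** 2
    return press

# Simulated data: 10 highly correlated predictors driven by 2 latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(25, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(25, 10))
y = latent @ np.array([1.5, -2.0]) + 0.1 * rng.normal(size=25)

press = loo_press(X, y, max_components=5)
best = int(np.argmin(press)) + 1         # number of components with the lowest PRESS
print("PRESS by number of components:", np.round(press, 3))
print("Selected number of components:", best)
```

Choosing the number of components by cross-validated prediction error, rather than by fit to the training data, guards against adding components that only model noise.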

PLS regression can fit multiple response variables in a single model. Because PLS regression models the response variables in a multivariate way, the results can differ substantially from those calculated for each response variable individually. You should model multiple responses in a single PLS regression model only when they are correlated.
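A small simulation can illustrate why joint and individual fits differ. In the sketch below, the hypothetical function `pls_fitted` extracts components that are shared by all responses. As a stated simplification, each weight vector is taken from a singular value decomposition of X'Y, which is the direction the NIPALS inner iteration converges to, rather than from the iteration itself. The data are simulated and nothing here reproduces Minitab's code.

```python
import numpy as np

def pls_fitted(X, Y, n_components):
    """Training-set fitted values from a PLS fit with one or more responses.

    Assumption: each weight vector is the leading left singular vector of X'Y,
    used here in place of the NIPALS inner iteration that converges to it.
    """
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)   # accept a 1-D response
    Xk = X - X.mean(axis=0)              # centered predictors, deflated each pass
    y_mean = Y.mean(axis=0)
    Yk = Y - y_mean                      # centered responses, deflated each pass
    fitted = np.tile(y_mean, (len(Y), 1))
    for _ in range(n_components):
        w = np.linalg.svd(Xk.T @ Yk, full_matrices=False)[0][:, 0]
        t = Xk @ w                       # scores shared by all responses
        tt = t @ t
        p = Xk.T @ t / tt                # predictor loadings
        c = Yk.T @ t / tt                # regression of every response on the scores
        Xk = Xk - np.outer(t, p)
        Yk = Yk - np.outer(t, c)
        fitted += np.outer(t, c)
    return fitted

# Two simulated responses that share a common signal, so they are correlated
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 6))
signal = X @ rng.normal(size=6)
Y = np.column_stack([signal + rng.normal(size=40),
                     0.5 * signal + X[:, 0] + rng.normal(size=40)])

joint = pls_fitted(X, Y, n_components=1)                 # one model, both responses
separate = np.column_stack([pls_fitted(X, Y[:, j], 1)[:, 0] for j in range(2)])
max_gap = np.abs(joint - separate).max()
print("Largest difference between joint and separate fits:", max_gap)
```

In the joint model, each component is a compromise across all responses, whereas a separate model tunes its components to a single response, so the fitted values generally do not agree.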