Interpret the key results for Partial Least Squares Regression

Step 1. Determine the number of components in the model

The objective with PLS is to select a model with the appropriate number of components that has good predictive ability. When you fit a PLS model, you can perform cross-validation to help you determine the optimal number of components in the model. With cross-validation, Minitab selects the model with the highest predicted R2 value. If you do not use cross-validation, you can specify the number of components to include in the model or use the default number of components. The default number of components is 10 or the number of predictors in your data, whichever is less. Examine the Method table to determine how many components Minitab included in the model. You can also examine the Model selection plot.

When using PLS, select a model with the smallest number of components that explain a sufficient amount of variability in the predictors and the responses. To determine the number of components that is best for your data, examine the Model selection table, including the X-variance, R2, and predicted R2 values. Predicted R2 indicates the predictive ability of the model and is only displayed if you perform cross-validation.

In some cases, you may decide to use a different model than the one initially selected by Minitab. If you used cross-validation, compare the R2 and predicted R2. Consider an example where removing two components from the model that Minitab selected only slightly decreases the predicted R2. Because the predicted R2 decreases only slightly, the smaller model is not over-fit and you may decide that it better suits your data.

A predicted R2 that is substantially less than R2 may indicate that the model is over-fit. An over-fit model occurs when you add terms or components for effects that are not important in the population, although they may appear important in the sample data. The model becomes tailored to the sample data and, therefore, may not be useful for making predictions about the population.
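The comparison between R2 and predicted R2 can be made concrete with a small sketch. The Python code below computes both values with leave-one-out cross-validation, using predicted R2 = 1 - PRESS / SS Total, which is consistent with the PRESS and R-Sq (pred) columns in the Model Selection table. To stay short, it swaps in a one-predictor least-squares line for the PLS model, so it illustrates the cross-validation arithmetic only, not Minitab's PLS algorithm. The function names and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor (stand-in model, not PLS)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r2_and_predicted_r2(xs, ys):
    """Return (R2, predicted R2) using leave-one-out cross-validation."""
    n = len(ys)
    my = sum(ys) / n
    sst = sum((y - my) ** 2 for y in ys)

    # Ordinary R2: residuals come from the model fitted to all observations.
    slope, intercept = fit_line(xs, ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

    # PRESS: each observation is predicted by a model fitted without it.
    press = 0.0
    for i in range(n):
        xs_train = xs[:i] + xs[i + 1:]
        ys_train = ys[:i] + ys[i + 1:]
        s, b = fit_line(xs_train, ys_train)
        press += (ys[i] - (b + s * xs[i])) ** 2

    return 1 - sse / sst, 1 - press / sst


# Illustrative data only (hypothetical, not from the Minitab example).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]
r2, r2_pred = r2_and_predicted_r2(x, y)
print(f"R2 = {r2:.3f}, predicted R2 = {r2_pred:.3f}")
```

Because each left-out observation is predicted by a model that never saw it, the predicted R2 from this procedure cannot exceed the ordinary R2. A large gap between the two is the over-fitting signal described above.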

If you do not use cross-validation, you can examine the x-variance values in the Model selection table to determine how much variance in the predictors is explained by each model.

Method

Cross-validation                 Leave-one-out
Components to evaluate           Set
Number of components evaluated   10
Number of components selected    4

Method

Cross-validation                  None
Components to calculate           Set
Number of components calculated   10

Key Result: Number of components

In these results, the first Method table shows that cross-validation was used and that Minitab selected the model with 4 components. The second Method table shows that cross-validation was not used, so Minitab uses the model with 10 components, which is the default.

Model Selection and Validation for Aroma

Components  X Variance    Error      R-Sq    PRESS  R-Sq (pred)
         1    0.158849  14.9389  0.637435  23.3439     0.433444
         2    0.442267  12.2966  0.701564  21.0936     0.488060
         3    0.522977   7.9761  0.806420  19.6136     0.523978
         4    0.594546   6.6519  0.838559  18.1683     0.559056
         5               5.8530  0.857948  19.2675     0.532379
         6               5.0123  0.878352  22.3739     0.456988
         7               4.3109  0.895374  24.0041     0.417421
         8               4.0866  0.900818  24.7736     0.398747
         9               3.5886  0.912904  24.9090     0.395460
        10               3.2750  0.920516  24.8293     0.397395

Key Result: X Variance, R-sq, R-sq (pred)

In these results, Minitab selected the 4-component model, which has a predicted R2 value of approximately 56%. Based on the x-variance, the 4-component model explains almost 60% of the variance in the predictors. Beyond 4 components, the R2 value continues to increase but the predicted R2 decreases, which indicates that models with more components are likely to be over-fit.
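The selection logic can be checked directly against the Model Selection table for Aroma. The Python sketch below copies the table values, picks the model with the highest predicted R2, and confirms that both R-Sq and R-Sq (pred) are consistent with a single total sum of squares (R-Sq = 1 - Error / SS Total and R-Sq (pred) = 1 - PRESS / SS Total). The helper `smallest_within` and its tolerance are hypothetical, added only to illustrate choosing a smaller model when the predicted R2 drops slightly. They are not a Minitab feature.

```python
# (components, Error, R-Sq, PRESS, R-Sq (pred)) from the Model Selection table for Aroma.
rows = [
    (1, 14.9389, 0.637435, 23.3439, 0.433444),
    (2, 12.2966, 0.701564, 21.0936, 0.488060),
    (3, 7.9761, 0.806420, 19.6136, 0.523978),
    (4, 6.6519, 0.838559, 18.1683, 0.559056),
    (5, 5.8530, 0.857948, 19.2675, 0.532379),
    (6, 5.0123, 0.878352, 22.3739, 0.456988),
    (7, 4.3109, 0.895374, 24.0041, 0.417421),
    (8, 4.0866, 0.900818, 24.7736, 0.398747),
    (9, 3.5886, 0.912904, 24.9090, 0.395460),
    (10, 3.2750, 0.920516, 24.8293, 0.397395),
]

# Cross-validation rule described in the text: highest predicted R2 wins.
best = max(rows, key=lambda r: r[4])
print("Selected components:", best[0])

# Total sum of squares implied by each column pair; the values should agree.
sst_from_error = [err / (1 - r2) for _, err, r2, _, _ in rows]
sst_from_press = [press / (1 - r2p) for _, _, _, press, r2p in rows]

# Gap between R2 and predicted R2; a widening gap suggests over-fitting.
gaps = {m: round(r2 - r2p, 3) for m, _, r2, _, r2p in rows}
print("R2 minus predicted R2:", gaps)


def smallest_within(table, tolerance):
    """Hypothetical helper: fewest components whose predicted R2 is
    within `tolerance` of the best predicted R2."""
    top = max(r[4] for r in table)
    return min(r[0] for r in table if top - r[4] <= tolerance)


print("Smallest model within 0.05 of the best:", smallest_within(rows, 0.05))
```

With a tolerance of 0.05, the sketch returns the 3-component model, whose predicted R2 (about 52%) is only slightly below that of the 4-component model. Whether that trade-off is acceptable depends on your application.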

Step 2. Determine whether the data contain outliers or leverage points

To determine whether your model fits the data well, you need to examine plots to look for outliers, leverage points, and other patterns. If your data contain many outliers or leverage points, the model may not make valid predictions.

You can examine the residual plots, including the residuals vs leverage plot. On the residuals vs leverage plot, look for the following:
  • Outliers: Observations with large standardized residuals fall outside the horizontal reference lines on the plot.
  • Leverage points: Observations with high leverage values have x-scores far from zero and are to the right of the vertical reference line.

For more information on the residual vs leverage plot, go to Graphs for Partial Least Squares Regression.
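The two checks on the residuals vs leverage plot can also be expressed numerically. The Python sketch below flags observations against two reference values. It assumes the horizontal lines sit at standardized residuals of -2 and 2 and the vertical line sits at 2m/n, where m is the number of components and n is the number of observations. These cutoffs are assumptions made for illustration, so confirm them against the software's documentation. The function name and the data are hypothetical.

```python
def flag_points(std_residuals, leverages, n_components, resid_limit=2.0):
    """Return (outliers, leverage_points) as lists of 0-based observation indexes.

    Assumed cutoffs (not taken from the text above):
      - outlier:        |standardized residual| > resid_limit
      - leverage point: leverage > 2 * n_components / n_observations
    """
    n = len(leverages)
    leverage_limit = 2 * n_components / n
    outliers = [i for i, r in enumerate(std_residuals) if abs(r) > resid_limit]
    leverage_points = [i for i, h in enumerate(leverages) if h > leverage_limit]
    return outliers, leverage_points


# Illustrative values only (hypothetical, not from the Minitab example).
residuals = [0.3, -1.1, 2.4, 0.8, -2.6, 0.1, 1.5, -0.4]
leverage = [0.10, 0.22, 0.15, 0.61, 0.30, 0.12, 0.55, 0.18]

outliers, leverage_points = flag_points(residuals, leverage, n_components=2)
print("Possible outliers:", outliers)
print("Possible leverage points:", leverage_points)
```

Flagged observations are candidates for investigation, not automatic removals. Refit the model with and without them to see how they affect the fit.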

In this plot, there are two points that may be leverage points because they are to the right of the vertical line. There are three points that may be outliers because they are above and below the horizontal reference lines. These points can be investigated to determine how they affect the model fit.

You can also examine the Response plot to determine how well the model fits and predicts each observation. When examining this plot, look for the following things:
  • A nonlinear pattern in the points, which indicates the model may not fit or predict data well.
  • If you perform cross-validation, large differences in the fitted and the cross-validated values, which indicate a leverage point.

In this plot, the points generally follow a linear pattern, indicating that the model fits the data well. The points that appear on the residual vs leverage plot above do not seem to be an issue on this plot.

In this plot, cross-validation was used so both the fitted and cross-validated fitted values appear on the plot. The plot does not reveal large differences between the fitted and cross-validated fitted responses.

Step 3. Validate the PLS model with a test data set

Often, PLS regression is performed in two steps. The first step, sometimes called training, involves calculating a PLS regression model for a sample data set (also called a training data set). The second step involves validating this model with a different set of data, often called a test data set. To validate the model with the test data set, enter the columns of the test data in the Prediction sub-dialog box. Minitab calculates new response values for each observation in the test data set and compares the predicted response to the actual response. Based on the comparison, Minitab calculates the test R2, which indicates the model's ability to predict new responses. Higher test R2 values indicate the model has greater predictive ability.

If you use cross-validation, compare the test R2 to the predicted R2. Ideally, these values should be similar. A test R2 that is substantially smaller than the predicted R2 indicates that cross-validation is overly optimistic about the model's predictive ability or that the two data samples are from different populations.

If the test data set does not include response values, then Minitab does not calculate a test R2.
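The idea behind the test R2 can be sketched in a few lines of Python. The function below compares the predicted responses with the actual responses in a test data set and returns 1 - SSE / SS Total. It assumes that SS Total is computed around the mean of the actual test responses. The exact formula Minitab uses is not stated here, so treat this as an illustration of the concept. The function name and the data are hypothetical.

```python
def compute_test_r2(actual, predicted):
    """Test R2 as 1 - SSE / SS Total, with SS Total taken around the
    mean of the actual test responses (an assumption of this sketch)."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - sse / sst


# Illustrative values only (hypothetical, not from the Minitab example).
actual_response = [10.0, 12.0, 14.0, 16.0]
predicted_response = [11.0, 11.0, 15.0, 15.0]

r2_test = compute_test_r2(actual_response, predicted_response)
print(f"Test R2 = {r2_test:.2f}")
```

The calculation needs the actual responses, which is why no test R2 is available when the test data set does not include response values.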

Predicted Response for New Observations Using Model for Fat

Row      Fit    SE Fit  95% CI              95% PI
  1  18.7372  0.378459  (17.9740, 19.5004)  (16.8612, 20.6132)
  2  15.3782  0.362762  (14.6466, 16.1098)  (13.5149, 17.2415)
  3  20.7838  0.491134  (19.7933, 21.7743)  (18.8044, 22.7632)
  4  14.3684  0.544761  (13.2698, 15.4670)  (12.3328, 16.4040)
  5  16.6016  0.348485  (15.8988, 17.3044)  (14.7494, 18.4538)
  6  20.7471  0.472648  (19.7939, 21.7003)  (18.7861, 22.7080)

Test R-sq: 0.762701

Key Result: Test R2

In these results, the test R2 is approximately 76%. The predicted R2 for the original data set is approximately 78%. Because these values are similar, you can conclude that the model has adequate predictive ability.