The objective with PLS is to select a model with the appropriate number of components that has good predictive ability. When you fit a PLS model, you can perform cross-validation to help you determine the optimal number of components in the model. With cross-validation, Minitab selects the model with the highest predicted R^{2} value. If you do not use cross-validation, you can specify the number of components to include in the model or use the default number of components. The default number of components is 10 or the number of predictors in your data, whichever is less. Examine the Method table to determine how many components Minitab included in the model. You can also examine the Model selection plot.

When using PLS, select a model with the smallest number of components that explain a sufficient amount of variability in the predictors and the responses. To determine the number of components that is best for your data, examine the Model selection table, including the X-variance, R^{2}, and predicted R^{2} values. Predicted R^{2} indicates the predictive ability of the model and is only displayed if you perform cross-validation.
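Minitab's internal algorithm is not shown in this document. As an illustration of what the components are, here is a minimal single-response PLS (PLS1) sketch using the NIPALS algorithm in NumPy; the function names and the tiny demo data are hypothetical, and this is not Minitab's implementation:

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Minimal single-response PLS via NIPALS (illustrative sketch only)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean            # work on centered copies
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                          # weight: X-y covariance direction
        w /= np.linalg.norm(w)
        t = Xk @ w                             # component scores
        tt = t @ t
        p = Xk.T @ t / tt                      # X loadings
        q = (yk @ t) / tt                      # y loading
        Xk = Xk - np.outer(t, p)               # deflate X
        yk = yk - q * t                        # deflate y
        W.append(w); P.append(p); Q.append(q)
    W, P = np.array(W).T, np.array(P).T
    b = W @ np.linalg.solve(P.T @ W, np.array(Q))
    return b, x_mean, y_mean                   # coefficients on the centered scale

def pls1_predict(X_new, b, x_mean, y_mean):
    return (X_new - x_mean) @ b + y_mean

# Tiny deterministic demo: y is an exact linear function of three predictors,
# so a model with all three components reproduces it.
X = np.array([[1., 2., 0.], [2., 1., 3.], [0., 4., 1.], [3., 0., 2.],
              [1., 1., 1.], [4., 2., 0.], [2., 3., 2.], [0., 1., 4.]])
y = X @ np.array([1.0, 2.0, 3.0])
b, xm, ym = pls1_nipals(X, y, n_components=3)
print(np.allclose(pls1_predict(X, b, xm, ym), y))  # True
```

With fewer components than predictors, the fit trades some in-sample accuracy for stability, which is the trade-off the Model selection table quantifies.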

In some cases, you may decide to use a different model from the one that Minitab initially selects. If you used cross-validation, compare the R^{2} and predicted R^{2} values. For example, suppose removing two components from the model that Minitab selected only slightly decreases the predicted R^{2}. Because the predicted R^{2} decreased only slightly, the model is not over-fit, and you may decide that the smaller model better suits your data.

A predicted R^{2} that is substantially less than R^{2} may indicate that the model is over-fit. An over-fit model occurs when you add terms or components for effects that are not important in the population, although they may appear important in the sample data. The model becomes tailored to the sample data and, therefore, may not be useful for making predictions about the population.
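The predicted R^{2} reported under cross-validation is built from the PRESS statistic: each observation is left out in turn, the model is refit, and the held-out point is predicted. As a hedged sketch (using ordinary least squares as a stand-in for the PLS fit, with leave-one-out as in the Method table below):

```python
import numpy as np

def loo_press(X, y):
    """PRESS via leave-one-out: refit without each point, predict it, sum squared errors."""
    n = len(y)
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        # OLS fit on the remaining n-1 points (a stand-in for the PLS fit)
        Xi = np.column_stack([np.ones(mask.sum()), X[mask]])
        beta, *_ = np.linalg.lstsq(Xi, y[mask], rcond=None)
        pred = np.concatenate([[1.0], np.atleast_1d(X[i])]) @ beta
        press += (y[i] - pred) ** 2
    return press

def predicted_r2(X, y):
    tss = np.sum((y - y.mean()) ** 2)   # total sum of squares
    return 1.0 - loo_press(X, y) / tss

# Nearly noise-free linear data gives a predicted R^2 close to 1;
# pure noise would drive it toward (or below) zero.
x = np.arange(10, dtype=float)
y = 3.0 * x + 2.0
print(round(predicted_r2(x.reshape(-1, 1), y), 4))  # 1.0
```

An over-fit model has a small in-sample error but a large PRESS, which is exactly the R^{2} versus predicted R^{2} gap described above.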

If you do not use cross-validation, you can examine the x-variance values in the Model selection table to determine how much variance in the predictors is explained by each model.

| Cross-validation | Leave-one-out |
| --- | --- |
| Components to evaluate | Set |
| Number of components evaluated | 10 |
| Number of components selected | 4 |

| Cross-validation | None |
| --- | --- |
| Components to calculate | Set |
| Number of components calculated | 10 |

In these results, the first Method table shows that cross-validation was used and that the model with 4 components was selected. In the second Method table, cross-validation was not used, so Minitab uses the model with 10 components, which is the default.

| Components | X Variance | Error | R-Sq | PRESS | R-Sq (pred) |
| --- | --- | --- | --- | --- | --- |
| 1 | 0.158849 | 14.9389 | 0.637435 | 23.3439 | 0.433444 |
| 2 | 0.442267 | 12.2966 | 0.701564 | 21.0936 | 0.488060 |
| 3 | 0.522977 | 7.9761 | 0.806420 | 19.6136 | 0.523978 |
| 4 | 0.594546 | 6.6519 | 0.838559 | 18.1683 | 0.559056 |
| 5 | | 5.8530 | 0.857948 | 19.2675 | 0.532379 |
| 6 | | 5.0123 | 0.878352 | 22.3739 | 0.456988 |
| 7 | | 4.3109 | 0.895374 | 24.0041 | 0.417421 |
| 8 | | 4.0866 | 0.900818 | 24.7736 | 0.398747 |
| 9 | | 3.5886 | 0.912904 | 24.9090 | 0.395460 |
| 10 | | 3.2750 | 0.920516 | 24.8293 | 0.397395 |

In these results, Minitab selected the 4-component model, which has a predicted R^{2} value of approximately 56%. Based on the X variance, the 4-component model explains almost 60% of the variance in the predictors. As the number of components increases beyond 4, the R^{2} value continues to increase, but the predicted R^{2} decreases, which indicates that models with more components are likely to be over-fit.
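The selection rule described earlier (keep the model with the highest predicted R^{2}) can be applied directly to the values in the Model selection table above:

```python
# Predicted R-sq by component count, copied from the Model selection table above.
r2_pred = {1: 0.433444, 2: 0.488060, 3: 0.523978, 4: 0.559056,
           5: 0.532379, 6: 0.456988, 7: 0.417421, 8: 0.398747,
           9: 0.395460, 10: 0.397395}

# Cross-validation keeps the model with the highest predicted R-sq.
best = max(r2_pred, key=r2_pred.get)
print(best, r2_pred[best])  # 4 0.559056
```

This reproduces the 4-component choice shown in the Method table.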

To determine whether your model fits the data well, you need to examine plots to look for outliers, leverage points, and other patterns. If your data contain many outliers or leverage points, the model may not make valid predictions.

You can examine the residual plots, including the residuals vs leverage plot. On the residuals vs leverage plot, look for the following:

- Outliers: Observations with large standardized residuals fall outside the horizontal reference lines on the plot.
- Leverage points: Observations with large leverage values have x-scores far from zero and fall to the right of the vertical reference line.

For more information on the residual vs leverage plot, go to Graphs for Partial Least Squares Regression.

In this plot, two points may be leverage points because they are to the right of the vertical reference line. Three points may be outliers because they are above or below the horizontal reference lines. You can investigate these points to determine how they affect the model fit.
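The exact positions of the reference lines depend on Minitab's settings, which this document does not specify. As a rough illustration only, a common rule of thumb flags standardized residuals beyond ±2 and leverages above 2k/n (both cutoffs are assumptions here, not Minitab's documented defaults):

```python
# Illustrative screening only: the cutoffs below (|std. residual| > 2,
# leverage > 2*k/n) are common rules of thumb, not Minitab's exact reference lines.
def flag_points(std_residuals, leverages, n_components):
    n = len(std_residuals)
    lev_cut = 2.0 * n_components / n
    outliers = [i for i, r in enumerate(std_residuals) if abs(r) > 2.0]
    high_leverage = [i for i, h in enumerate(leverages) if h > lev_cut]
    return outliers, high_leverage

# Hypothetical diagnostics for six observations.
std_res = [0.3, -2.5, 1.1, 2.2, -0.8, 0.1]
lev = [0.10, 0.55, 0.20, 0.15, 0.60, 0.05]
print(flag_points(std_res, lev, n_components=1))  # ([1, 3], [1, 4])
```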

You can also examine the Response plot to determine how well the model fits and predicts each observation. When examining this plot, look for the following things:

- A nonlinear pattern in the points, which indicates the model may not fit or predict data well.
- If you perform cross-validation, large differences between the fitted and the cross-validated fitted values, which indicate a leverage point.

In this plot, the points generally follow a linear pattern, indicating that the model fits the
data well. The points that appear on the residual vs leverage plot above do not seem to
be an issue on this plot.

In this plot, cross-validation was used so both the fitted and cross-validated fitted values
appear on the plot. The plot does not reveal large differences between the fitted and
cross-validated fitted responses.

Often, PLS regression is performed in two steps. The first step, sometimes called training, involves calculating a PLS regression model for a sample data set (also called a training data set). The second step involves validating this model with a different set of data, often called a test data set. To validate the model with the test data set, enter the columns of the test data in the Prediction sub-dialog box. Minitab calculates new response values for each observation in the test data set and compares the predicted response to the actual response. Based on the comparison, Minitab calculates the test R^{2}, which indicates the model's ability to predict new responses. Higher test R^{2} values indicate the model has greater predictive ability.

If you use cross-validation, compare the test R^{2} to the predicted R^{2}. Ideally, these values should be similar. A test R^{2} that is significantly smaller than the predicted R^{2} indicates that cross-validation is overly optimistic about the model's predictive ability or that the two data samples are from different populations.

If the test data set does not include response values, then Minitab does not calculate a test R^{2}.

| Row | Fit | SE Fit | 95% CI | 95% PI |
| --- | --- | --- | --- | --- |
| 1 | 18.7372 | 0.378459 | (17.9740, 19.5004) | (16.8612, 20.6132) |
| 2 | 15.3782 | 0.362762 | (14.6466, 16.1098) | (13.5149, 17.2415) |
| 3 | 20.7838 | 0.491134 | (19.7933, 21.7743) | (18.8044, 22.7632) |
| 4 | 14.3684 | 0.544761 | (13.2698, 15.4670) | (12.3328, 16.4040) |
| 5 | 16.6016 | 0.348485 | (15.8988, 17.3044) | (14.7494, 18.4538) |
| 6 | 20.7471 | 0.472648 | (19.7939, 21.7003) | (18.7861, 22.7080) |

In these results, the test R^{2} is approximately 76%. The predicted
R^{2} for the original data set is approximately 78%. Because these
values are similar, you can conclude that the model has adequate predictive
ability.