Predicted R2 indicates how well each calculated model predicts the response and is only calculated when you perform cross-validation. If one response variable is in the data, Minitab selects the PLS model with the highest predicted R2. If multiple response variables are in the data, Minitab selects the PLS model with the highest mean predicted R2 for all of the response variables. Predicted R2 is calculated by systematically removing each observation from the data set, estimating the regression equation, and determining how well the model predicts the removed observation. The value of predicted R2 ranges between 0% and 100%. (While the calculations for predicted R2 can produce negative values, Minitab displays zero for these cases.)
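The leave-one-out procedure described above can be sketched in a few lines. This is a minimal illustration, not Minitab's implementation: it assumes a simple one-predictor least-squares fit in place of a PLS model, and it assumes the commonly used definition of predicted R2 as 1 - PRESS / SS Total, where PRESS is the sum of squared prediction errors for the removed observations. The function names fit_line and predicted_r2 are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predicted_r2(xs, ys):
    """Leave-one-out predicted R2 as a percentage (assumed: 1 - PRESS / SS Total)."""
    n = len(xs)
    press = 0.0
    for i in range(n):
        # Remove observation i, refit, then predict the removed observation.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    mean_y = sum(ys) / n
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    raw = 1.0 - press / ss_total
    # Negative values are reported as zero, as described in the text.
    return max(0.0, raw) * 100.0
```

With perfectly linear data, every removed observation is predicted exactly and the function returns 100. With data that the line cannot predict, the raw value is negative and the function returns 0.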
Use predicted R2 to determine how well your model predicts the response for new observations. Models that have larger predicted R2 values have better predictive ability.
A predicted R2 that is substantially less than R2 may indicate that the model is over-fit. A model is over-fit when it includes terms for effects that are not important in the population. The model becomes tailored to the sample data and, therefore, may not be useful for making predictions about the population.
To determine whether the model selected by cross-validation is the most appropriate, examine the R2 and predicted R2 values. In some cases, you may decide to use a different model than the one selected by cross-validation. Consider an example where adding two components to the model that Minitab selects substantially increases R2 and only slightly decreases the predicted R2. Because the predicted R2 decreases only slightly, the larger model is unlikely to be over-fit, and you may decide that it better suits your data.
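The comparison in this example can be sketched as follows. The R2 and predicted R2 values are made up for illustration, and the two thresholds (max_pred_drop and min_r2_gain) are hypothetical choices, not Minitab settings. What counts as a "slight" decrease or a worthwhile increase depends on your application.

```python
# Hypothetical results: number of components -> (R2 %, predicted R2 %).
results = {
    1: (62.0, 55.0),
    2: (75.0, 68.0),
    3: (78.0, 67.5),
    4: (90.0, 66.8),
    5: (93.0, 40.0),
}


def select_by_cross_validation(results):
    """Return the component count with the highest predicted R2."""
    return max(results, key=lambda k: results[k][1])


def alternatives(results, max_pred_drop=2.0, min_r2_gain=10.0):
    """List models whose predicted R2 is only slightly below the selected
    model's while their R2 is substantially higher."""
    best = select_by_cross_validation(results)
    best_r2, best_pred = results[best]
    candidates = []
    for k, (r2, pred) in sorted(results.items()):
        if k == best:
            continue
        if best_pred - pred <= max_pred_drop and r2 - best_r2 >= min_r2_gain:
            candidates.append(k)
    return candidates


selected = select_by_cross_validation(results)
worth_a_look = alternatives(results)
```

With these made-up numbers, cross-validation selects the 2-component model, and the 4-component model is flagged as an alternative because its R2 is much higher while its predicted R2 is only slightly lower. The 5-component model is not flagged: its predicted R2 falls far below its R2, which is the pattern that suggests over-fitting.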