Example of Partial Least Squares Regression with cross validation

A wine producer wants to know how the chemical composition of his wine relates to sensory evaluations. He has 37 Pinot Noir samples, each described by 17 elemental concentrations (Cd, Mo, Mn, Ni, Cu, Al, Ba, Cr, Sr, Pb, B, Mg, Si, Na, Ca, P, K) and a score on the wine's aroma from a panel of judges. He wants to predict the aroma score from the 17 elements. Data are from: I.E. Frank and B.R. Kowalski (1984). "Prediction of Wine Quality and Geographic Origin from Chemical Measurements by Partial Least-Squares Regression Modeling," Analytica Chimica Acta, 162, 241-251.

The producer wants the model to include all the concentrations and all the 2-way interactions that involve cadmium (Cd). That gives 33 predictors (17 main effects and 16 interactions) for only 37 samples. Because the ratio of samples to predictors is this low, the producer decides to use partial least squares regression.

  1. Open the sample data WineAroma.MTW.
  2. Choose Stat > Regression > Partial Least Squares.
  3. In Responses, enter Aroma.
  4. In Model, enter Cd-K Cd*Mo Cd*Mn Cd*Ni Cd*Cu Cd*Al Cd*Ba Cd*Cr Cd*Sr Cd*Pb Cd*B Cd*Mg Cd*Si Cd*Na Cd*Ca Cd*P Cd*K.
  5. Click Options.
  6. Under Cross-Validation, select Leave-one-out. Click OK.
  7. Click Graphs. Select Model selection plot. Deselect Response plot and Coefficient plot.
  8. Click OK in each dialog box.
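The Model entry in step 4 combines a column range (Cd-K) with 16 interaction terms. The Python sketch below illustrates how large that model is by expanding it into an explicit term list. It assumes that the worksheet columns run from Cd to K in the same order as the element list above. `ELEMENTS` and `build_terms` are hypothetical names and are not part of Minitab.

```python
# Hypothetical expansion of the Minitab model "Cd-K Cd*Mo ... Cd*K".
# Assumes the worksheet columns Cd..K follow the order listed in the example.
ELEMENTS = ["Cd", "Mo", "Mn", "Ni", "Cu", "Al", "Ba", "Cr", "Sr",
            "Pb", "B", "Mg", "Si", "Na", "Ca", "P", "K"]


def build_terms(elements, anchor="Cd"):
    """Return every main effect plus each 2-way interaction with `anchor`."""
    main_effects = list(elements)  # the Cd-K range: all 17 concentrations
    # Pair the anchor with every other element (no Cd*Cd term).
    interactions = [f"{anchor}*{e}" for e in elements if e != anchor]
    return main_effects + interactions


terms = build_terms(ELEMENTS)
n_samples = 37
print(len(terms))              # 33
print(terms[-1])               # Cd*K
print(n_samples / len(terms))  # roughly one sample per predictor
```

With 33 predictors and 37 samples there is only about one sample per predictor. This low ratio is what motivates PLS in this example.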

Interpret the results

The model selection plot identifies the model with 4 components as the optimal model because the 4-component model has the highest predicted R² value. The predicted R² values on the plot are calculated with leave-one-out cross-validation. The model selection and validation table shows that the predicted R² value for the optimal model is approximately 0.56. Minitab uses the optimal model for the analysis of variance calculations. The optimal model is statistically significant at the 0.05 level of significance because the p-value is displayed as 0.000, which means that it is less than 0.0005.
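Predicted R² measures how well a model predicts observations that were left out of the fit. The plain-Python sketch below shows the general leave-one-out calculation. It assumes the usual definition:

1. Remove each observation in turn.
2. Refit the model on the remaining samples.
3. Sum the squared prediction errors into PRESS.
4. Compute predicted R² = 1 - PRESS / SS Total.

This definition agrees with the PRESS and R-Sq (pred) values in the table below. In the sketch, a one-predictor least-squares line stands in for the PLS fit, and the data are made up. `loo_press`, `predicted_r2`, `fit_line`, and `fit_mean` are hypothetical helpers, not Minitab functions.

```python
def loo_press(xs, ys, fit):
    """PRESS: sum of squared leave-one-out prediction errors."""
    press = 0.0
    for i in range(len(ys)):
        # Drop observation i and refit on the remaining n - 1 samples.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predict = fit(train_x, train_y)
        press += (ys[i] - predict(xs[i])) ** 2
    return press


def predicted_r2(xs, ys, fit):
    """Predicted R-squared = 1 - PRESS / SS Total."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - loo_press(xs, ys, fit) / ss_total


def fit_line(xs, ys):
    """Least-squares line y = a + b*x (a stand-in for a PLS fit)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def fit_mean(xs, ys):
    """Baseline that ignores x and always predicts the training mean."""
    my = sum(ys) / len(ys)
    return lambda x: my


# Made-up toy data (not the wine data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(predicted_r2(xs, ys, fit_line))  # close to 1
print(predicted_r2(xs, ys, fit_mean))  # negative
```

The mean-only baseline scores below zero. Predicted R² can therefore be negative when a model predicts left-out observations worse than the mean of the full sample would.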

Method

Cross-validation                  Leave-one-out
Components to evaluate            Set
Number of components evaluated    10
Number of components selected     4

Analysis of Variance for Aroma

Source          DF       SS       MS      F      P
Regression       4  34.5514  8.63784  41.55  0.000
Residual Error  32   6.6519  0.20787
Total           36  41.2032
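The columns of the analysis of variance table are related by simple arithmetic. The sketch below recomputes the mean squares, the F statistic, and R² from the printed DF and SS values. Small differences in the last digit come from rounding in the printed table. The p-value requires the F distribution with 4 and 32 degrees of freedom (for example, `scipy.stats.f.sf`), so this sketch leaves it out.

```python
# DF and SS values copied from the Analysis of Variance table for Aroma.
df_reg, ss_reg = 4, 34.5514
df_res, ss_res = 32, 6.6519
df_tot, ss_tot = 36, 41.2032

ms_reg = ss_reg / df_reg  # mean square = SS / DF
ms_res = ss_res / df_res
f_stat = ms_reg / ms_res  # F = MS(Regression) / MS(Residual Error)
r_sq = ss_reg / ss_tot    # R-Sq of the selected 4-component model

print(f"MS regression = {ms_reg:.5f}")
print(f"MS residual   = {ms_res:.5f}")
print(f"F             = {f_stat:.2f}")
print(f"R-Sq          = {r_sq:.6f}")
```

The recomputed R-Sq agrees, up to rounding, with the R-Sq entry for 4 components in the Model Selection and Validation table.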

Model Selection and Validation for Aroma

Components  X Variance    Error      R-Sq    PRESS  R-Sq (pred)
         1    0.158849  14.9389  0.637435  23.3439     0.433444
         2    0.442267  12.2966  0.701564  21.0936     0.488060
         3    0.522977   7.9761  0.806420  19.6136     0.523978
         4    0.594546   6.6519  0.838559  18.1683     0.559056
         5               5.8530  0.857948  19.2675     0.532379
         6               5.0123  0.878352  22.3739     0.456988
         7               4.3109  0.895374  24.0041     0.417421
         8               4.0866  0.900818  24.7736     0.398747
         9               3.5886  0.912904  24.9090     0.395460
        10               3.2750  0.920516  24.8293     0.397395
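The selection rule described under "Interpret the results" can be reproduced from this table. The sketch below recomputes two statistics using SS Total = 41.2032 from the analysis of variance table:

- R-Sq as 1 - Error / SS Total.
- R-Sq (pred) as 1 - PRESS / SS Total.

It then picks the number of components with the highest predicted R². The helper names are hypothetical.

```python
# (components, Error, PRESS) copied from the Model Selection and Validation table.
ROWS = [
    (1, 14.9389, 23.3439),
    (2, 12.2966, 21.0936),
    (3, 7.9761, 19.6136),
    (4, 6.6519, 18.1683),
    (5, 5.8530, 19.2675),
    (6, 5.0123, 22.3739),
    (7, 4.3109, 24.0041),
    (8, 4.0866, 24.7736),
    (9, 3.5886, 24.9090),
    (10, 3.2750, 24.8293),
]
SS_TOTAL = 41.2032  # Total SS from the Analysis of Variance table


def r_squared(error_ss):
    """Fit to the training data: 1 - Error / SS Total."""
    return 1.0 - error_ss / SS_TOTAL


def predicted_r_squared(press):
    """Leave-one-out prediction quality: 1 - PRESS / SS Total."""
    return 1.0 - press / SS_TOTAL


for k, err, press in ROWS:
    print(f"{k:2d}  R-Sq={r_squared(err):.6f}  R-Sq(pred)={predicted_r_squared(press):.6f}")

# Choose the component count that maximizes predicted R-squared.
best = max(ROWS, key=lambda row: predicted_r_squared(row[2]))
print("selected components:", best[0])  # 4
```

R-Sq keeps rising as components are added, while R-Sq (pred) peaks at 4 components and then falls. This suggests that the extra components mainly improve the fit to these 37 samples rather than predictions for new ones.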