Example of Partial Least Squares Regression with a test data set

A scientist at a food chemistry laboratory analyzes 60 soybean flour samples. For each sample, the scientist determines the moisture and fat content, and records near-infrared (NIR) spectral data at 88 wavelengths. The scientist randomly selects 54 of the 60 samples and estimates the relationship between the responses (moisture and fat) and the predictors (the 88 NIR wavelengths) using PLS regression. The scientist uses the remaining 6 samples as a test data set to evaluate the predictive ability of the model.
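
The random selection of 54 training samples and 6 test samples described above can be sketched in a few lines of Python. This is only an illustration of the idea and is not part of the Minitab procedure: the seed value and the variable names (`train_idx`, `test_idx`) are assumptions, and the split it produces is not the one used in the sample data.

```python
import random

N_SAMPLES = 60   # total soybean flour samples in the example
N_TRAIN = 54     # samples used to fit the model

# Fixed seed (an arbitrary, hypothetical choice) so the split is repeatable.
rng = random.Random(0)

indices = list(range(N_SAMPLES))
train_idx = sorted(rng.sample(indices, N_TRAIN))   # 54 rows for fitting
test_idx = sorted(set(indices) - set(train_idx))   # remaining 6 rows for testing

print(len(train_idx), len(test_idx))  # prints "54 6"
```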

  1. Open the sample data, SoybeanFlour.MTW.
  2. Choose Stat > Regression > Partial Least Squares.
  3. In Responses, enter Moisture Fat.
  4. In Model, enter '1'-'88'.
  5. Click Prediction.
  6. In New observation for continuous predictors, enter Test1-Test88.
  7. In New observation for responses (optional), enter Moisture2 Fat2.
  8. Click OK in each dialog box.

Interpret the results

The p-value for each response is reported as 0.000 (that is, less than 0.0005), which is below the significance level of 0.05. These results indicate that, for each response, at least one coefficient in the model is different from zero. The test R² value is approximately 0.91 for moisture and approximately 0.76 for fat. These statistics indicate that the model predicts moisture well and fat reasonably well for the 6 test samples. Because moisture and fat are modeled together, analyzing each response individually would give different results.
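
As a rough illustration of what a test R² measures, the sketch below computes 1 - SSE/SST on held-out observations. It assumes the common definition in which SST is taken about the mean of the observed test responses; Minitab's exact formula may differ. The function name `holdout_r_squared` and the numbers are made up for the illustration and are not taken from the soybean flour data.

```python
def holdout_r_squared(observed, predicted):
    """Return 1 - SSE/SST for held-out data.

    Assumption: SST is computed about the mean of the observed
    test responses.
    """
    mean_obs = sum(observed) / len(observed)
    sse = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    sst = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - sse / sst


# Hypothetical values, chosen only to make the arithmetic easy to follow.
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [10.5, 11.5, 14.5, 15.5]

r2 = holdout_r_squared(observed, predicted)
print(round(r2, 4))  # prints 0.95
```

A value close to 1 means the predictions for the held-out samples are close to the observed responses relative to the spread of those responses.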

Method

Cross-validation                  None
Components to calculate           Set
Number of components calculated   10

Analysis of Variance for Moisture

Source           DF       SS       MS      F      P
Regression       10  468.516  46.8516  61.46  0.000
Residual Error   43   32.777   0.7623
Total            53  501.293

Analysis of Variance for Fat

Source           DF       SS       MS      F      P
Regression       10  266.378  26.6378  36.89  0.000
Residual Error   43   31.050   0.7221
Total            53  297.428
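
The MS and F columns in the two analysis of variance tables follow from the DF and SS columns: each mean square is SS divided by DF, and F is the regression mean square divided by the residual error mean square. The short sketch below redoes that arithmetic from the tabulated values. The function name `anova_summary` is hypothetical, and the results agree with the tables only up to rounding of the inputs.

```python
def anova_summary(ss_regression, ss_error, df_regression, df_error):
    """Return (MS regression, MS error, F) from sums of squares and DF."""
    ms_regression = ss_regression / df_regression
    ms_error = ss_error / df_error
    return ms_regression, ms_error, ms_regression / ms_error


# SS and DF values copied from the tables above.
moisture = anova_summary(468.516, 32.777, 10, 43)
fat = anova_summary(266.378, 31.050, 10, 43)

for name, (ms_reg, ms_err, f_stat) in (("Moisture", moisture), ("Fat", fat)):
    print(f"{name}: MS regression={ms_reg:.4f}  MS error={ms_err:.4f}  F={f_stat:.2f}")
```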

Model Selection and Validation for Moisture

Components  X Variance    Error      R-Sq
         1    0.984976  96.9288  0.806643
         2    0.996400  88.9900  0.822479
         3    0.997757  71.9304  0.856510
         4    0.999427  58.3174  0.883666
         5    0.999722  58.1261  0.884048
         6    0.999853  48.5236  0.903203
         7    0.999963  45.9824  0.908272
         8    0.999976  33.1545  0.933862
         9    0.999982  32.8074  0.934554
        10    0.999986  32.7773  0.934615

Model Selection and Validation for Fat

Components  X Variance    Error      R-Sq
         1    0.984976  282.519  0.050127
         2    0.996400  229.964  0.226824
         3    0.997757  115.951  0.610155
         4    0.999427   98.285  0.669550
         5    0.999722   57.994  0.805015
         6    0.999853   53.097  0.821480
         7    0.999963   52.010  0.825133
         8    0.999976   48.842  0.835784
         9    0.999982   34.344  0.884529
        10    0.999986   31.050  0.895604
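
The R-Sq column in the two model selection tables is consistent with the relationship R-Sq = 1 - Error / SS Total, where SS Total is the total sum of squares from the corresponding analysis of variance table (501.293 for moisture, 297.428 for fat). The sketch below checks this for each number of components. The relationship is inferred from the tabulated values and is not stated in the output itself, and the names used are hypothetical.

```python
def r_squared_from_error(error_ss, total_ss):
    """R-Sq implied by an error sum of squares and the total sum of squares."""
    return 1.0 - error_ss / total_ss


# Error column for 1 to 10 components, copied from the tables above.
moisture_error = [96.9288, 88.9900, 71.9304, 58.3174, 58.1261,
                  48.5236, 45.9824, 33.1545, 32.8074, 32.7773]
fat_error = [282.519, 229.964, 115.951, 98.285, 57.994,
             53.097, 52.010, 48.842, 34.344, 31.050]

moisture_r2 = [r_squared_from_error(e, 501.293) for e in moisture_error]
fat_r2 = [r_squared_from_error(e, 297.428) for e in fat_error]

for k, (m, f) in enumerate(zip(moisture_r2, fat_r2), start=1):
    print(f"{k:2d} components: Moisture R-Sq={m:.6f}  Fat R-Sq={f:.6f}")
```

With 10 components, the Error values equal the Residual Error sums of squares in the analysis of variance tables, so the last row gives the R-Sq of the fitted model for each response.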

Predicted Response for New Observations Using Model for Moisture

Row      Fit    SE Fit  95% CI              95% PI
  1  14.5184  0.388841  (13.7343, 15.3026)  (12.5910, 16.4459)
  2   9.3049  0.372712  (8.5532, 10.0565)   (7.3904, 11.2193)
  3  14.1790  0.504606  (13.1614, 15.1966)  (12.1454, 16.2127)
  4  16.4477  0.559704  (15.3189, 17.5764)  (14.3562, 18.5391)
  5  15.1872  0.358044  (14.4652, 15.9093)  (13.2842, 17.0903)
  6   9.4639  0.485613  (8.4846, 10.4433)   (7.4492, 11.4787)
Test R-sq: 0.906451

Predicted Response for New Observations Using Model for Fat

Row      Fit    SE Fit  95% CI              95% PI
  1  18.7372  0.378459  (17.9740, 19.5004)  (16.8612, 20.6132)
  2  15.3782  0.362762  (14.6466, 16.1098)  (13.5149, 17.2415)
  3  20.7838  0.491134  (19.7933, 21.7743)  (18.8044, 22.7632)
  4  14.3684  0.544761  (13.2698, 15.4670)  (12.3328, 16.4040)
  5  16.6016  0.348485  (15.8988, 17.3044)  (14.7494, 18.4538)
  6  20.7471  0.472648  (19.7939, 21.7003)  (18.7861, 22.7080)
Test R-sq: 0.762701
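
The interval columns in the two prediction tables are consistent with the usual regression formulas: the 95% CI is Fit ± t × SE Fit, and the 95% PI is Fit ± t × sqrt(MSE + SE Fit²), where MSE is the residual error mean square for that response. The sketch below reproduces the first row of each table. The critical value 2.0167 is an assumed two-sided 95% t quantile for 43 error degrees of freedom, the function name `intervals` is hypothetical, and whether Minitab computes the intervals in exactly this way is also an assumption.

```python
import math

# Assumed critical value: two-sided 95% t quantile with 43 error DF.
T_CRIT = 2.0167


def intervals(fit, se_fit, mse, t_crit=T_CRIT):
    """Return (confidence interval, prediction interval) as (low, high) tuples."""
    ci_half = t_crit * se_fit
    pi_half = t_crit * math.sqrt(mse + se_fit ** 2)
    return (fit - ci_half, fit + ci_half), (fit - pi_half, fit + pi_half)


# Row 1 of each table; MSE is the Residual Error MS from the ANOVA tables.
moisture_ci, moisture_pi = intervals(14.5184, 0.388841, 0.7623)
fat_ci, fat_pi = intervals(18.7372, 0.378459, 0.7221)

print("Moisture CI:", tuple(round(v, 4) for v in moisture_ci))
print("Moisture PI:", tuple(round(v, 4) for v in moisture_pi))
print("Fat CI:", tuple(round(v, 4) for v in fat_ci))
print("Fat PI:", tuple(round(v, 4) for v in fat_pi))
```

The prediction interval is wider than the confidence interval because it includes the variability of an individual observation (MSE) in addition to the uncertainty in the fitted value.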