The model selection plot is a scatterplot of the R2 and predicted R2 values as a function of the number of components that are fit or cross-validated. It is a graphical display of the Model Selection and Validation table. If you do not use cross-validation, the predicted R2 values do not appear on your plot. Minitab provides one model selection plot per response.
Use this plot to compare the modeling and predicting power of different models to determine the appropriate number of components to retain in your model. The vertical line on the plot indicates the number of components Minitab selected for the PLS model.
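To make the comparison concrete, the sketch below computes predicted R2 from PRESS values, using predicted R2 = 1 - PRESS / (total sum of squares of the response). It then picks the number of components with the highest predicted R2. That selection rule, the function names, and the PRESS values are assumptions for illustration only, not Minitab's internal code.

```python
def predicted_r2(press, y):
    """Predicted R^2 = 1 - PRESS / total sum of squares of the response."""
    mean_y = sum(y) / len(y)
    ss_total = sum((v - mean_y) ** 2 for v in y)
    return 1 - press / ss_total


def select_components(press_by_k, y):
    """Return (number of components with the highest predicted R^2, all predicted R^2 values).

    press_by_k[i] is the PRESS value for the model with i + 1 components
    (hypothetical input; in practice it comes from cross-validation).
    """
    scores = [predicted_r2(p, y) for p in press_by_k]
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return best_index + 1, scores


# Hypothetical response values and PRESS values for models with 1 to 4 components
y = [2.0, 4.0, 6.0, 8.0]
press_by_k = [8.0, 4.0, 2.0, 3.0]
k, scores = select_components(press_by_k, y)
print(k, scores)
```

Here predicted R2 rises up to three components and then drops, which is the pattern the vertical line on the plot is meant to highlight.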
The response plot is a scatterplot of the fitted values versus the actual responses. If you perform cross-validation, the plot also includes the cross-validated fitted values versus the actual responses. Minitab provides one response plot per response.
For a model with excellent predictive capability, the points usually fall close to a line that has a slope of 1 and intersects the y-axis at 0.
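One way to quantify this check is to fit a least-squares line through the (actual, fitted) points and compare its slope with 1 and its intercept with 0. The sketch below does that with hypothetical data; the function name `fit_line` is illustrative and is not a Minitab feature.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical actual responses (x-axis) and fitted values (y-axis)
actual = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.0, 4.0]
slope, intercept = fit_line(actual, fitted)
print(slope, intercept)  # close to 1 and 0 for a well-fitting model
```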
The coefficient plot is a projected scatterplot showing the unstandardized coefficients for each predictor. Minitab provides one coefficient plot per response.
Use the coefficient plot, along with the output of regression coefficients, to compare the sign and magnitude of the coefficients for each predictor. The plot makes it easier to quickly identify predictors that are more or less important in the model.
Because the plot displays unstandardized coefficients, you can compare the magnitudes of the relationships between the predictors and the response only if your predictors are on the same scale (for example, spectral data). Otherwise, use the standardized coefficient plot, or use the loading plot to compare the weights of the predictors used to calculate the components.
The standardized coefficient plot is a projected scatterplot showing the standardized coefficients for each predictor. Minitab provides one standardized coefficient plot per response.
Use this plot, along with the output of regression coefficients, to compare the sign and magnitude of the coefficients for each predictor. The plot makes it easier to quickly identify predictors that are more or less important in the model.
Because the plot displays standardized coefficients, you can make comparisons among the magnitude of the relationships between predictors and the response even if your predictors are not on the same scale.
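A common way to standardize a coefficient is to multiply it by the standard deviation of its predictor and divide by the standard deviation of the response: b_std = b × s_x / s_y. The sketch below assumes that definition with sample standard deviations; Minitab's exact computation may differ, and the data and coefficients are hypothetical.

```python
from statistics import stdev


def standardize_coefficients(coefs, x_columns, y):
    """Convert unstandardized coefficients to standardized ones: b * s_x / s_y.

    Assumes sample standard deviations (statistics.stdev).
    """
    s_y = stdev(y)
    return [b * stdev(col) / s_y for b, col in zip(coefs, x_columns)]


# Hypothetical predictors on very different scales and their unstandardized coefficients
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [100.0, 200.0, 300.0, 400.0]
y = [2.0, 4.0, 6.0, 8.0]
coefs = [2.0, 0.01]

std_coefs = standardize_coefficients(coefs, [x1, x2], y)
print(std_coefs)
```

The unstandardized coefficients (2.0 and 0.01) differ by a factor of 200, but after standardization they differ only by a factor of 2, which is a fairer comparison of the two relationships.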
If your predictors are on the same scale, the patterns of coefficients in the standardized and unstandardized plots look similar. However, the plots may not look identical, for two reasons: highly correlated predictors make the coefficients unstable, and sample standard deviations differ from population standard deviations.
The distance plot is a scatterplot of each observation's distance from the x- and y-model. Distances from the y-model measure how well an observation is fitted in the y-space. Distances from the x-model measure how well an observation is fitted in the x-space.
When examining this plot, look for points whose distance on the x-axis or the y-axis is much greater than that of the other points. Observations with large distances from the y-model may be outliers, and observations with large distances from the x-model may be leverage points.
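As a rough illustration, the sketch below measures each observation's distance from a model as the Euclidean length of its residual row and reports the observation with the largest distance. This definition is an assumption; Minitab may scale the distances differently. The residual values are hypothetical.

```python
from math import sqrt


def row_distances(residuals):
    """Euclidean norm of each observation's residual row (assumed distance definition)."""
    return [sqrt(sum(e * e for e in row)) for row in residuals]


# Hypothetical x-residuals (3 observations x 2 predictors) and y-residuals (1 response)
x_resid = [[0.1, -0.2], [0.0, 0.1], [1.5, -1.0]]
y_resid = [[0.2], [-0.1], [0.3]]

dx = row_distances(x_resid)  # distance from the x-model
dy = row_distances(y_resid)  # distance from the y-model

# Index of the observation farthest from the x-model (a possible leverage point)
largest_x = max(range(len(dx)), key=dx.__getitem__)
print(dx, dy, largest_x)
```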
The histogram of the standardized residuals shows the distribution of the standardized residuals for all observations.
Pattern | What the pattern may indicate |
---|---|
A long tail in one direction | Skewness |
A bar that is far away from the other bars | An outlier |
Because the appearance of a histogram depends on the number of intervals used to group the data, don't use a histogram to assess the normality of the residuals. Instead, use a normal probability plot. A histogram is most effective when you have approximately 20 or more data points. If the sample is too small, then each bar on the histogram does not contain enough data points to reliably show skewness or outliers.
The normal probability plot of the residuals displays the standardized residuals versus their expected values when the distribution is normal.
Use the normal probability plot of the residuals to verify the assumption that the residuals are normally distributed. The normal probability plot of the residuals should approximately follow a straight line.
If you see a nonnormal pattern, use the other residual plots to check for other problems with the model, such as missing terms or a time order effect. If the residuals do not follow a normal distribution, the confidence intervals and p-values can be inaccurate.
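The expected values on a normal probability plot come from the standard normal quantiles of each residual's rank. The sketch below uses Blom's plotting positions, (i - 0.375) / (n + 0.25), which is an assumption; Minitab's plotting-position method may differ. The residuals are hypothetical.

```python
from statistics import NormalDist


def normal_scores(residuals):
    """Expected standard normal value for each residual, based on its rank.

    Uses Blom's plotting positions (i - 0.375) / (n + 0.25) as an assumption.
    Returns the scores in the original order of the residuals.
    """
    n = len(residuals)
    order = sorted(range(n), key=residuals.__getitem__)
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        p = (rank - 0.375) / (n + 0.25)
        scores[idx] = std_normal.inv_cdf(p)
    return scores


# Hypothetical standardized residuals
residuals = [0.5, -1.2, 0.1]
scores = normal_scores(residuals)
# Plotting (score, residual) pairs should give roughly a straight line for normal residuals
print(list(zip(scores, residuals)))
```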
The residuals versus fits graph plots the standardized residuals on the y-axis and the fitted values on the x-axis.
Use the residuals versus fits plot to verify the assumption that the residuals are randomly distributed and have constant variance. Ideally, the points should fall randomly on both sides of 0, with no recognizable patterns in the points.
Pattern | What the pattern may indicate |
---|---|
Fanning or uneven spreading of residuals across fitted values | Nonconstant variance |
Curvilinear | A missing higher-order term |
A point that is far away from zero | An outlier |
A point that is far away from the other points in the x-direction | An influential point |
The residual versus leverage plot is a scatterplot of the standardized residuals versus the leverage of each observation.
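In PLS, leverage is often computed from the x-scores as the diagonal of T(T'T)^-1T'. When the score columns are mutually orthogonal, as they are with the NIPALS algorithm, this reduces to a sum of squared scores divided by each component's sum of squares. The sketch below assumes orthogonal scores and omits any 1/n intercept term that some software adds; the scores are hypothetical.

```python
def leverages(scores):
    """Leverage h_i = sum over components a of t_ia^2 / (t_a' t_a).

    Assumes the score columns are mutually orthogonal, so T'T is diagonal.
    """
    n_comp = len(scores[0])
    col_ss = [sum(row[a] ** 2 for row in scores) for a in range(n_comp)]
    return [sum(row[a] ** 2 / col_ss[a] for a in range(n_comp)) for row in scores]


# Hypothetical centered, orthogonal x-scores for 4 observations and 2 components
T = [[1.0, 1.0], [-1.0, 1.0], [2.0, -1.0], [-2.0, -1.0]]
h = leverages(T)
print(h)  # the leverages sum to the number of components
```

Observations with scores far from the center of the score plot get the largest leverage values.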
The residuals versus order plot displays the standardized residuals in the order that the data were collected.
The score plot is a scatterplot of the x-scores from the first and second components in the model.
If the first two components explain most of the variance in the predictors, then the configuration of the points on this plot closely reflects the original multidimensional configuration of your data. To check how much variance in the predictors the model explains, examine the x-variance values in the Model Selection and Validation table. If the x-variance value is high, then the model explains a large proportion of the variance in the predictors.
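The x-variance value can be understood as the share of the predictors' total sum of squares that the components reproduce. The sketch below assumes centered predictors and computes 1 - SS(x-residuals) / SS(X); the matrices are hypothetical.

```python
def x_variance_explained(X, X_resid):
    """Proportion of predictor variance explained: 1 - SS(x-residuals) / SS(X).

    Assumes X is already centered (each column has mean 0).
    """
    ss_x = sum(v * v for row in X for v in row)
    ss_e = sum(e * e for row in X_resid for e in row)
    return 1 - ss_e / ss_x


# Hypothetical centered predictors and the x-residuals left after fitting the components
X = [[1.0, 2.0], [-1.0, -2.0]]
X_resid = [[0.1, 0.0], [-0.1, 0.0]]
x_var = x_variance_explained(X, X_resid)
print(x_var)
```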
If your model contains more than 2 components, you may want to plot the x-scores of other components using a Scatterplot. To do this, store the x-score matrix and then copy the matrix into columns using . If your model has only one component, this plot does not appear in your output.
The 3D score plot is a three-dimensional scatterplot of the x-scores from the first, second, and third components in the model. If the first three components explain most of the variance in the predictors, then the configuration of the points on this plot closely reflects the original multidimensional configuration of your data. To check how much variance in the predictors the model explains, examine the x-variance values in the Model Selection and Validation table. If the x-variance value is high, then the model explains a large proportion of the variance in the predictors.
You should also use the 3D graph tools, which allow you to rotate the plot so you can view it from different perspectives. This will give you a more complete picture of your data and allow you to more accurately identify leverage points and clusters of points.
The loading plot is a scatterplot of the predictors projected onto the first and second components in the model. It shows the x-loadings for the second component plotted against the x-loadings of the first component. Each point, representing a predictor, is connected to (0,0) on the plot.
The loading plot shows how important the predictors are to the first two components and is particularly useful when your predictors are on different scales. If the components explain most of the x-variance, which is shown in the Model Selection and Validation table, then the loading plot indicates how important the predictors are in the x-space. When considering the importance of the predictors in the entire model, you must also consider how much variance the components explain in the responses. To check this, examine the R2 and predicted R2 values in the Model Selection and Validation table.
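To show where the x-loadings come from, the sketch below fits two components with a NIPALS-style PLS1 algorithm for a single response, then pairs each predictor's first and second loadings as the points of a loading plot. NIPALS is one standard PLS algorithm; whether Minitab uses exactly this variant is an assumption, and the data are hypothetical.

```python
from math import sqrt


def center_rows(X):
    """Subtract each column mean from a row-major matrix."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in X]


def nipals_pls1(X, y, n_components):
    """Return weights W, x-scores T, and x-loadings P for centered X and y."""
    n, m = len(X), len(X[0])
    Xk = [row[:] for row in X]
    yk = y[:]
    W, T, P = [], [], []
    for _ in range(n_components):
        # Weight vector proportional to X'y, scaled to unit length
        w = [sum(Xk[i][j] * yk[i] for i in range(n)) for j in range(m)]
        norm = sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        # x-scores t = X w
        t = [sum(Xk[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        # x-loadings p = X't / (t't)
        p = [sum(Xk[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        # y-loading for the single response
        q = sum(yk[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component
        Xk = [[Xk[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        yk = [yk[i] - t[i] * q for i in range(n)]
        W.append(w)
        T.append(t)
        P.append(p)
    return W, T, P


# Hypothetical data: 5 observations, 3 predictors, 1 response
X_raw = [[1, 2, 0], [2, 1, 1], [3, 4, 1], [4, 3, 3], [5, 6, 2]]
y_raw = [1, 2, 4, 5, 7]
X = center_rows(X_raw)
y_mean = sum(y_raw) / len(y_raw)
y = [v - y_mean for v in y_raw]

W, T, P = nipals_pls1(X, y, n_components=2)
# One (first loading, second loading) point per predictor, as on the loading plot
loading_points = list(zip(P[0], P[1]))
print(loading_points)
```

Predictors whose points lie far from (0,0) contribute strongly to the first two components.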
If your model contains more than 2 components, you may want to plot the x-loadings of other components using a Scatterplot. To do this, store the x-loading matrix and then copy the matrix into columns using .
The residual X plot is a line plot of the x-residuals versus the predictors. Each line represents an observation and has one point for each predictor.
Use the x-residual matrix plot to identify observations or predictors that the model describes poorly. This plot is most useful with predictors that are on the same scale.
Use the x-residual matrix plot to examine general patterns in the residuals and identify areas where problems exist. Then, examine the x-residuals displayed in the output to determine which observations and predictors the model describes poorly.
The calculated X plot is a line plot of the x-calculated values versus the predictors. Each line represents an observation and has one point for each predictor.
Use this plot to identify observations or predictors that the model describes poorly. This plot is most useful with predictors that are on the same scale.
The calculated X plot complements the x-residual plot. For each observation and predictor, the x-calculated value plus the x-residual equals the original predictor value, so adding the two plots together produces a plot of the original predictor values. A predictor with x-calculated values that are much smaller or larger than the original x-values is not well described by the model.
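The sketch below illustrates this relationship with a single NIPALS-style PLS component on hypothetical, already centered data: the x-calculated values are t × p', the x-residuals are what remains, and the two add back to the predictors. It also reports, for each predictor, the share of its sum of squares that the calculated values reproduce. Working on centered data and using one component are simplifying assumptions; Minitab may report values on the original scale.

```python
from math import sqrt

# Hypothetical centered predictors (4 observations x 2 predictors) and centered response
X = [[-3.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [3.0, -1.0]]
y = [-3.0, -1.0, 1.0, 3.0]
n, m = len(X), len(X[0])

# One PLS component: unit weights w ~ X'y, scores t = Xw, loadings p = X't / t't
w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(m)]
norm = sqrt(sum(v * v for v in w))
w = [v / norm for v in w]
t = [sum(X[i][j] * w[j] for j in range(m)) for i in range(n)]
tt = sum(v * v for v in t)
p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(m)]

# x-calculated values and x-residuals; together they reproduce X
X_calc = [[t[i] * p[j] for j in range(m)] for i in range(n)]
X_resid = [[X[i][j] - X_calc[i][j] for j in range(m)] for i in range(n)]

# Share of each predictor's sum of squares reproduced by the calculated values
ratios = [
    sum(X_calc[i][j] ** 2 for i in range(n)) / sum(X[i][j] ** 2 for i in range(n))
    for j in range(m)
]
print(ratios)
```

In this example the first predictor is reproduced almost completely, while the calculated values for the second predictor are much smaller than the original values, so the model describes that predictor poorly.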