In best subsets regression, Minitab selects, for each model size (one predictor, two predictors, and so on), the two models with the highest R^{2} values. An "X" in a predictor's column of the output table indicates that the predictor is included in that model.
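The selection described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Minitab's implementation: it fits every subset by ordinary least squares with a constant and keeps the two highest-R^{2} models of each size. The data, the predictor names "A", "B", and "C", and the helper names `solve`, `r_squared`, and `best_subsets` are all hypothetical.

```python
from itertools import combinations

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def r_squared(X, y, cols):
    """R^2 (percent) of the least-squares fit of y on a constant plus the chosen columns."""
    rows = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bk * v for bk, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 100.0 * (1.0 - ss_res / ss_tot)

def best_subsets(X, y, names, keep=2):
    """For each model size, return the `keep` subsets with the highest R^2."""
    results = {}
    for k in range(1, len(names) + 1):
        scored = [(r_squared(X, y, cols), [names[j] for j in cols])
                  for cols in combinations(range(len(names)), k)]
        scored.sort(key=lambda t: t[0], reverse=True)
        results[k] = scored[:keep]
    return results

# Hypothetical data: the response depends on A and B only, plus small noise.
col_a = [1, 2, 3, 4, 5, 6, 7, 8]
col_b = [3, 1, 4, 1, 5, 9, 2, 6]
col_c = [2, 7, 1, 8, 2, 8, 1, 8]
noise = [0.1, -0.2, 0.05, 0.1, -0.1, 0.2, -0.05, -0.1]
X = list(zip(col_a, col_b, col_c))
y = [2 * a + 3 * b + e for a, b, e in zip(col_a, col_b, noise)]

results = best_subsets(X, y, ["A", "B", "C"])
for size, models in results.items():
    for r_sq, included in models:
        print(size, round(r_sq, 1), included)
```

Exhaustive enumeration is practical only for a small number of predictors, because the number of subsets doubles with each predictor added.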
Use the goodness-of-fit statistics to determine which model provides the best fit to your data. Before you select a final model, you should examine residual plots and other diagnostic measures to ensure that the model meets the assumptions of the analysis.
The higher the R^{2} value, the better the model fits your data. R^{2} is always between 0% and 100%.
R^{2} never decreases when you add predictors to a model. For example, the best five-predictor model always has an R^{2} that is at least as high as that of the best four-predictor model. Therefore, R^{2} is most useful when you compare models of the same size.
Use adjusted R^{2} when you want to compare models that have different numbers of predictors. R^{2} never decreases when you add a predictor to the model, even when the predictor provides no real improvement. The adjusted R^{2} value accounts for the number of predictors in the model, which helps you choose between models of different sizes.
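As a rough illustration of how the adjustment works, the sketch below uses the standard textbook formula for adjusted R^{2} in a model with a constant. The function name and the numbers are hypothetical and are not taken from the output table.

```python
def adjusted_r_squared(r_sq, n, p):
    """Adjusted R^2 from R^2 (as a proportion), n observations,
    and p predictors (not counting the constant)."""
    return 1.0 - (1.0 - r_sq) * (n - 1) / (n - p - 1)

# Hypothetical case: a fifth predictor raises R^2 only from 0.800 to 0.802.
four_predictors = adjusted_r_squared(0.800, n=30, p=4)
five_predictors = adjusted_r_squared(0.802, n=30, p=5)
print(round(four_predictors, 4), round(five_predictors, 4))
```

In this made-up case, R^{2} rises slightly while adjusted R^{2} falls, which is the kind of signal that the extra predictor adds little.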
Use predicted R^{2} to determine how well your model predicts the response for new observations. Models that have larger predicted R^{2} values have better predictive ability.
A predicted R^{2} that is substantially less than R^{2} may indicate that the model is over-fit. A model is over-fit when it includes terms for effects that are not important in the population. The model becomes tailored to the sample data and, therefore, may not be useful for making predictions about the population.
Predicted R^{2} can also be more useful than adjusted R^{2} for comparing models because it is calculated from observations that are left out when the model is estimated.
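The idea behind predicted R^{2} can be sketched with a leave-one-out calculation: remove each observation in turn, refit the model, and predict the removed observation. The sketch below assumes a single predictor to keep the fitting code short, and it assumes the usual definition based on the prediction sum of squares (PRESS). The data and function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def r_sq_and_predicted_r_sq(x, y):
    """Return (R^2, predicted R^2) as proportions."""
    n = len(x)
    y_bar = sum(y) / n
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)

    # Ordinary R^2: residuals from the model fitted to all observations.
    b0, b1 = fit_line(x, y)
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    # PRESS: each observation is predicted by a model fitted without it.
    press = 0.0
    for i in range(n):
        b0_i, b1_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (b0_i + b1_i * x[i])) ** 2

    return 1.0 - ss_res / ss_tot, 1.0 - press / ss_tot

# Hypothetical data that follow a nearly straight line.
x = [1, 2, 3, 4, 5, 6]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]
r_sq, pred_r_sq = r_sq_and_predicted_r_sq(x, y)
print(round(r_sq, 3), round(pred_r_sq, 3))
```

Because each prediction comes from a model that never saw the observation, predicted R^{2} cannot exceed R^{2}, and a large gap between the two is the over-fitting signal described above.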
Use S to assess how well the model describes the response. Use S instead of the R^{2} statistics to compare the fit of models that have no constant.
S is measured in the units of the response variable and represents how far the data values fall from the fitted values. The lower the value of S, the better the model describes the response. However, a low S value by itself does not indicate that the model meets the model assumptions. You should check the residual plots to verify the assumptions.
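A minimal sketch of the calculation, assuming the usual definition of S as the square root of the mean square error for a model that includes a constant. The observed and fitted values below are made up for illustration and do not come from a real fit.

```python
import math

def standard_error_s(observed, fitted, n_predictors):
    """S = sqrt(SSE / (n - p - 1)) for a model with a constant and p predictors."""
    n = len(observed)
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    degrees_of_freedom = n - n_predictors - 1
    return math.sqrt(sse / degrees_of_freedom)

# Hypothetical response values and fitted values from a one-predictor model.
observed = [10.0, 12.0, 15.0, 19.0, 20.0, 24.0]
fitted = [11.0, 13.0, 13.0, 17.0, 20.0, 24.0]
s = standard_error_s(observed, fitted, n_predictors=1)
print(round(s, 4))  # in the same units as the response
```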
Small samples do not provide a precise estimate of the strength of the relationship between the response and the predictors. If you need R^{2} to be more precise, use a larger sample (typically, 40 or more observations).
Goodness-of-fit statistics are just one measure of how well the model fits the data. Even when a statistic has a desirable value, you should check the residual plots to verify that the model meets the model assumptions.
Vars | R-Sq | R-Sq (adj) | R-Sq (pred) | Mallows Cp | S | Insolation | East | South | North | Time of Day |
---|---|---|---|---|---|---|---|---|---|---|
1 | 72.1 | 71.0 | 66.9 | 38.5 | 12.328 | X | ||||
1 | 39.4 | 37.1 | 26.3 | 112.7 | 18.154 | X | ||||
2 | 85.9 | 84.8 | 81.4 | 9.1 | 8.9321 | X | X | |||
2 | 82.0 | 80.6 | 74.2 | 17.8 | 10.076 | X | X | |||
3 | 87.4 | 85.9 | 79.0 | 7.6 | 8.5978 | X | X | X | ||
3 | 86.5 | 84.9 | 81.4 | 9.7 | 8.9110 | X | X | X | ||
4 | 89.1 | 87.3 | 80.6 | 5.8 | 8.1698 | X | X | X | X | |
4 | 88.0 | 86.0 | 79.3 | 8.2 | 8.5550 | X | X | X | X | |
5 | 89.9 | 87.7 | 78.8 | 6.0 | 8.0390 | X | X | X | X | X |
In these results, several models merit further examination. The model with all 5 predictors has the lowest value of S (8.0390) and the highest adjusted R^{2} (87.7%). A model with 2 predictors and a model with 3 predictors share the highest predicted R^{2} value, 81.4%. Before you select the final model, you should examine the models for violations of the regression assumptions using residual plots and other diagnostic measures.
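The comparison above can be reproduced by scanning the statistic columns of the table. The sketch below copies only the numeric columns, in table order, and omits the predictor columns. The variable names are illustrative.

```python
# (Vars, R-Sq, R-Sq (adj), R-Sq (pred), Mallows Cp, S), copied from the table.
rows = [
    (1, 72.1, 71.0, 66.9, 38.5, 12.328),
    (1, 39.4, 37.1, 26.3, 112.7, 18.154),
    (2, 85.9, 84.8, 81.4, 9.1, 8.9321),
    (2, 82.0, 80.6, 74.2, 17.8, 10.076),
    (3, 87.4, 85.9, 79.0, 7.6, 8.5978),
    (3, 86.5, 84.9, 81.4, 9.7, 8.9110),
    (4, 89.1, 87.3, 80.6, 5.8, 8.1698),
    (4, 88.0, 86.0, 79.3, 8.2, 8.5550),
    (5, 89.9, 87.7, 78.8, 6.0, 8.0390),
]

best_adj = max(rows, key=lambda r: r[2])   # highest adjusted R^2
best_s = min(rows, key=lambda r: r[5])     # lowest S
top_pred = max(r[3] for r in rows)         # highest predicted R^2
best_pred = [r for r in rows if r[3] == top_pred]

print("Highest R-Sq (adj):", best_adj[2], "with", best_adj[0], "predictors")
print("Lowest S:", best_s[5], "with", best_s[0], "predictors")
print("Highest R-Sq (pred):", top_pred, "for sizes", [r[0] for r in best_pred])
```

A scan like this only shortlists candidates. The residual plots and other diagnostics still decide which model is acceptable.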