In best subsets regression, by default, Minitab Express selects the models with the highest R2 values that contain one predictor, two predictors, and so on. You can determine which predictors are included in each model based on which columns in the output table are marked with an "X".
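The selection described above can be sketched as an exhaustive search: fit every combination of predictors of each size and keep the one with the highest R2. This is a minimal illustration in plain Python, not Minitab Express's actual implementation. The function names, the predictor names A, B, and C, and the data are all made up for the example.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(columns, y):
    """R2 (as a fraction) of a least-squares fit of y on the columns plus a constant."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(k)]
    beta = solve(xtx, xty)
    y_bar = sum(y) / n
    sse = sum(
        (yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
        for row, yi in zip(design, y)
    )
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


def best_subsets(predictors, y):
    """For each model size, return (R2, predictor names) of the highest-R2 model."""
    names = sorted(predictors)
    best = {}
    for size in range(1, len(names) + 1):
        scored = [
            (r_squared([predictors[name] for name in subset], y), subset)
            for subset in combinations(names, size)
        ]
        best[size] = max(scored, key=lambda item: item[0])
    return best


# Made-up data for illustration only.
predictors = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "B": [5.0, 3.0, 8.0, 1.0, 7.0, 2.0, 6.0, 4.0],
    "C": [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0],
}
y = [10.1, 10.8, 19.1, 15.0, 23.9, 22.2, 28.9, 30.0]

best = best_subsets(predictors, y)
for size, (r2, subset) in best.items():
    print(f"{size} predictor(s): {', '.join(subset)}  R2 = {100 * r2:.1f}%")
```

An exhaustive search fits 2^k - 1 models for k predictors, so this sketch is only practical for a small number of candidate predictors.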
Use the goodness-of-fit statistics to determine which model provides the best fit to your data. Before you select a final model, you should examine residual plots and other diagnostic measures to ensure that the model meets the assumptions of the analysis.
R2 is the percentage of variation in the response that is explained by the model. The higher the R2 value, the better the model fits your data. R2 is always between 0% and 100%.
R2 never decreases when you add predictors to a model. For example, the best five-predictor model will always have an R2 that is at least as high as that of the best four-predictor model. Therefore, R2 is most useful when you compare models of the same size.
Use adjusted R2 when you want to compare models that have different numbers of predictors. R2 never decreases when you add a predictor to the model, even when there is no real improvement to the model. The adjusted R2 value incorporates the number of predictors in the model, so it can decrease when an added predictor contributes little, which helps you choose the correct model.
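As an illustration of how the number of predictors enters the calculation, the following sketch uses the standard textbook formula, adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), for a model with a constant, n observations, and p predictors. The function name and the numbers are hypothetical, and R2 is passed as a fraction rather than a percentage.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R2 (as a fraction) for a model with a constant.

    r2: ordinary R2 as a fraction between 0 and 1
    n:  number of observations
    p:  number of predictors (the constant is not counted)
    """
    if n - p - 1 <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical values: a fifth predictor raises R2 only from 88.0% to 88.2%.
four_pred = adjusted_r2(0.880, n=30, p=4)
five_pred = adjusted_r2(0.882, n=30, p=5)
print(f"4 predictors: adjusted R2 = {100 * four_pred:.2f}%")
print(f"5 predictors: adjusted R2 = {100 * five_pred:.2f}%")
```

In this made-up case R2 goes up slightly with the fifth predictor, but adjusted R2 goes down, which suggests the extra predictor does not improve the model.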
Use predicted R2 to determine how well your model predicts the response for new observations. Models that have larger predicted R2 values have better predictive ability.
A predicted R2 that is substantially less than R2 may indicate that the model is over-fit. An over-fit model occurs when you add terms for effects that are not important in the population, although they may appear important in the sample data. The model becomes tailored to the sample data and, therefore, may not be useful for making predictions about the population.
Predicted R2 can also be more useful than adjusted R2 for comparing models because it is calculated with observations that are not included in the model calculation.
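The idea of predicting each observation from a model that did not use it can be sketched as a leave-one-out calculation. The example below assumes the usual definition, predicted R2 = 1 - PRESS / SS Total, where PRESS is the sum of squared leave-one-out prediction errors. It handles only a single predictor to keep the arithmetic short. The function names and data are made up, and the code is not Minitab Express's implementation.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r2_and_predicted_r2(x, y):
    """Return (R2, predicted R2) as fractions for a one-predictor model."""
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)

    # Ordinary R2: residuals come from the model fitted to all observations.
    b0, b1 = fit_line(x, y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    # PRESS: each observation is predicted by a model fitted without it.
    press = 0.0
    for i in range(n):
        a0, a1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a0 + a1 * x[i])) ** 2

    return 1.0 - sse / sst, 1.0 - press / sst


# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

r2, pred_r2 = r2_and_predicted_r2(x, y)
print(f"R2 = {100 * r2:.2f}%, predicted R2 = {100 * pred_r2:.2f}%")
```

Predicted R2 cannot exceed R2 for a least-squares fit, because each leave-one-out prediction error is at least as large in magnitude as the corresponding ordinary residual.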
Use S to assess how well the model describes the response. Use S instead of the R2 statistics to compare the fit of models that have no constant.
S is measured in the units of the response variable and represents the standard deviation of how far the data values fall from the fitted values. The lower the value of S, the better the model describes the response. However, a low S value by itself does not indicate that the model meets the model assumptions. You should check the residual plots to verify the assumptions.
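A short sketch of the calculation, assuming the usual definition of S as the square root of the residual sum of squares divided by the residual degrees of freedom. The function name, its `has_constant` parameter, and the data are hypothetical.

```python
import math


def standard_error_s(observed, fitted, n_predictors, has_constant=True):
    """S: square root of (sum of squared residuals / residual degrees of freedom)."""
    n = len(observed)
    # One degree of freedom per estimated coefficient, including the constant if present.
    df = n - n_predictors - (1 if has_constant else 0)
    if df <= 0:
        raise ValueError("not enough observations for the number of coefficients")
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return math.sqrt(sse / df)


# Made-up response values and fitted values from a one-predictor model.
observed = [10.0, 12.0, 15.0, 19.0, 24.0]
fitted = [10.5, 11.5, 15.5, 18.5, 24.0]

s = standard_error_s(observed, fitted, n_predictors=1)
print(f"S = {s:.3f} (in the units of the response)")
```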
Small samples do not provide a precise estimate of the strength of the relationship between the response and predictors. If you need R2 to be more precise, you should use a larger sample (typically, 40 or more).
R2 is just one measure of how well the model fits the data. Even when a model has a high R2, you should check the residual plots to verify that the model meets the model assumptions.
In these results, there are several models to examine further. The model with all 5 predictors has the lowest value of S and the highest value of adjusted R2, approximately 8 and 88%, respectively. A model with 2 predictors has the highest predicted R2 value of 81.4%. Before you select the final model, you should examine the models for violations of the regression assumptions using residual plots and other diagnostic measures.