Interpret all statistics for Best Subsets Regression

Find definitions and interpretation guidance for every statistic that is provided with best subsets regression.

Number of predictors

The number of predictors indicates how many predictors are contained in each model. For each number of predictors, by default, Minitab selects the model with the highest R2 value. The right side of the table uses "X" symbols to indicate which predictors are in each model.
Model Summary
Number of Predictors  R-sq  R-sq(adj)  R-sq(pred)  Mallows' Cp  S  Insolation  East  South  North  Time of Day
1  72.1  71.0  66.9  38.5  12.33  X
2  85.9  84.8  81.4  9.1  8.932  X X
3  87.4  85.9  79.0  7.6  8.598  X X X
4  89.1  87.3  80.6  5.8  8.170  X X X X
5  89.9  87.7  78.8  6.0  8.039  X X X X X
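As a rough illustration of how the columns in this table can be compared, the following Python sketch copies the five rows shown above and reports which model each statistic favors. The variable names are hypothetical, and the "plus 1" for the constant assumes that every model includes a constant term.

```python
# Rows copied from the Model Summary table above:
# (number of predictors, R-sq, R-sq(adj), R-sq(pred), Mallows' Cp, S)
rows = [
    (1, 72.1, 71.0, 66.9, 38.5, 12.33),
    (2, 85.9, 84.8, 81.4, 9.1, 8.932),
    (3, 87.4, 85.9, 79.0, 7.6, 8.598),
    (4, 89.1, 87.3, 80.6, 5.8, 8.170),
    (5, 89.9, 87.7, 78.8, 6.0, 8.039),
]

# Model size favored by each statistic (higher is better, except for S)
best_adj = max(rows, key=lambda r: r[2])[0]
best_pred = max(rows, key=lambda r: r[3])[0]
lowest_s = min(rows, key=lambda r: r[5])[0]

# Gap between Mallows' Cp and (number of predictors + 1 for the constant);
# this assumes that each model includes a constant term.
cp_gap = {r[0]: round(r[4] - (r[0] + 1), 1) for r in rows}

print("Highest R-sq(adj):", best_adj, "predictors")
print("Highest R-sq(pred):", best_pred, "predictors")
print("Lowest S:", lowest_s, "predictors")
print("Cp minus (predictors + 1):", cp_gap)
```

The statistics do not have to agree with each other, so a comparison like this is a starting point for judgment rather than an automatic selection rule.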

R-sq

R2 is the percentage of variation in the response that is explained by the model. It is calculated as 1 minus the ratio of the error sum of squares (which is the variation that is not explained by the model) to the total sum of squares (which is the total variation in the response).
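The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not Minitab's implementation. The function name and the sample data are hypothetical, and the fitted values are assumed to come from a model that you have already estimated.

```python
def r_squared(observed, fitted):
    """Return R-sq as a percentage: 100 * (1 - SSE / SST)."""
    mean_y = sum(observed) / len(observed)
    # Error sum of squares: variation not explained by the model
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    # Total sum of squares: total variation in the response
    sst = sum((y - mean_y) ** 2 for y in observed)
    return 100.0 * (1.0 - sse / sst)


# Hypothetical observed and fitted values, for illustration only
observed = [2.0, 4.0, 6.0, 8.0]
fitted = [2.5, 3.5, 6.5, 7.5]
print(r_squared(observed, fitted))
```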

Interpretation

Use R2 to determine how well the model fits your data. The higher the R2 value, the better the model fits your data. R2 is always between 0% and 100%.

You can use a fitted line plot to graphically illustrate different R2 values. The first plot illustrates a simple regression model that explains 85.5% of the variation in the response. The second plot illustrates a model that explains 22.6% of the variation in the response. The more variation that is explained by the model, the closer the data points fall to the fitted regression line. Theoretically, if a model could explain 100% of the variation, the fitted values would always equal the observed values and all of the data points would fall on the fitted line.
Consider the following issues when interpreting the R2 value:
• R2 never decreases when you add predictors to a model, and it almost always increases. For example, the best five-predictor model will always have an R2 that is at least as high as the best four-predictor model. Therefore, R2 is most useful when you compare models of the same size.

• Small samples do not provide a precise estimate of the strength of the relationship between the response and predictors. If you need R2 to be more precise, you should use a larger sample (typically, 40 or more).

• R2 is just one measure of how well the model fits the data. Even when a model has a high R2, you should check the residual plots to verify that the model meets the model assumptions.

R-sq (adj)

Adjusted R2 is the percentage of the variation in the response that is explained by the model, adjusted for the number of predictors in the model relative to the number of observations. Adjusted R2 is calculated as 1 minus the ratio of the mean square error (MSE) to the mean square total (MS Total).
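As a hedged sketch of this calculation, the Python function below divides each sum of squares by its degrees of freedom before taking the ratio. The function name and arguments are hypothetical. The sketch assumes that the model includes a constant, so the error degrees of freedom are n - p - 1 for n observations and p predictors.

```python
def adjusted_r_squared(sse, sst, n, p):
    """Return adjusted R-sq as a percentage: 100 * (1 - MSE / MS Total).

    sse: error sum of squares
    sst: total sum of squares
    n:   number of observations
    p:   number of predictors (the constant is not counted)
    """
    mse = sse / (n - p - 1)   # mean square error
    ms_total = sst / (n - 1)  # mean square total
    return 100.0 * (1.0 - mse / ms_total)


# Hypothetical sums of squares, for illustration only
print(adjusted_r_squared(sse=1.0, sst=20.0, n=4, p=1))
```

Because the error degrees of freedom shrink as p grows, a predictor that reduces the error sum of squares only slightly can lower the adjusted value even though the unadjusted R2 does not go down.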

Interpretation

Use adjusted R2 when you want to compare models that have different numbers of predictors. R2 never decreases when you add a predictor to the model, even when there is no real improvement to the model. The adjusted R2 value incorporates the number of predictors in the model to help you choose the correct model.

For example, you work for a potato chip company that examines the factors that affect the percentage of crumbled potato chips per container. You receive the following results as you add the predictors in a forward stepwise approach:
Step % Potato Cooling rate Cooking temp R2 Adjusted R2 P-value
1 X     52% 51% 0.000
2 X X   63% 62% 0.000
3 X X X 65% 62% 0.000

The first step yields a statistically significant regression model. The second step adds cooling rate to the model. Adjusted R2 increases, which indicates that cooling rate improves the model. The third step, which adds cooking temperature to the model, increases the R2 but not the adjusted R2. These results indicate that cooking temperature does not improve the model. Based on these results, you consider removing cooking temperature from the model.

R-sq (pred)

Predicted R2 is calculated with a formula that is equivalent to systematically removing each observation from the data set, estimating the regression equation, and determining how well the model predicts the removed observation. The value of predicted R2 ranges between 0% and 100%. (While the calculations for predicted R2 can produce negative values, Minitab displays zero for these cases.)
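The leave-one-out procedure described above can be written out directly. The sketch below is an illustration under simplifying assumptions, not Minitab's formula: it handles only one predictor, it refits the line once for each removed observation instead of using an equivalent shortcut, and all function names are hypothetical. Negative results are shown as zero, as described above.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predicted_r_squared(x, y):
    """Return predicted R-sq as a percentage, with negative values shown as 0."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)
    press = 0.0  # sum of squared leave-one-out prediction errors
    for i in range(len(x)):
        # Remove observation i, refit, then predict the removed observation
        intercept, slope = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (intercept + slope * x[i])) ** 2
    return max(0.0, 100.0 * (1.0 - press / sst))


# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
print(predicted_r_squared(x, y))
```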

Interpretation

Use predicted R2 to determine how well your model predicts the response for new observations. Models that have larger predicted R2 values have better predictive ability.

A predicted R2 that is substantially less than R2 may indicate that the model is over-fit. A model becomes over-fit when you add terms for effects that are not important in the population, although they may appear important in the sample data. The model becomes tailored to the sample data and, therefore, may not be useful for making predictions about the population.

Predicted R2 can also be more useful than adjusted R2 for comparing models because it is calculated with observations that are not included in the model calculation.

For example, an analyst at a financial consulting company develops a model to predict future market conditions. The model looks promising because it has an R2 of 87%. However, the predicted R2 is only 52%, which indicates that the model may be over-fit.

S

S represents how far the data values fall from the fitted values. S is measured in the units of the response.
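One common way to compute this quantity is as the square root of the mean square error. The Python sketch below takes that approach. The function name and data are hypothetical, and the degrees of freedom n - p - 1 assume a model with a constant and p predictors.

```python
import math


def standard_error_s(observed, fitted, p):
    """Return S = sqrt(SSE / (n - p - 1)), in the units of the response.

    p is the number of predictors; the constant is not counted.
    """
    n = len(observed)
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return math.sqrt(sse / (n - p - 1))


# Hypothetical observed and fitted values from a one-predictor model
observed = [2.0, 4.0, 6.0, 8.0]
fitted = [2.5, 3.5, 6.5, 7.5]
print(standard_error_s(observed, fitted, p=1))
```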

Interpretation

Use S to assess how well the model describes the response. The lower the value of S, the better the model describes the response. However, a low S value by itself does not indicate that the model meets the model assumptions. You should check the residual plots to verify the assumptions.

For example, you work for a potato chip company that examines the factors that affect the percentage of crumbled potato chips per container. You reduce the model to the significant predictors, and S is calculated as 1.79. This result indicates that the standard deviation of the data points around the fitted values is 1.79. If you are comparing models, values that are lower than 1.79 indicate a better fit, and higher values indicate a worse fit.

Mallows' Cp

Mallows' Cp can help you choose between competing multiple regression models. Mallows' Cp compares the full model to models with the best subsets of predictors. It helps you strike an important balance with the number of predictors in the model. A model with too many predictors can be relatively imprecise while a model with too few predictors can produce biased estimates. Using Mallows' Cp to compare regression models is only valid when you start with the same complete set of predictors.

Interpretation

A Mallows' Cp value that is close to the number of predictors plus the constant indicates that the model produces relatively precise and unbiased estimates.

A Mallows' Cp value that is greater than the number of predictors plus the constant indicates that the model is biased and does not fit the data well.
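To make the comparison with "the number of predictors plus the constant" concrete, the sketch below uses the usual textbook form of Mallows' Cp: the error sum of squares of the subset model divided by the mean square error of the full model, minus (n - 2 × number of parameters). The function name and the numbers are hypothetical, and the sketch assumes that every model includes a constant.

```python
def mallows_cp(sse_subset, mse_full, n, p):
    """Return Mallows' Cp for a subset model.

    sse_subset: error sum of squares of the subset model
    mse_full:   mean square error of the model with all candidate predictors
    n:          number of observations
    p:          number of predictors in the subset (constant not counted)
    """
    parameters = p + 1  # predictors plus the constant
    return sse_subset / mse_full - (n - 2 * parameters)


# Hypothetical values: 30 observations, 5 candidate predictors
n = 30
sse_full = 48.0
mse_full = sse_full / (n - 5 - 1)

cp_full = mallows_cp(sse_full, mse_full, n, p=5)  # full model
cp_four = mallows_cp(52.0, mse_full, n, p=4)      # a four-predictor subset

print(cp_full, "compared with", 5 + 1)
print(cp_four, "compared with", 4 + 1)
```

With this form, the full model always has a Cp equal to its number of predictors plus the constant, which matches the five-predictor row of the Model Summary table (Cp of 6.0). For that reason, the statistic is mainly informative for the smaller subset models.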
