Exercise caution when using variable selection procedures such as best subsets and stepwise regression. These procedures are automatic: they choose predictors by statistical criteria alone and therefore do not consider the practical importance of any of the predictors. Also, when you fit a model to data, the goodness of the fit comes from two basic sources:
- The underlying structure of the data (a structure that will apply to other data sets collected in the same way)
- The peculiarities of the one specific data set you analyze
To ensure that your model doesn't just fit one specific data set, verify the model found by the selection procedure on a new set of data. Alternatively, you can randomly divide the original data set into two parts, use best subsets on one part to select a model, and then verify the fit on the second part. A model that also fits the second part well is more likely to apply to other data sets collected in the same way.
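The split-sample check described above can be sketched in Python with NumPy. This is a minimal illustration, not the procedure any particular statistics package uses: the simulated data, the 50/50 split, the use of adjusted R-squared as the selection criterion, and the helper names fit_ols, r_squared, and best_subset are all assumptions made for the example.

```python
from itertools import combinations

import numpy as np


def fit_ols(X, y):
    """Return least-squares coefficients, intercept first."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def r_squared(X, y, coef):
    """R-squared of the given coefficients on the data (X, y)."""
    A = np.column_stack([np.ones(len(X)), X])
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def best_subset(X, y):
    """Try every non-empty predictor subset; keep the highest adjusted R-squared."""
    n, p = X.shape
    best_adj, best_cols, best_coef = -np.inf, (), None
    for k in range(1, p + 1):
        for cols in combinations(range(p), k):
            Xk = X[:, list(cols)]
            coef = fit_ols(Xk, y)
            r2 = r_squared(Xk, y, coef)
            adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
            if adj > best_adj:
                best_adj, best_cols, best_coef = adj, cols, coef
    return best_cols, best_coef, best_adj


# Simulated data (assumption): only predictors 0 and 2 truly affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(size=200)

# Randomly divide the data set into two parts.
idx = rng.permutation(len(y))
train, test = idx[:100], idx[100:]

# Select the model on the first part only.
cols, coef, train_adj_r2 = best_subset(X[train], y[train])

# Verify the fit on the second part, reusing the coefficients from the first part.
holdout_r2 = r_squared(X[test][:, list(cols)], y[test], coef)

print("selected predictors:", cols)
print("adjusted R-squared on selection part:", round(train_adj_r2, 3))
print("R-squared on verification part:", round(holdout_r2, 3))
```

If the R-squared on the verification part is much lower than on the selection part, the selected model is probably fitting peculiarities of the one data set rather than the underlying structure. Note that exhaustive enumeration grows as 2^p in the number of predictors, so this sketch is only practical for a small number of candidate predictors.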