Data considerations for Fit Model and Discover Key Predictors with TreeNet® Regression

Note

This command is available with the Predictive Analytics Module. Click here for more information about how to activate the module.

To ensure that your results are valid, consider the following guidelines when you collect data, perform the analysis, and interpret your results.

The response variable should be continuous
A continuous variable can be measured and ordered, and has an infinite number of possible values between any two values. For example, the diameter of each tire in a sample is a continuous variable.

The data for the response variable must be numeric values.

If your response variable is categorical, use Fit Model or Discover Key Predictors for TreeNet® Classification.

Predictor variables may be continuous or categorical
You can use any combination of continuous and categorical predictors; however, each predictor column must be the same length as the response column. Missing values are allowed.
  • All continuous predictors must be numeric.
  • Categorical predictors can be text or numeric values.
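The column requirements above can be checked before you run the analysis. The following is a minimal sketch in plain Python, not part of Minitab: the function names, the dictionary layout for the columns, and the choice to treat None, empty text, and NaN as missing values are all assumptions made for illustration.

```python
import math


def is_missing(value):
    """Assumption: None, blank text, and NaN count as missing values."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    return isinstance(value, float) and math.isnan(value)


def is_numeric(value):
    """True for int or float values (bool is deliberately excluded)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_columns(response, continuous, categorical):
    """Return a list of problems; an empty list means every check passed.

    response    -- list of response values
    continuous  -- dict mapping column name to a list of values
    categorical -- dict mapping column name to a list of values
    """
    problems = []
    n = len(response)

    # The response must hold numeric values. How missing response values
    # are treated is not covered here, so this sketch only checks types.
    if not all(is_numeric(v) for v in response):
        problems.append("response must be numeric")

    # Every predictor column must be the same length as the response.
    for name, column in {**continuous, **categorical}.items():
        if len(column) != n:
            problems.append(f"{name}: length {len(column)} != {n}")

    # Continuous predictors must be numeric; missing values are allowed.
    for name, column in continuous.items():
        if not all(is_numeric(v) for v in column if not is_missing(v)):
            problems.append(f"{name}: continuous predictor must be numeric")

    # Categorical predictors may be text or numeric, so no type check.
    return problems
```

For example, `check_columns([1.2, 3.4, 5.6], {"x1": [1, None, 3]}, {"g": ["a", "b", 2]})` returns an empty list, because the missing value in `x1` and the mixed text and numeric values in `g` are both acceptable.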

A test set is recommended when the number of cases > 2000

By default, Minitab uses cross-validation when the number of cases is ≤ 2000. When the number of cases is larger than 2000, Minitab uses a test set. Cross-validation is usually the better validation method, but it takes more time to calculate the results. Validation with a test set is useful when cross-validation is too time-consuming.
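The default rule can be written as a single comparison. This sketch only restates the threshold described above; the function name and the returned labels are hypothetical and do not correspond to any Minitab setting or command.

```python
def default_validation_method(n_cases, threshold=2000):
    """Restate the documented default: cross-validation at or below
    the threshold, a test set above it."""
    if n_cases <= threshold:
        return "cross-validation"
    return "test set"
```

A worksheet with exactly 2000 cases falls on the cross-validation side of the rule, and one with 2001 cases falls on the test set side.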

To learn more about the settings for validation techniques in Fit Model and Discover Key Predictors, go to Specify the validation method for Fit Model and Discover Key Predictors with TreeNet® Regression.