Data considerations for CART® Regression

To ensure that your results are valid, consider the following guidelines when you collect data, perform the analysis, and interpret your results.

The response variable should be continuous
A continuous variable can be measured and ordered, and it can take infinitely many values between any two values. For example, the diameter of the tires in a sample is a continuous variable.

The data for the response variable must be numeric values.

If your response variable is categorical, use CART® Classification.

Predictor variables may be continuous or categorical
You can use a combination of continuous and categorical predictors; however, each predictor column must be the same length as the response column. Missing values are allowed.
  • All continuous predictors must be numeric.
  • Categorical predictors can be text or numeric values.
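
The requirements on the response and predictor columns can be checked before running the analysis. The following is a minimal sketch in plain Python, not part of Minitab: the function names (`is_missing`, `is_numeric`, `check_columns`) and the column names in the example are hypothetical, and treating `None` and NaN as missing values, including in the response, is an assumption of this sketch.

```python
import math
from numbers import Real


def is_missing(value):
    """Treat None and NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def is_numeric(value):
    """True for real numbers; booleans are deliberately excluded."""
    return isinstance(value, Real) and not isinstance(value, bool)


def check_columns(response, continuous, categorical):
    """Return a list of problems; an empty list means every check passed.

    response    -- list of response values
    continuous  -- dict mapping column name to a list of values
    categorical -- dict mapping column name to a list of values
    """
    problems = []

    # The response must hold numeric values (missing entries are skipped).
    if not all(is_numeric(v) for v in response if not is_missing(v)):
        problems.append("response: contains non-numeric values")

    # Every predictor column must be as long as the response column.
    for name, column in {**continuous, **categorical}.items():
        if len(column) != len(response):
            problems.append(
                f"{name}: length {len(column)} differs from "
                f"response length {len(response)}"
            )

    # Continuous predictors must be numeric; categorical predictors
    # may be text or numeric, so their values are not type-checked.
    for name, column in continuous.items():
        if not all(is_numeric(v) for v in column if not is_missing(v)):
            problems.append(f"{name}: continuous predictor is not numeric")

    return problems


# Hypothetical example data: one missing value in a continuous predictor.
response = [61.2, 60.8, 61.5, 60.9]
continuous = {"pressure": [32.0, None, 31.5, 33.1]}
categorical = {"supplier": ["A", "B", "A", "C"], "shift": [1, 2, 1, 2]}
problems = check_columns(response, continuous, categorical)
print(problems)
```

A column that is too short, or a continuous predictor that contains text, adds one message to the returned list for each problem.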

A test set is recommended when the number of cases is greater than 5000
By default, Minitab uses cross-validation when the number of cases is ≤ 5000. When the number of cases is larger than 5000, Minitab uses a test set. Validation with a training set of data and a test set of data is useful when the data set is large. To learn more about the settings for validation techniques in CART® Regression, go to Specify the validation method for CART® Regression.
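
The default rule above can be written as a small function. This is a hedged sketch in plain Python rather than Minitab's own logic: `default_validation` and `split_indices` are hypothetical names, and the 30% test fraction and the fixed random seed are illustrative assumptions, not Minitab defaults.

```python
import random


def default_validation(n_cases, threshold=5000):
    """Return the default validation method for a given number of cases.

    Cross-validation when n_cases <= threshold, otherwise a test set.
    """
    if n_cases < 0:
        raise ValueError("n_cases must be non-negative")
    return "cross-validation" if n_cases <= threshold else "test set"


def split_indices(n_cases, test_fraction=0.3, seed=0):
    """Randomly split row indices into (training, test) lists.

    test_fraction and seed are illustrative assumptions of this sketch.
    """
    rng = random.Random(seed)
    indices = list(range(n_cases))
    rng.shuffle(indices)
    n_test = round(n_cases * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


method = default_validation(6000)
train_rows, test_rows = split_indices(6000)
print(method, len(train_rows), len(test_rows))
```

Every row index lands in exactly one of the two lists, so the training rows and the test rows never overlap.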