Data considerations for CART® Classification

To ensure that your results are valid, consider the following guidelines when you collect data, perform the analysis, and interpret your results.

The response variable should be categorical
Categorical variables contain a finite, countable number of categories or distinct groups. Categorical data may or may not have a logical order. For example, categorical variables include gender, material type, and payment method.
  • If your response variable has two categories, such as pass and fail, then the response is binary.
  • If your response variable contains three or more categories, then the response is multinomial.

The data for the response variable must be either text values or numeric values. Date/time values are not allowed.
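
Outside Minitab, the response rules above can be sketched in Python as a quick pre-check. The function name `response_kind` is hypothetical, and the sketch assumes the column is already loaded as a list with no missing entries. It illustrates the rules and does not describe how Minitab performs the check.

```python
from datetime import date, time


def response_kind(values):
    """Return 'binary' or 'multinomial' for a categorical response column."""
    for v in values:
        # datetime is a subclass of date, so this also rejects datetime values.
        if isinstance(v, (date, time)):
            raise TypeError("date/time values are not allowed in the response")
        if not isinstance(v, (str, int, float)):
            raise TypeError(f"response values must be text or numeric, got {type(v).__name__}")

    categories = set(values)
    if len(categories) < 2:
        raise ValueError("the response needs at least two categories")
    return "binary" if len(categories) == 2 else "multinomial"


print(response_kind(["pass", "fail", "pass"]))  # binary
```

A numeric column with many distinct values would be reported as multinomial here. Deciding whether such a column is really continuous is left to the analyst.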

If your response variable is continuous, use CART® Regression.

Predictor variables may be continuous or categorical
You can use a combination of continuous and categorical predictors; however, each predictor column must be the same length as the response column. Missing values are allowed.
  • All continuous predictors must be numeric.
  • Categorical predictors can be text or numeric values.
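
The predictor requirements above can also be checked before the data are loaded into a worksheet. The following Python sketch is an illustration only. The function name `check_predictors`, the dictionary layout, and the use of `None` to mark a missing value are all assumptions made for this example.

```python
from numbers import Real


def check_predictors(response, predictors, continuous_names):
    """Return a list of problems found in the predictor columns.

    predictors: dict mapping a column name to a list of values.
    continuous_names: set of names to treat as continuous predictors.
    None marks a missing value (an assumption of this sketch).
    """
    problems = []
    for name, column in predictors.items():
        # Every predictor column must be as long as the response column.
        if len(column) != len(response):
            problems.append(
                f"{name}: length {len(column)} differs from response length {len(response)}"
            )
        present = [v for v in column if v is not None]  # missing values are allowed
        if name in continuous_names:
            # Continuous predictors must be numeric.
            if not all(isinstance(v, Real) for v in present):
                problems.append(f"{name}: continuous predictor contains non-numeric values")
        elif not all(isinstance(v, (str, Real)) for v in present):
            # Categorical predictors can be text or numeric.
            problems.append(f"{name}: categorical predictor must be text or numeric")
    return problems
```

An empty list means that no problems were found. For example, a continuous column that contains the text value "hot" produces one message for that column.
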

A test set is recommended when the number of cases > 5000
By default, Minitab uses cross-validation when the number of cases is ≤ 5000. When the number of cases is larger than 5000, Minitab uses a test set. Splitting the data into a training set and a test set is useful when the data set is large. To learn more about the settings for validation techniques in CART® Classification, go to Specify the validation method for CART® Classification.
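
The default rule described above can be written as a one-line decision. This Python sketch only restates the 5000-case threshold from the text. The function name `default_validation` is hypothetical, and the sketch makes no claim about the number of folds or the size of the test set that Minitab uses.

```python
def default_validation(n_cases, threshold=5000):
    """Return the default validation method for a given number of cases."""
    # At or below the threshold: cross-validation. Above it: a test set.
    return "cross-validation" if n_cases <= threshold else "test set"


print(default_validation(5000))  # cross-validation
print(default_validation(5001))  # test set
```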