To ensure that your results are valid, consider the following guidelines when you collect data, perform the analysis, and interpret your results.
- The response variable (target) should be categorical
- Categorical variables contain a finite, countable number of categories or distinct groups. Categorical data may or may not have a logical order. For example, categorical variables include gender, material type, and payment method.
- If your response variable has two categories, such as pass and fail, then the response is binary.
- If your response variable contains three or more categories, then the response is multinomial.
The data for the response variable must be either text values or numeric values. Date/time values are not allowed.
If your response variable is continuous, use CART® Regression.
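The response rules above can be sketched as a small pre-check run before the data go into the analysis. This is only an illustration in plain Python, not part of Minitab: the function name `response_type` is hypothetical, and treating `None` as a missing value to skip is an assumption made for the example.

```python
from datetime import date, datetime, time


def response_type(values):
    """Classify a response column as 'binary' or 'multinomial'.

    Hypothetical helper for illustration; None is assumed to mark a
    missing value and is ignored.
    """
    observed = [v for v in values if v is not None]
    for v in observed:
        # Date/time values are not allowed in the response.
        if isinstance(v, (date, datetime, time)):
            raise TypeError("date/time values are not allowed in the response")
        # The response must hold text or numeric values.
        if not isinstance(v, (str, int, float)):
            raise TypeError(f"unsupported response value: {v!r}")
    categories = set(observed)
    if len(categories) < 2:
        raise ValueError("the response needs at least two categories")
    # Two categories is binary; three or more is multinomial.
    return "binary" if len(categories) == 2 else "multinomial"
```

For example, `response_type(["pass", "fail", "pass"])` classifies a pass/fail column as binary, while a column with three distinct labels is classified as multinomial.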
- Predictor variables may be continuous or categorical
You can use a combination of continuous and categorical predictors; however, each predictor column must be the same length as the response column. Missing values are allowed.
- All continuous predictors must be numeric.
- Categorical predictors can be text or numeric values.
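The predictor requirements above can be checked in the same way. The following is a minimal sketch in plain Python, assuming the columns are held as lists in a dictionary keyed by column name; `check_predictors` and its parameters are hypothetical names, and `None` is assumed to represent a missing value.

```python
from numbers import Real


def check_predictors(response, predictors, continuous_names):
    """Return a list of problems found in the predictor columns.

    Hypothetical helper: `predictors` maps column names to lists, and
    `continuous_names` is the set of columns treated as continuous.
    """
    problems = []
    for name, column in predictors.items():
        # Every predictor column must match the response column length.
        if len(column) != len(response):
            problems.append(
                f"{name}: length {len(column)} differs from "
                f"response length {len(response)}"
            )
        if name in continuous_names:
            # Continuous predictors must be numeric; missing values (None)
            # are allowed. bool is excluded even though it subclasses int.
            bad = [
                v for v in column
                if v is not None
                and (isinstance(v, bool) or not isinstance(v, Real))
            ]
            if bad:
                problems.append(f"{name}: continuous predictor has non-numeric values")
    return problems
```

An empty list means no problems were found. Categorical predictors are not type-checked here because they may be text or numeric.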
- A test set is recommended when the number of cases is greater than 5000
By default, Minitab uses cross-validation when the number of cases is ≤ 5000. When the number of cases is larger than 5000, Minitab uses a test set. Validation with a training set of data and a test set of data is useful when the data set is large. To learn more about the settings for validation techniques in CART® Classification, go to Specify the validation method for CART® Classification.
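The default rule described above amounts to a single threshold comparison. The sketch below restates it in Python for clarity; it is not Minitab's implementation, and the function name `default_validation` and the `threshold` parameter are assumptions made for the example.

```python
def default_validation(n_cases, threshold=5000):
    """Return the default validation method for a given number of cases.

    Hypothetical restatement of the documented rule: cross-validation at
    or below the threshold, a test set above it.
    """
    if n_cases <= threshold:
        return "cross-validation"
    return "test set"
```

Note that exactly 5000 cases falls on the cross-validation side of the rule.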