Select the analysis options for CART® Classification

Stat > Predictive Analytics > CART® Classification > Options

Select the analysis options.

Node splitting method
Choose the splitting method to generate your decision tree. You can compare the results from several splitting methods to determine the best choice for your application.
  • Gini: The default method, which works well across many applications. The Gini method usually generates trees that include small nodes with a high concentration of the response of interest.
  • Entropy: The Entropy method is proportional to the maximum of certain likelihood functions for the node.
  • Twoing: The Twoing method is available only with a multinomial response; for a binary response, it is the same as the Gini method. The Twoing method usually generates more balanced splits than the Gini or Entropy methods.
  • Class probability: The Class probability method tends to produce a larger tree than the Gini method. Use this method when you are interested in the performance of a few top nodes.
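To make the difference between the splitting methods concrete, the following Python sketch computes the textbook Gini and entropy impurities for a node and the impurity decrease that a candidate split produces. It illustrates the general technique only and is not Minitab's implementation; the function names and the class counts are hypothetical.

```python
import math

def gini(counts):
    """Textbook Gini impurity of a node, given the number of cases per class."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Textbook entropy (natural log) of a node; empty classes contribute 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def impurity_decrease(parent, left, right, impurity):
    """Parent impurity minus the size-weighted impurity of the two child nodes."""
    n = sum(parent)
    return (impurity(parent)
            - sum(left) / n * impurity(left)
            - sum(right) / n * impurity(right))

# Hypothetical binary-response node with 100 cases and one candidate split.
parent = [50, 50]
left, right = [40, 10], [10, 40]

print("Gini decrease:   ", impurity_decrease(parent, left, right, gini))
print("Entropy decrease:", impurity_decrease(parent, left, right, entropy))
```

A tree-growing procedure of this kind evaluates many candidate splits and keeps the one with the largest decrease, so different impurity functions can rank the same candidates differently.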
Criterion for selecting optimal tree
Choose between the following criteria to select the tree in the results. You can compare the results from different trees to determine the best choice for your application.
  • Minimum misclassification cost: Select this option to display results for the tree that minimizes the misclassification cost.
  • Within K standard errors of minimum misclassification cost; K=: Select this option to display results for the smallest tree with a misclassification cost within K standard errors of the minimum misclassification cost. By default, K=1, so the results are for the smallest tree with a misclassification cost within 1 standard error of the tree with the minimum misclassification cost.
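The two criteria above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Minitab's code: each candidate tree is represented by a hypothetical tuple of (number of terminal nodes, misclassification cost, standard error), the standard error used for the threshold is assumed to be that of the minimum-cost tree, and all values are invented.

```python
def select_tree(candidates, k=1.0):
    """Return the smallest tree whose cost is within k standard errors
    of the minimum cost. With k=0 this is the minimum-cost tree.

    candidates: list of (terminal_nodes, cost, std_error) tuples.
    """
    best = min(candidates, key=lambda t: t[1])      # minimum-cost tree
    threshold = best[1] + k * best[2]               # cost ceiling
    eligible = [t for t in candidates if t[1] <= threshold]
    return min(eligible, key=lambda t: t[0])        # fewest terminal nodes

# Hypothetical sequence of candidate trees.
candidates = [
    (2, 0.40, 0.03),
    (5, 0.31, 0.03),
    (9, 0.29, 0.03),
    (14, 0.30, 0.03),
]

print(select_tree(candidates, k=0))  # minimum misclassification cost
print(select_tree(candidates, k=1))  # smallest tree within 1 standard error
```

In this invented example the 1 standard error rule chooses a 5-node tree over the 9-node minimum-cost tree, trading a small increase in cost for a simpler tree.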
Number of surrogates for a predictor with missing values
Enter the number of surrogates that Minitab searches for when a predictor has missing values. When many predictors have similar missing value patterns, you should increase the number of surrogates.
This number is the maximum number of surrogates that Minitab searches for; Minitab may find fewer.
The default value is 10.
Minimum number of cases to split an internal node
Enter the minimum number of cases a node can have and still be split into more nodes. The default is 10. For example, with the default, if an internal node has 10 or more cases, Minitab tries to perform a split. If the internal node has 9 or fewer cases, Minitab does not try to perform a split. With larger sample sizes, you may want to increase this minimum.
The internal node limit must be at least twice the terminal node limit, but larger ratios are better. Internal node limits of at least 3 times terminal node limits allow a reasonable number of splitters.
Minimum number of cases allowed for a terminal node
Enter the minimum number of cases that can be in a terminal node. The default is 3. For example, with the default, if a split would create a node with fewer than 3 cases, Minitab does not perform the split. With larger sample sizes, you may want to increase this minimum.
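The way the internal node limit and the terminal node limit interact can be sketched as follows. This is an illustrative Python sketch of the rules described above, assuming the default limits of 10 and 3; the function names are hypothetical and do not correspond to any Minitab API.

```python
def may_attempt_split(n_cases, min_internal=10):
    """A node is a candidate for splitting only if it has enough cases."""
    return n_cases >= min_internal

def split_is_allowed(n_left, n_right, min_terminal=3):
    """A split is rejected if either child would be smaller than the terminal limit."""
    return n_left >= min_terminal and n_right >= min_terminal

def limits_are_consistent(min_internal, min_terminal):
    """The internal node limit must be at least twice the terminal node limit."""
    return min_internal >= 2 * min_terminal

print(may_attempt_split(10))         # node with 10 cases: a split is attempted
print(may_attempt_split(9))          # node with 9 cases: no split is attempted
print(split_is_allowed(2, 8))        # one child has fewer than 3 cases: rejected
print(limits_are_consistent(10, 3))  # default limits satisfy the 2x requirement
```

The 2x requirement follows from the second rule: a node with fewer than twice the terminal limit cannot be divided into two children that both meet that limit.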
Maximum tree depth
Enter the maximum depth of the tree. The root node corresponds to a depth of 1. A larger maximum depth makes it less likely that the depth limit excludes the best tree, but it may slow the processing.
Weights
Enter a column that contains the case weights. The column must have the same number of rows as the response column. Values must be ≥ 0. Minitab omits from the analysis any row where the weight is missing or zero.
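To check a weights column before you run the analysis, a small script along the following lines can report which rows would be used. This is a hedged Python sketch of the rules stated above, not part of Minitab; the function name is hypothetical, and missing values are assumed to be represented as None or NaN.

```python
import math

def usable_rows(weights):
    """Return the indices of rows whose weight is present and greater than 0.

    Raises ValueError on a negative weight, because weights must be >= 0.
    """
    kept = []
    for i, w in enumerate(weights):
        if w is None or (isinstance(w, float) and math.isnan(w)):
            continue  # missing weight: row is omitted
        if w < 0:
            raise ValueError(f"row {i}: weight {w} is negative")
        if w == 0:
            continue  # zero weight: row is omitted
        kept.append(i)
    return kept

# Hypothetical weights column with one zero and two missing values.
weights = [1.0, 0, None, float("nan"), 2.5]
print(usable_rows(weights))
```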