Find definitions and interpretation guidance for the Method table.

A prior probability is the probability that an observation will fall into a group before you collect the data. For example, if you are classifying the buyers of a specific car, you might already know that 60% of purchasers are male and 40% are female.

Use prior probabilities to increase classification accuracy for certain classes. CART makes different internal balancing decisions based on the prior probabilities. Increasing the probability of one class and decreasing the probability of another can help balance the misclassification rates for the different classes. For instance, increasing the event probability and decreasing the nonevent probability may improve the false negative rate but may worsen the false positive rate, because more nodes are classified as the event.

Increasing the event probability lowers the node threshold for assigning the event class. Thus, nodes with lower fractions of the event class are classified as the event. Prior probabilities have their strongest impact on the development of the entire tree during the tree growing stage and provide a powerful means to change the final model.
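The effect of priors on the class assignment in a node can be illustrated with a short Python sketch. It assumes the textbook CART rule, in which each class's share of its own total count is weighted by its prior; Minitab's exact internal computation is not stated here, and the function name and the counts are hypothetical.

```python
def assign_node_class(node_counts, class_totals, priors):
    """Return (assigned_class, adjusted_probabilities) for one node.

    Assumption: textbook CART weighting, score_j = prior_j * N_j(node) / N_j.
    """
    scores = {
        c: priors[c] * node_counts[c] / class_totals[c]
        for c in priors
    }
    total = sum(scores.values())
    probs = {c: s / total for c, s in scores.items()}
    return max(probs, key=probs.get), probs


# Hypothetical data: 1000 rows with 100 events; one node holds 30 events
# and 70 nonevents, so events are 30% of the rows in the node.
class_totals = {"event": 100, "nonevent": 900}
node_counts = {"event": 30, "nonevent": 70}

# Priors that match the sample frequencies: the node stays a nonevent node.
label_a, probs_a = assign_node_class(
    node_counts, class_totals, {"event": 0.1, "nonevent": 0.9}
)

# Equal priors raise the event prior from 0.1 to 0.5: the same node is now
# classified as the event even though events are still the minority of rows.
label_b, probs_b = assign_node_class(
    node_counts, class_totals, {"event": 0.5, "nonevent": 0.5}
)

print(label_a, round(probs_a["event"], 3))
print(label_b, round(probs_b["event"], 3))
```

With the larger event prior, the node no longer needs a majority of event rows to be assigned to the event class, which is the lowered threshold described above.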

Minitab provides the following options for prior probabilities:

- Same for all classes
  - Every class has the same prior probability. For example, with 4 classes, each class probability is 0.25.
- Match total sample frequencies
  - The prior probabilities equal the class proportions in the data. For example, the first class may contain 50% of the frequencies, the second class may contain 30% of the frequencies, and the third class may contain 20% of the frequencies. Thus, the prior probabilities are 0.50, 0.30, and 0.20.
- User specified
  - The prior probabilities are based on your judgment and may be altered to balance misclassification rates. The probabilities must sum to 1.
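The three ways of setting priors can be sketched in Python as follows. The function names are hypothetical and serve only to illustrate the arithmetic; they are not part of Minitab.

```python
from collections import Counter


def equal_priors(classes):
    """Same for all classes: each of k classes gets 1/k."""
    k = len(classes)
    return {c: 1.0 / k for c in classes}


def frequency_priors(responses):
    """Match total sample frequencies: class proportion in the data."""
    counts = Counter(responses)
    n = len(responses)
    return {c: counts[c] / n for c in counts}


def user_priors(priors, tol=1e-9):
    """User specified: accept the values only if they are valid priors."""
    if any(p < 0 for p in priors.values()):
        raise ValueError("prior probabilities must be non-negative")
    if abs(sum(priors.values()) - 1.0) > tol:
        raise ValueError("prior probabilities must sum to 1")
    return dict(priors)


# Four classes with equal priors of 0.25 each.
print(equal_priors(["A", "B", "C", "D"]))

# Hypothetical sample with 50%, 30%, and 20% of rows in three classes.
sample = ["first"] * 5 + ["second"] * 3 + ["third"] * 2
print(frequency_priors(sample))
```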

Minitab provides the following node splitting methods:

- Gini
- Entropy
- Class probability
- Twoing: available with a multinomial response. With a binary response, the Twoing method is the same as the Gini method.

Use the splitting method to find the tree that best fits your data. Certain splitting methods may be better than others depending on your particular data. Compare the results from several splitting methods to determine the best choice for your application.
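For orientation, the textbook forms of the Gini, entropy, and twoing criteria can be written as a short Python sketch. These are the standard CART definitions, not a statement of Minitab's exact implementation; the function names and the example split are hypothetical.

```python
import math


def gini(p):
    """Gini impurity of a node with class proportions p."""
    return 1.0 - sum(q * q for q in p)


def entropy(p):
    """Entropy of a node (natural log; the base does not change the ranking)."""
    return -sum(q * math.log(q) for q in p if q > 0)


def impurity_decrease(impurity, parent, left, right, frac_left):
    """Improvement of a split: parent impurity minus weighted child impurity."""
    return (
        impurity(parent)
        - frac_left * impurity(left)
        - (1.0 - frac_left) * impurity(right)
    )


def twoing(left, right, frac_left):
    """Twoing criterion: (pL * pR / 4) * (sum_j |p(j|L) - p(j|R)|)^2."""
    diff = sum(abs(l - r) for l, r in zip(left, right))
    return frac_left * (1.0 - frac_left) / 4.0 * diff ** 2


# Hypothetical binary split: half the rows go left, half go right.
parent, left, right = [0.5, 0.5], [0.8, 0.2], [0.2, 0.8]

gini_gain = impurity_decrease(gini, parent, left, right, 0.5)
entropy_gain = impurity_decrease(entropy, parent, left, right, 0.5)
twoing_value = twoing(left, right, 0.5)

print(gini_gain, entropy_gain, twoing_value)
```

In this binary example the Gini improvement is exactly twice the twoing value. The two criteria differ only by a constant factor for a binary response, so they rank candidate splits identically, which is consistent with the note on Twoing above.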

Minitab determines the optimal tree in one of two ways: it uses the minimum misclassification cost, or it uses a range of standard errors around the minimum misclassification cost that you specify, which expands the set of trees that qualify as optimal.

- Minimum misclassification cost
  - Minitab uses the minimum relative cost to select the optimal tree.
- Within X standard error of minimum misclassification cost
  - Minitab identifies the trees with misclassification costs that fall within the standard error range that you specify and selects the tree with the smallest number of terminal nodes within that range as the optimal tree.
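The two selection rules can be sketched in Python. The sketch assumes that the range is measured from the minimum cost using the standard error of the minimum-cost tree, which is the usual convention but is not spelled out here; the function name and the candidate trees are hypothetical.

```python
def select_optimal_tree(candidates, multiplier=0.0):
    """Pick the smallest tree whose cost is within `multiplier` standard
    errors of the minimum cost.

    candidates: list of dicts with keys 'nodes' (terminal nodes),
                'cost' (misclassification cost), and 'se' (standard error).
    multiplier: 0 reproduces the minimum-cost rule.
    Assumption: the standard error of the minimum-cost tree sets the range.
    """
    best = min(candidates, key=lambda t: t["cost"])
    threshold = best["cost"] + multiplier * best["se"]
    eligible = [t for t in candidates if t["cost"] <= threshold]
    # Among eligible trees, prefer the fewest terminal nodes.
    return min(eligible, key=lambda t: (t["nodes"], t["cost"]))


# Hypothetical sequence of pruned trees.
trees = [
    {"nodes": 2, "cost": 0.60, "se": 0.040},
    {"nodes": 4, "cost": 0.45, "se": 0.035},
    {"nodes": 7, "cost": 0.41, "se": 0.030},
    {"nodes": 12, "cost": 0.40, "se": 0.030},
    {"nodes": 20, "cost": 0.44, "se": 0.030},
]

min_cost_tree = select_optimal_tree(trees)                   # 12 terminal nodes
within_one_se = select_optimal_tree(trees, multiplier=1.0)   # 7 terminal nodes

print(min_cost_tree["nodes"], within_one_se["nodes"])
```

Allowing one standard error trades a small increase in cost (0.41 instead of 0.40) for a noticeably smaller tree.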

Minitab uses the cross-validation method or uses a separate test set to validate the model. With cross-validation, you can specify the rows for each fold, or allow a random selection. With a separate test set, you can specify the rows for both training and test sets or allow a random selection.
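A random selection of folds or of a test set can be sketched in Python with the standard library. How Minitab randomizes internally is not described here, so the functions below are hypothetical and only show the general idea.

```python
import random


def assign_folds(n_rows, k, seed=None):
    """Randomly assign each row to one of k folds of nearly equal size."""
    rng = random.Random(seed)
    folds = [i % k for i in range(n_rows)]
    rng.shuffle(folds)
    return folds


def split_train_test(n_rows, test_fraction, seed=None):
    """Randomly split row indices into (training rows, test rows)."""
    rng = random.Random(seed)
    rows = list(range(n_rows))
    rng.shuffle(rows)
    n_test = round(n_rows * test_fraction)
    return sorted(rows[n_test:]), sorted(rows[:n_test])


# Ten rows in five folds: each fold validates on two rows.
folds = assign_folds(10, 5, seed=1)

# Ten rows with 30% held out as a separate test set.
train_rows, test_rows = split_train_test(10, 0.3, seed=1)

print(folds)
print(train_rows, test_rows)
```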

The missing value penalty penalizes a competitor based on the proportion of missing values that it has in each node. Thus, a competitor with many missing values in a node is less likely to be selected as the primary splitter.

The high level category penalty penalizes a competitor based on its number of categorical levels relative to the size of each node. Thus, a competitor with many levels in a node is less likely to be selected as the primary splitter.

Indicates the column that is used to weight the response.

The number of response observations used in the tree.

The number of missing response observations. This count also includes rows where the weight column has a missing value, a value of 0, or a negative value.
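The counting rule for used and unused rows can be sketched in Python. The function names and the data are hypothetical; the rule follows the descriptions above, treating a row as unused when its response is missing or its weight is missing, zero, or negative.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_rows(responses, weights=None):
    """Return (rows_used, rows_unused) for a response column and an
    optional weight column."""
    used = unused = 0
    for i, y in enumerate(responses):
        w = 1.0 if weights is None else weights[i]
        if is_missing(y) or is_missing(w) or w <= 0:
            unused += 1
        else:
            used += 1
    return used, unused


# Hypothetical columns: one missing response, plus a zero, a negative,
# and a missing weight.
responses = ["yes", "no", None, "yes", "no", "yes"]
weights = [1.0, 2.0, 1.0, 0.0, -1.0, None]

rows_used, rows_unused = count_rows(responses, weights)
print(rows_used, rows_unused)
```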