A prior probability is the probability that an observation will fall into a group before you collect the data. For example, if you are classifying the buyers of a specific car, you might already know that 60% of purchasers are male and 40% are female.
Use prior probabilities to increase classification accuracy for certain classes. CART makes different internal balancing decisions based on the prior probabilities. Increasing the prior probability of one class and decreasing that of another shifts the balance of misclassification rates between the classes. For instance, increasing the event probability and decreasing the nonevent probability may improve (lower) the false negative rate but may worsen (raise) the false positive rate.
Increasing the event probability lowers the node threshold for assigning the event class. Thus, nodes with lower fractions of the event class are classified as the event. Prior probabilities have their strongest impact on the development of the entire tree during the tree-growing stage and provide a powerful means to change the final model.
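The threshold effect can be made concrete with a small sketch. It assumes the classic CART assignment rule with equal misclassification costs, where a node goes to the class with the larger value of prior × (class rows in the node / class rows in the training data); Minitab's exact internal rule may differ, and the function names are hypothetical.

```python
def assign_node_class(n_event, n_nonevent, total_event, total_nonevent, prior_event):
    """Assign a node to 'event' or 'nonevent' with a prior-weighted CART rule.

    Assumes equal misclassification costs. Each class's score is its prior times the
    share of that class's training rows that landed in this node.
    """
    prior_nonevent = 1.0 - prior_event
    score_event = prior_event * n_event / total_event
    score_nonevent = prior_nonevent * n_nonevent / total_nonevent
    return "event" if score_event > score_nonevent else "nonevent"


def event_fraction_threshold(total_event, total_nonevent, prior_event):
    """Event fraction above which a node is labeled 'event' (the tie point)."""
    prior_nonevent = 1.0 - prior_event
    # Solve prior_event * n1/N1 == prior_nonevent * n0/N0 for n1 / (n1 + n0).
    a = prior_nonevent * total_event
    return a / (a + prior_event * total_nonevent)


# Hypothetical training data: 100 event rows and 900 nonevent rows (10% events).
# The node being labeled holds 20 event rows and 80 nonevent rows (20% events).
for prior in (0.1, 0.5):
    print(prior, event_fraction_threshold(100, 900, prior),
          assign_node_class(20, 80, 100, 900, prior))
```

With priors equal to the data proportions (0.1 for the event), a node needs more than 50% event rows to be labeled the event, so the 20% node stays a nonevent. Raising the event prior to 0.5 drops that threshold to 10%, and the same node becomes an event node.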
Use the splitting method to find the tree that best fits your data. Some splitting methods work better than others, depending on your data, so compare the results from several methods to determine the best choice for your application.
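To see what "comparing splitting methods" means mechanically, the sketch below scores every candidate split of one numeric predictor under two common classification criteria, Gini impurity and entropy. It assumes these two criteria as representative examples; it is not Minitab's implementation, and the helper names are hypothetical. On this toy data both criteria pick the same split, but on other data they can rank candidate splits differently.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy impurity in bits: -sum(p * log2(p)) over the classes present."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(x, y, impurity):
    """Return (threshold, improvement) of the best 'x <= threshold' split of y."""
    parent = impurity(y)
    n = len(y)
    best_threshold, best_gain = None, 0.0
    # Every distinct value except the largest (which would leave the right side empty).
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        # Weighted average impurity of the two child nodes.
        child = len(left) / n * impurity(left) + len(right) / n * impurity(right)
        if parent - child > best_gain:
            best_threshold, best_gain = t, parent - child
    return best_threshold, best_gain


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = ["a", "a", "a", "b", "a", "b", "b", "b"]
for name, fn in (("gini", gini), ("entropy", entropy)):
    print(name, best_split(x, y, fn))
```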
Minitab validates the model with either cross-validation or a separate test set. With cross-validation, you can specify which rows belong to each fold or let Minitab assign them randomly. With a separate test set, you can specify the rows for both the training and test sets or let Minitab select them randomly.
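The two validation setups differ mainly in how rows are assigned. The sketch below shows one way to do each assignment, either from user-specified fold IDs or at random. It is a hypothetical illustration of the idea, not how Minitab assigns rows internally; the function names, the balanced-fold scheme, and the seed parameter are assumptions.

```python
import random


def assign_folds(n_rows, k, seed=None, fold_ids=None):
    """Return a fold index in 0..k-1 for each row.

    If fold_ids is given (user-specified folds), it is validated and returned as-is;
    otherwise rows are spread as evenly as possible over k folds in random order.
    """
    if fold_ids is not None:
        if len(fold_ids) != n_rows or any(not 0 <= f < k for f in fold_ids):
            raise ValueError("fold_ids must give a fold in 0..k-1 for every row")
        return list(fold_ids)
    folds = [i % k for i in range(n_rows)]  # balanced fold sizes
    random.Random(seed).shuffle(folds)      # random assignment to rows
    return folds


def split_train_test(n_rows, test_fraction, seed=None):
    """Randomly split row indices into (train_rows, test_rows)."""
    rows = list(range(n_rows))
    random.Random(seed).shuffle(rows)
    n_test = round(n_rows * test_fraction)
    return sorted(rows[n_test:]), sorted(rows[:n_test])


folds = assign_folds(10, k=3, seed=1)
for f in range(3):
    # Each fold in turn is held out; the tree would be grown on the remaining rows.
    held_out = [i for i, fi in enumerate(folds) if fi == f]
    print("fold", f, "held out rows:", held_out)

train, test = split_train_test(10, test_fraction=0.3, seed=1)
print("train:", train, "test:", test)
```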
For each node, the missing value penalty penalizes a competitor based on its proportion of missing values in that node. Thus, a competitor with many missing values in a node is less likely to be selected as the primary splitter.
For each node, the high level category penalty penalizes a competitor based on its number of categorical levels relative to the size of that node. Thus, a competitor with many levels in a node is less likely to be selected as the primary splitter.
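Both penalties work by shrinking a competitor's split improvement before the primary splitter is chosen. The sketch below illustrates that idea only: the penalty formulas, the strength parameters, and all names are hypothetical assumptions chosen for clarity, not the formulas Minitab uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Competitor:
    name: str
    improvement: float              # raw improvement of this predictor's best split
    missing_fraction: float         # share of the node's rows missing this predictor
    n_levels: Optional[int] = None  # categorical levels present in the node; None = continuous


def penalized_improvement(c, node_size, missing_strength=1.0, hlc_strength=1.0):
    """HYPOTHETICAL penalty forms for illustration; not Minitab's formulas."""
    # Missing value penalty: shrink the improvement as the missing fraction grows.
    score = c.improvement * (1.0 - c.missing_fraction) ** missing_strength
    # High level category penalty: shrink it as levels grow relative to node size.
    if c.n_levels is not None:
        score /= 1.0 + hlc_strength * c.n_levels / node_size
    return score


def primary_splitter(competitors, node_size, **strengths):
    """Pick the competitor with the largest penalized improvement."""
    return max(competitors, key=lambda c: penalized_improvement(c, node_size, **strengths)).name


competitors = [
    Competitor("A", improvement=0.10, missing_fraction=0.5),
    Competitor("B", improvement=0.08, missing_fraction=0.0),
    Competitor("C", improvement=0.12, missing_fraction=0.0, n_levels=40),
]
print(primary_splitter(competitors, node_size=50))  # B
print(primary_splitter(competitors, node_size=50, missing_strength=0.0, hlc_strength=0.0))  # C
```

With both penalties off, C has the largest raw improvement. With them on, A loses half its improvement to missing values and C, with 40 levels in a 50-row node, loses about 44%, so B becomes the primary splitter.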
Indicates the column that is used to weight the response.
The number of response observations used in the tree.
The number of missing response observations. This count also includes rows with missing values or zeros in the weight column.
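The relationship between the weight column and the two counts can be sketched as follows. This assumes both counts are unweighted row counts and that a missing value is represented as None or NaN; the function names are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def response_counts(response, weights=None):
    """Return (n_used, n_missing) row counts for the response.

    A row counts as missing if its response is missing or, when a weight column is
    supplied, its weight is missing or zero.
    """
    n_used = n_missing = 0
    for i, y in enumerate(response):
        bad_weight = weights is not None and (is_missing(weights[i]) or weights[i] == 0)
        if is_missing(y) or bad_weight:
            n_missing += 1
        else:
            n_used += 1
    return n_used, n_missing


response = [1, 0, None, 1, 0, float("nan")]
weights = [1.0, 2.0, 1.0, 0.0, None, 1.0]
print(response_counts(response, weights))  # (2, 4)
print(response_counts(response))           # (4, 2)
```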