A classification tree is a collection of splits. Each split improves the tree, and each split also has surrogate splits that improve the tree. The importance of a variable is the sum of all of its improvements over the nodes where the tree uses the variable as the primary splitter or as a surrogate splitter when another variable has a missing value.
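The following formula gives the improvement at a single node t:

$$\Delta I(t) = I(t) - p_{Left}\, I(t_{Left}) - p_{Right}\, I(t_{Right})$$

Here, t_{Left} and t_{Right} are the left and right child nodes of t. The values of I(t), p_{Left}, and p_{Right} depend on the criterion for splitting the nodes. For more information, go to Node splitting methods in CART® Classification.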
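As an illustration of the improvement at a single node, the following is a minimal sketch that assumes the Gini impurity for I(t) and unweighted case proportions for p_{Left} and p_{Right}. Both are assumptions, because the actual values depend on the splitting criterion. The function names and the class counts are hypothetical.

```python
def gini(class_counts):
    """Gini impurity of a node from its per-class case counts (assumed criterion)."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


def node_improvement(parent_counts, left_counts, right_counts):
    """I(t) - p_Left * I(t_Left) - p_Right * I(t_Right) with unweighted proportions."""
    n_parent = sum(parent_counts)
    p_left = sum(left_counts) / n_parent
    p_right = sum(right_counts) / n_parent
    return (
        gini(parent_counts)
        - p_left * gini(left_counts)
        - p_right * gini(right_counts)
    )


# Hypothetical node with 50 events and 50 nonevents, split into two children.
improvement = node_improvement([50, 50], [40, 10], [10, 40])
print(improvement)  # approximately 0.18
```

A split that separates the two classes perfectly would give the largest possible improvement for this parent node, 0.5.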
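For the training data or an analysis with no validation, the average −loglikelihood has the following form:

$$-\frac{1}{N}\sum_{i=1}^{N} w_{i}\left[y_{i}\ln\hat{p}_{i} + (1 - y_{i})\ln(1 - \hat{p}_{i})\right]$$

where
Term | Description |
---|---|
N | sample size of the full or training data set |
w_{i} | weight for the i^{th} observation in the full or training data set |
y_{i} | indicator variable that is 1 for the event and 0 otherwise for the full or training data set |
\hat{p}_{i} | predicted probability of the event for the i^{th} row in the full or training data set |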
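The average −loglikelihood for the full or training data can be sketched as follows. This is a minimal sketch, assuming the weighted sum of the log-likelihood terms is divided by N. The function name and the small `eps` clip, which avoids taking the logarithm of 0 when a terminal node predicts a probability of exactly 0 or 1, are assumptions and do not come from the source.

```python
import math


def average_neg_loglikelihood(y, p_hat, weights=None, eps=1e-15):
    """Average -loglikelihood: -(1/N) * sum of w_i * [y_i ln p_i + (1 - y_i) ln(1 - p_i)]."""
    n = len(y)
    if weights is None:
        weights = [1.0] * n  # unit weights when none are supplied
    total = 0.0
    for y_i, p_i, w_i in zip(y, p_hat, weights):
        p_i = min(max(p_i, eps), 1.0 - eps)  # assumed guard against log(0)
        total += w_i * (y_i * math.log(p_i) + (1 - y_i) * math.log(1.0 - p_i))
    return -total / n


# Hypothetical data: one event predicted at 0.8 and one nonevent predicted at 0.2.
value = average_neg_loglikelihood([1, 0], [0.8, 0.2])
print(value)  # approximately 0.2231, which is -ln(0.8)
```

Smaller values indicate predicted probabilities that agree more closely with the observed events.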
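For k-fold cross-validation, the average −loglikelihood has the following form, where the outer sum is over the folds:

$$-\frac{1}{N}\sum_{j}\sum_{i=1}^{n_{j}} w_{ij}\left[y_{ij}\ln\hat{p}_{ij} + (1 - y_{ij})\ln(1 - \hat{p}_{ij})\right]$$

where
Term | Description |
---|---|
N | sample size of the full or training data |
n_{j} | sample size of fold j |
w_{ij} | weight for the i^{th} observation in fold j |
y_{ij} | indicator variable that is 1 for the event and 0 otherwise for the data in fold j |
\hat{p}_{ij} | predicted probability of the event for the i^{th} observation in fold j, from the model estimation that does not include the observations in fold j |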
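For k-fold cross-validation, the same quantity pools the out-of-fold predictions across the folds. The sketch below assumes that each fold supplies its responses, its out-of-fold predicted probabilities, and its weights, and that the pooled sum is divided by the total sample size N. The function name, the data layout, and the `eps` clip are hypothetical.

```python
import math


def kfold_average_neg_loglikelihood(folds, eps=1e-15):
    """folds: one (y, p_hat, weights) tuple per fold; p_hat must be out-of-fold predictions."""
    n_total = sum(len(y) for y, _, _ in folds)  # N = sum of the fold sizes n_j
    total = 0.0
    for y, p_hat, weights in folds:
        for y_ij, p_ij, w_ij in zip(y, p_hat, weights):
            p_ij = min(max(p_ij, eps), 1.0 - eps)  # assumed guard against log(0)
            total += w_ij * (y_ij * math.log(p_ij) + (1 - y_ij) * math.log(1.0 - p_ij))
    return -total / n_total


# Two hypothetical folds of two observations each, with unit weights.
folds = [
    ([1, 0], [0.8, 0.2], [1.0, 1.0]),
    ([1, 0], [0.6, 0.4], [1.0, 1.0]),
]
cv_value = kfold_average_neg_loglikelihood(folds)
```

The same function handles a separate test set if the test observations are passed as a single fold, because the divisor is then the test sample size.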
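For a test data set, the average −loglikelihood has the following form:

$$-\frac{1}{n_{Test}}\sum_{i=1}^{n_{Test}} w_{i, Test}\left[y_{i, Test}\ln\hat{p}_{i, Test} + (1 - y_{i, Test})\ln(1 - \hat{p}_{i, Test})\right]$$

where
Term | Description |
---|---|
n_{Test} | sample size of the test set |
w_{i, Test} | weight for the i^{th} observation in the test data set |
y_{i, Test} | indicator variable that is 1 for the event and 0 otherwise for the data in the test set |
\hat{p}_{i, Test} | predicted probability of the event for the i^{th} row in the test set |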
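For the area under the ROC curve, Minitab integrates the curve with the trapezoidal rule over the points that the terminal nodes define:

$$\text{Area} = \sum_{i=1}^{k}\frac{(x_{i} - x_{i-1})(y_{i} + y_{i-1})}{2}$$

where k is the number of terminal nodes and (x_{0}, y_{0}) is the point (0, 0). For example, suppose that a tree with 4 terminal nodes gives the following points on the ROC curve:
x (false positive rate) | y (true positive rate) |
---|---|
0.0923 | 0.3051 |
0.4154 | 0.7288 |
0.7538 | 0.9322 |
1 | 1 |

For these points, the area under the ROC curve is the following sum:

$$\frac{(0.0923)(0.3051)}{2} + \frac{(0.3231)(1.0339)}{2} + \frac{(0.3384)(1.6610)}{2} + \frac{(0.2462)(1.9322)}{2} \approx 0.7000$$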
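The integration for the example points can be checked with a short script. This is a sketch that assumes the integration is the trapezoidal rule over the ROC points, starting from (0, 0). The function name is hypothetical.

```python
def roc_area(points):
    """Trapezoidal area under the ROC curve.

    points: (false positive rate, true positive rate) pairs sorted by
    false positive rate, without the starting point (0, 0).
    """
    area = 0.0
    x_prev, y_prev = 0.0, 0.0  # (x_0, y_0) is the point (0, 0)
    for x, y in points:
        area += (x - x_prev) * (y + y_prev) / 2.0
        x_prev, y_prev = x, y
    return area


# The four ROC points from the example table.
points = [(0.0923, 0.3051), (0.4154, 0.7288), (0.7538, 0.9322), (1.0, 1.0)]
area = roc_area(points)
print(round(area, 4))  # 0.7
```

The result is well above 0.5, the area under the diagonal from (0, 0) to (1, 1).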
Term | Description |
---|---|
TPR | true positive rate, TP / P |
FPR | false positive rate, FP / N |
TP | true positive, events that were correctly assessed |
P | number of actual positive events |
FP | false positive, nonevents that were incorrectly assessed as events |
N | number of actual negative events |
FNR | false negative rate, 1 − TPR |
TNR | true negative rate, 1 − FPR |
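The four rates follow directly from the counts in a two-class confusion table. The sketch below uses the standard definitions of the rates. The function name and the counts are hypothetical, and the counts are chosen only so that the rates land on the second point of the example ROC table.

```python
def classification_rates(tp, fn, fp, tn):
    """Rates from confusion counts: P = TP + FN actual events, N = FP + TN actual nonevents."""
    p = tp + fn
    n = fp + tn
    return {
        "TPR": tp / p,  # events correctly assessed
        "FNR": fn / p,  # events missed, equal to 1 - TPR
        "FPR": fp / n,  # nonevents incorrectly assessed as events
        "TNR": tn / n,  # nonevents correctly assessed, equal to 1 - FPR
    }


# Hypothetical counts: 59 actual events and 65 actual nonevents.
rates = classification_rates(tp=43, fn=16, fp=27, tn=38)
print(round(rates["FPR"], 4), round(rates["TPR"], 4))
```

Each classification threshold gives one (FPR, TPR) pair, which is one point on the ROC curve.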
The following interval gives the lower and upper bounds of the 95% confidence interval for the area under the ROC curve:

$$A \pm z_{0.975} \times SE_{A}$$

The computation of the standard error of the area under the ROC curve (SE_{A}) comes from Salford Predictive Modeler^{®}. For general information about estimation of the variance of the area under the ROC curve, see the following references:
Engelmann, B. (2011). Measures of a rating's discriminative power: Applications and limitations. In B. Engelmann & R. Rauhmeier (Eds.), The Basel II Risk Parameters: Estimation, Validation, Stress Testing - With Applications to Loan Risk Management (2nd ed.). Heidelberg; New York: Springer. doi:10.1007/978-3-642-16114-8
Cortes, C., & Mohri, M. (2005). Confidence intervals for the area under the ROC curve. Advances in Neural Information Processing Systems, 305-312.
Feng, D., Cortese, G., & Baumgartner, R. (2017). A comparison of confidence/credible interval methods for the area under the ROC curve for continuous diagnostic tests with small sample size. Statistical Methods in Medical Research, 26(6), 2603-2621. doi:10.1177/0962280215602040
Term | Description |
---|---|
A | area under the ROC curve |
z_{0.975} | 97.5th percentile of the standard normal distribution, approximately 1.96 |
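Given an area and its standard error, the interval is a normal-approximation interval. The sketch below assumes the bounds are the area plus or minus the 97.5th percentile of the standard normal distribution times the standard error. The standard error is an input here because its computation is specific to Salford Predictive Modeler. The function name and the example values are hypothetical.

```python
from statistics import NormalDist


def auc_confidence_interval(area, std_error, level=0.95):
    """Normal-approximation interval: area +/- z * std_error."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)  # about 1.96 for level=0.95
    return area - z * std_error, area + z * std_error


# Hypothetical area and standard error.
lower, upper = auc_confidence_interval(0.70, 0.05)
print(round(lower, 3), round(upper, 3))  # 0.602 0.798
```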
For the 10% of observations in the data with the highest predicted probabilities of the event class, the lift has the following form:

$$\text{Lift} = \frac{\frac{1}{d}\sum_{i=1}^{d}\hat{p}_{i}}{\bar{p}}$$

For the test lift with a test data set, use the observations from the test data set. For the test lift with k-fold cross-validation, use the predicted probability for each observation from the model estimation that does not include that observation.
Term | Description |
---|---|
d | number of cases in 10% of the data |
\hat{p}_{i} | predicted probability of the event for the i^{th} case among the d cases with the highest predicted probabilities |
\bar{p} | probability of the event in the training data or, if the analysis uses no validation, in the full data set |
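The lift for the top 10% of cases can be sketched as a ratio of two probabilities. The sketch assumes the lift is the mean predicted probability of the event among the d cases with the highest predicted probabilities, divided by the overall probability of the event. The rounding rule for d and the handling of ties between cases with equal probabilities are assumptions, and the function name and data are hypothetical.

```python
def top_decile_lift(p_hat, base_rate, fraction=0.10):
    """Mean predicted probability in the top `fraction` of cases, divided by base_rate."""
    d = max(1, round(fraction * len(p_hat)))  # assumed rounding rule for d
    top = sorted(p_hat, reverse=True)[:d]     # d highest predicted probabilities
    return (sum(top) / d) / base_rate


# Hypothetical predictions for 20 cases from a tree with three terminal nodes.
p_hat = [0.9] * 2 + [0.5] * 8 + [0.1] * 10
lift = top_decile_lift(p_hat, base_rate=0.30)
print(round(lift, 2))  # 3.0
```

A lift of 3 means that the event is about 3 times as likely in the top 10% of cases as in the data overall.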
The misclassification cost in the model summary table is the cost for the model relative to the cost for a trivial classifier that assigns all observations to the most frequent class.
The relative misclassification cost has the following form:

$$\text{Relative cost} = \frac{R}{R_{0}}$$

where R is the misclassification cost for the model and R_{0} is the cost for the trivial classifier.
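The misclassification cost, R, weights the cost of the errors in each class by the prior probability of the class:

$$R = \sum_{i=1}^{K}\frac{\pi_{i}}{N_{i}}\sum_{j \neq i} c_{ij}\, n_{ij}$$

The formula for R simplifies when the prior probabilities are equal or are from the data. With equal prior probabilities, π_{i} = 1/K. With this definition, R has the following form:

$$R = \frac{1}{K}\sum_{i=1}^{K}\frac{1}{N_{i}}\sum_{j \neq i} c_{ij}\, n_{ij}$$

With prior probabilities from the data, π_{i} = N_{i}/N. With this definition, R has the following form:

$$R = \frac{1}{N}\sum_{i=1}^{K}\sum_{j \neq i} c_{ij}\, n_{ij}$$

Term | Description |
---|---|
π_{i} | prior probability of the i^{th} class of the response variable |
c_{ij} | cost of misclassifying class i as class j |
n_{ij} | number of class i records misclassified as class j |
N_{i} | number of cases in the i^{th} class of the response variable |
K | number of classes in the response variable |
N | number of cases in the data |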
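The relative misclassification cost can be illustrated for the case where the prior probabilities come from the data. The sketch assumes that, in this case, the cost is the sum of each misclassification count times its cost, divided by the number of cases. It also builds the trivial classifier by assigning every case to the most frequent class. The function names, the confusion counts, and the unit costs are hypothetical.

```python
def misclassification_cost(confusion, cost):
    """Cost with priors from the data: sum over i != j of cost[i][j] * confusion[i][j], over N.

    confusion[i][j] is the number of class i cases classified as class j.
    """
    k = len(confusion)
    n_total = sum(sum(row) for row in confusion)
    weighted = sum(
        cost[i][j] * confusion[i][j]
        for i in range(k)
        for j in range(k)
        if i != j
    )
    return weighted / n_total


def relative_cost(confusion, cost):
    """Cost of the model divided by the cost of the trivial classifier."""
    k = len(confusion)
    class_sizes = [sum(row) for row in confusion]
    majority = max(range(k), key=lambda c: class_sizes[c])
    # Trivial classifier: every case is assigned to the most frequent class.
    trivial = [
        [class_sizes[i] if j == majority else 0 for j in range(k)]
        for i in range(k)
    ]
    return misclassification_cost(confusion, cost) / misclassification_cost(trivial, cost)


# Hypothetical two-class results with unit costs; rows are actual classes.
confusion = [[38, 27], [16, 43]]
cost = [[0, 1], [1, 0]]
relative = relative_cost(confusion, cost)
print(round(relative, 4))  # 0.7288
```

A value below 1 means that the tree has a lower misclassification cost than the trivial classifier.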