Methods and formulas for the model summary in CART® Classification

In This Topic

Important predictors
Average –loglikelihood
Area under ROC curve
95% CI for the area under the ROC curve
Lift
Misclassification cost

Important predictors

The number of predictors with positive relative importance.

A classification tree is a collection of splits, and each split improves the tree. Each split also has surrogate splits, which improve the tree as well. The importance of a variable is the sum of its improvements over all nodes where the tree uses the variable either as the primary splitter or as a surrogate that splits the node when the primary variable has a missing value.

The following formula gives the improvement at a single node:

$$\Delta I(t) = I(t) - p_{Left}\, I(t_{Left}) - p_{Right}\, I(t_{Right})$$

The values of I(t), p_Left, and p_Right depend on the criterion for splitting the nodes. For more information, go to Node splitting methods in CART® Classification.

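The formula for the relative importance of the q^th predictor scales its importance by the importance of the most important variable:

$$\text{Relative importance}(x_q) = 100\% \times \frac{\text{Importance}(x_q)}{\max_j \text{Importance}(x_j)}$$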
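The scaling step can be illustrated with a minimal sketch. It assumes that the improvement at a node is the usual impurity decrease, I(t) minus the weighted impurities of the two child nodes, and that a predictor's importance is the sum of its improvements as a primary or surrogate splitter. The function names and the predictor names below are hypothetical and are not part of Minitab.

```python
def node_improvement(impurity, p_left, impurity_left, p_right, impurity_right):
    """Assumed improvement at one node: I(t) - p_Left*I(t_Left) - p_Right*I(t_Right)."""
    return impurity - p_left * impurity_left - p_right * impurity_right


def relative_importance(improvements_by_predictor):
    """Scale each predictor's summed improvement by the largest sum, as a percentage."""
    totals = {name: sum(vals) for name, vals in improvements_by_predictor.items()}
    top = max(totals.values())
    if top <= 0:
        # No predictor improved the tree, so every relative importance is 0.
        return {name: 0.0 for name in totals}
    return {name: 100.0 * total / top for name, total in totals.items()}


# Hypothetical improvements collected over the nodes of a tree.
improvements = {"age": [0.12, 0.03], "income": [0.05], "region": []}
rel = relative_importance(improvements)

# "Important predictors" counts the predictors with positive relative importance.
important_predictors = sum(1 for value in rel.values() if value > 0)
example_improvement = node_improvement(0.5, 0.4, 0.0, 0.6, 0.4)
```

In this example the most important predictor receives 100%, and a predictor that the tree never uses as a primary or surrogate splitter receives 0% and is not counted.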

Average –loglikelihood

Minitab calculates the average of the negative log-likelihood function when the response is binary. The calculations depend on the validation method.

Training data or no validation

$$-\frac{1}{N}\sum_{i=1}^{N} w_i\left[y_i \ln \hat{p}_i + (1 - y_i)\ln\left(1 - \hat{p}_i\right)\right]$$

where

Notation for training data or no validation

Term	Description
N	sample size of the full data or the training data
w_i	weight for the i^th observation in the full or training data set
y_i	indicator variable that is 1 for the event and 0 otherwise for the full or training data set
p̂_i	predicted probability of the event for the i^th row in the full or training data set
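As a rough illustration, the following sketch computes an average negative log-likelihood from the quantities in the notation table. It assumes the standard weighted binomial log-likelihood divided by the sample size N. The function name `average_neg_loglik` and the example values are hypothetical.

```python
import math


def average_neg_loglik(y, p_hat, weights=None):
    """Weighted average of the negative binomial log-likelihood (assumed form).

    y       : 1 for the event, 0 otherwise
    p_hat   : predicted probability of the event for each row
    weights : observation weights (defaults to 1 for every row)
    """
    n = len(y)
    if weights is None:
        weights = [1.0] * n
    total = 0.0
    for y_i, p_i, w_i in zip(y, p_hat, weights):
        # Only the term for the observed class contributes to the sum.
        total += w_i * math.log(p_i if y_i == 1 else 1.0 - p_i)
    return -total / n


y = [1, 0, 1, 0]
p_hat = [0.8, 0.2, 0.6, 0.4]
avg_nll = average_neg_loglik(y, p_hat)
```

Smaller values indicate predicted probabilities that agree more closely with the observed classes. The sketch does not handle a predicted probability of exactly 0 for the observed class, where the logarithm is undefined.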

K-fold cross-validation

$$-\frac{1}{N}\sum_{j=1}^{k}\sum_{i=1}^{n_j} w_{ij}\left[y_{ij} \ln \hat{p}_{ij} + (1 - y_{ij})\ln\left(1 - \hat{p}_{ij}\right)\right]$$

where

Notation for k-fold cross-validation

Term	Description
N	sample size of the full or training data
k	number of folds
n_j	sample size of fold j
w_ij	weight for the i^th observation in fold j
y_ij	indicator variable that is 1 for the event and 0 otherwise for the data in fold j
p̂_ij	predicted probability of the event for the i^th observation in fold j, from the model estimated without the observations in fold j
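The k-fold version can be sketched in the same way. The sketch assumes that each fold contributes the log-likelihood of its own rows, evaluated with predictions from the model that was fit without that fold, and that the pooled sum is divided by the total sample size N. The data layout (a list of per-fold tuples) and the function name are hypothetical.

```python
import math


def kfold_average_neg_loglik(folds):
    """Pool out-of-fold log-likelihood terms and divide by the total sample size.

    folds: list of (y, p_hat, weights) tuples, one per fold, where p_hat comes
           from the model that was estimated without that fold.
    """
    n_total = sum(len(y) for y, _, _ in folds)
    total = 0.0
    for y, p_hat, weights in folds:
        for y_ij, p_ij, w_ij in zip(y, p_hat, weights):
            total += w_ij * math.log(p_ij if y_ij == 1 else 1.0 - p_ij)
    return -total / n_total


folds = [
    ([1, 0], [0.7, 0.3], [1.0, 1.0]),        # fold 1: out-of-fold predictions
    ([1, 1, 0], [0.9, 0.6, 0.2], [1.0, 1.0, 1.0]),  # fold 2
]
cv_avg_nll = kfold_average_neg_loglik(folds)
```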

Test data set

$$-\frac{1}{n_{Test}}\sum_{i=1}^{n_{Test}} w_{i,Test}\left[y_{i,Test} \ln \hat{p}_{i,Test} + (1 - y_{i,Test})\ln\left(1 - \hat{p}_{i,Test}\right)\right]$$

where

Notation for test data set

Term	Description
n_Test	sample size of the test set
w_{i, Test}	weight for the i^th observation in the test data set
y_{i, Test}	indicator variable that is 1 for the event and 0 otherwise for the data in the test set
p̂_{i, Test}	predicted probability of the event for the i^th row in the test set

Area under ROC curve

The ROC curve plots the true positive rate (TPR), also known as power, on the y-axis, and the false positive rate (FPR), also known as the type I error rate, on the x-axis. Values of the area under the ROC curve typically range from 0.5 to 1.

Formula

For the area under the curve, Minitab integrates the true positive rate with respect to the false positive rate.

In most cases, this integral is equivalent to the following summation of areas of trapezoids:

$$A = \sum_{i=1}^{k} \frac{(x_i - x_{i-1})(y_i + y_{i-1})}{2}$$

where k is the number of terminal nodes and (x₀, y₀) is the point (0, 0).

For example, suppose your results have 4 terminal nodes with the following coordinates on the ROC curve:

x (false positive rate)	y (true positive rate)
0.0923	0.3051
0.4154	0.7288
0.7538	0.9322
1	1

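Then the area under the ROC curve is given by the following calculation:

$$A = \frac{(0.0923 - 0)(0.3051 + 0)}{2} + \frac{(0.4154 - 0.0923)(0.7288 + 0.3051)}{2} + \frac{(0.7538 - 0.4154)(0.9322 + 0.7288)}{2} + \frac{(1 - 0.7538)(1 + 0.9322)}{2}$$

$$A \approx 0.0141 + 0.1670 + 0.2810 + 0.2379 \approx 0.7000$$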
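The trapezoid summation for this example can be checked with a short script. This is a sketch only: the function name `trapezoid_auc` is hypothetical, and the code assumes that the points are the (x, y) coordinates listed in the table, with (0, 0) added as the starting point.

```python
def trapezoid_auc(points):
    """Sum the trapezoid areas under the ROC points, starting from (0, 0)."""
    pts = [(0.0, 0.0)] + sorted(points)
    area = 0.0
    for (x_prev, y_prev), (x_cur, y_cur) in zip(pts, pts[1:]):
        # Width of the strip times the average of its two heights.
        area += (x_cur - x_prev) * (y_cur + y_prev) / 2.0
    return area


# (false positive rate, true positive rate) for the 4 terminal nodes.
roc_points = [
    (0.0923, 0.3051),
    (0.4154, 0.7288),
    (0.7538, 0.9322),
    (1.0, 1.0),
]
auc = trapezoid_auc(roc_points)
```

For these four points the sum is approximately 0.70.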

Notation

Term	Description
TPR	true positive rate
FPR	false positive rate
TP	true positive, events that were correctly assessed
P	number of actual positive events
FP	false positive, nonevents that were incorrectly assessed as events
N	number of actual negative events
FNR	false negative rate
TNR	true negative rate

95% CI for the area under the ROC curve

Minitab calculates a confidence interval for the area under the Receiver Operating Characteristic curve when the response is binary.

The following interval gives the lower and upper bounds for the confidence interval:

$$\left(A - z_{0.975}\, SE(A),\; A + z_{0.975}\, SE(A)\right)$$

The computation of the standard error of the area under the ROC curve, SE(A), comes from Salford Predictive Modeler®. For general information about estimation of the variance of the area under the ROC curve, see the following references:

Engelmann, B. (2011). Measures of a rating's discriminative power: Applications and limitations. In B. Engelmann & R. Rauhmeier (Eds.), The Basel II Risk Parameters: Estimation, Validation, Stress Testing - With Applications to Loan Risk Management (2nd ed.). Heidelberg; New York: Springer. doi:10.1007/978-3-642-16114-8

Cortes, C., & Mohri, M. (2005). Confidence intervals for the area under the ROC curve. Advances in Neural Information Processing Systems, 305-312.

Feng, D., Cortese, G., & Baumgartner, R. (2017). A comparison of confidence/credible interval methods for the area under the ROC curve for continuous diagnostic tests with small sample size. Statistical Methods in Medical Research, 26(6), 2603-2621. doi:10.1177/0962280215602040

Notation

Term	Description
A	area under the ROC curve
z_{0.975}	97.5th percentile of the standard normal distribution
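Given an area and its standard error, the interval is a normal-approximation calculation, sketched below. The standard error itself comes from Salford Predictive Modeler and is not reproduced here, so the sketch takes it as an input. The function name and the example values (an area of 0.70 and a standard error of 0.05) are hypothetical.

```python
from statistics import NormalDist


def auc_confidence_interval(auc, standard_error, level=0.95):
    """Normal-approximation interval: auc +/- z * standard_error."""
    # For a 95% interval this is the 97.5th percentile of the standard normal.
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    return auc - z * standard_error, auc + z * standard_error


lower, upper = auc_confidence_interval(0.70, 0.05)
```

The sketch does not truncate the bounds to the interval from 0 to 1. Whether the software truncates them is not stated in this topic.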

Lift

Minitab displays lift in the model summary table when the response is binary. The lift in the model summary table is the cumulative lift for the 10% of the data with the best chance of correct classification.

Formula

For the 10% of observations in the data with the highest probabilities of being assigned to the event class, use the following formula.

$$\text{Lift} = \frac{\frac{1}{d}\sum_{i=1}^{d} \hat{p}_i}{\bar{p}}$$

For the test lift with a test data set, use observations from the test data set. For the test lift with k-fold cross-validation, select the data to use and calculate the lift from the predicted probabilities for data that are not in the model estimation.

Notation

Term	Description
d	number of cases in 10% of the data
p̂_i	predicted probability of the event
p̄	probability of the event in the training data or, if the analysis uses no validation, in the full data set
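A minimal sketch of the lift calculation follows. It assumes that the lift is the mean predicted event probability among the d cases with the highest predicted probabilities, divided by the baseline event probability. The rounding rule used for d, the function name, and the example values are assumptions for illustration only.

```python
def top_decile_lift(p_hat, baseline_event_prob, fraction=0.10):
    """Mean predicted probability in the top fraction of cases over the baseline."""
    # Assumed rule: d is 10% of the cases, rounded, and at least 1.
    d = max(1, round(len(p_hat) * fraction))
    top = sorted(p_hat, reverse=True)[:d]
    return (sum(top) / d) / baseline_event_prob


# 20 hypothetical predicted probabilities; the baseline event probability is 0.25.
p_hat = [0.9, 0.8] + [0.2] * 18
lift = top_decile_lift(p_hat, baseline_event_prob=0.25)
```

A lift greater than 1 indicates that the cases the model ranks highest contain events at a higher rate than the data overall.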

Misclassification cost

The misclassification cost in the model summary table is the misclassification cost for the model relative to the cost for a trivial classifier that classifies all observations into the most frequent class.

To find the misclassification cost, begin with the following definition:

$$R = \sum_{i=1}^{K} \frac{\pi_i}{N_i} \sum_{j \ne i} c_{ij}\, n_{ij}$$

The relative misclassification cost has the following form:

$$\frac{R}{R_0}$$

where R₀ is the cost for the trivial classifier.

The formula for R simplifies when the prior probabilities are equal or are from the data.

Equal prior probabilities

When the prior probabilities are equal, the following definition applies:

$$\pi_j = \frac{1}{K}$$

With this definition, R has the following form:

$$R = \frac{1}{K} \sum_{i=1}^{K} \frac{1}{N_i} \sum_{j \ne i} c_{ij}\, n_{ij}$$

Prior probabilities from the data

When the prior probabilities are from the data, the following definition applies:

$$\pi_j = \frac{N_j}{N}$$

With this definition, R has the following form:

$$R = \frac{1}{N} \sum_{i=1}^{K} \sum_{j \ne i} c_{ij}\, n_{ij}$$

Notation

Term	Description
π_j	prior probability of the j^th class of the response variable
c_ij	cost of misclassifying class i as class j
n_ij	number of class i records misclassified as class j
N_j	number of cases in the j^th class of the response variable
K	number of classes in the response variable
N	number of cases in the data
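The relative cost can be sketched as follows. The sketch assumes that the cost weights each class's misclassified fraction by the class's prior probability and the cost of the error, and that the trivial classifier assigns every record to the most frequent class. The function names, the matrix layout (rows are actual classes, columns are predicted classes), and the example counts are hypothetical.

```python
def misclassification_cost(confusion, priors, cost):
    """Assumed form: sum over classes of prior * cost * misclassified fraction."""
    k = len(confusion)
    total = 0.0
    for i in range(k):
        n_i = sum(confusion[i])  # number of cases whose actual class is i
        for j in range(k):
            if i != j:
                total += priors[i] * cost[i][j] * confusion[i][j] / n_i
    return total


def relative_cost(confusion, priors, cost):
    """Model cost divided by the cost of always predicting the most frequent class."""
    k = len(confusion)
    class_sizes = [sum(row) for row in confusion]
    majority = max(range(k), key=lambda c: class_sizes[c])
    # Confusion matrix of the trivial classifier: every record goes to `majority`.
    trivial = [[class_sizes[i] if j == majority else 0 for j in range(k)]
               for i in range(k)]
    return (misclassification_cost(confusion, priors, cost)
            / misclassification_cost(trivial, priors, cost))


confusion = [[50, 10], [8, 32]]   # rows: actual class, columns: predicted class
unit_cost = [[0, 1], [1, 0]]

n = sum(sum(row) for row in confusion)
equal_priors = [1 / len(confusion)] * len(confusion)
data_priors = [sum(row) / n for row in confusion]

rel_equal = relative_cost(confusion, equal_priors, unit_cost)
rel_data = relative_cost(confusion, data_priors, unit_cost)
```

A relative cost below 1 means that the tree misclassifies at a lower cost than the trivial classifier under the chosen prior probabilities.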
