Receiver Operating Characteristic (ROC) curve chart for CART® Classification

The procedure for calculating the points on the ROC curve depends on the validation method. For a multinomial response variable, Minitab displays multiple charts that treat each class as the event in turn.
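To illustrate what "treat each class as the event in turn" means, the following is a minimal sketch in Python. The function name `one_vs_rest_labels` and the class labels are hypothetical and are not part of Minitab; the sketch only shows how a multinomial response can be recoded as event (1) versus nonevent (0) once per class.

```python
def one_vs_rest_labels(observed, event_class):
    """Recode a multinomial response: 1 if the case is in event_class, else 0."""
    return [1 if c == event_class else 0 for c in observed]


# Hypothetical response with three classes.
observed = ["low", "medium", "high", "medium", "low"]

# One binary recoding (and therefore one ROC chart) per class.
for event_class in sorted(set(observed)):
    print(event_class, one_vs_rest_labels(observed, event_class))
```

Each recoded response can then go through the same event/nonevent procedure that the rest of this topic describes.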

Training data set or no validation

For a training data set, each point on the chart represents a terminal node from the tree. The terminal node with the highest event probability is the first point on the chart and appears leftmost. The other terminal nodes follow in order of decreasing event probability.

Use the following process to find the x- and y-coordinates for the chart.

1. Calculate the event probability of each terminal node:

   $$\text{event probability of node } k = \frac{n_{1,k}}{N_k}$$

   where
   - $n_{1,k}$ is the number of events in the $k$th node
   - $N_k$ is the number of cases in the $k$th node

2. Rank the terminal nodes from highest to lowest event probability.

3. Use every event probability as a threshold. For a specific threshold, cases with an estimated event probability greater than or equal to the threshold get 1 (event) as the predicted class, and 0 (nonevent) otherwise. Then form a 2x2 table for all cases, with observed classes as rows and predicted classes as columns, to calculate the false positive rate and the true positive rate for each terminal node. The false positive rates are the x-coordinates for the chart. The true positive rates are the y-coordinates.
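The thresholding step can be sketched in Python as follows. This is an illustration under stated assumptions, not Minitab's implementation: the function names and the six-case data set are hypothetical, observed classes are coded 1 for event and 0 for nonevent, and the data contain at least one event and one nonevent so that neither rate divides by zero.

```python
def confusion_at_threshold(probs, observed, threshold):
    """Return (tp, fn, fp, tn) for one threshold.

    A case is predicted as an event when its estimated event
    probability is greater than or equal to the threshold.
    """
    tp = fn = fp = tn = 0
    for p, y in zip(probs, observed):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def rates(tp, fn, fp, tn):
    """Return (false positive rate, true positive rate)."""
    return fp / (fp + tn), tp / (tp + fn)


# Hypothetical cases: estimated event probability and observed class.
probs = [0.8, 0.8, 0.5, 0.5, 0.2, 0.2]
observed = [1, 0, 1, 0, 0, 1]

table = confusion_at_threshold(probs, observed, threshold=0.5)
fpr, tpr = rates(*table)
print(table, round(fpr, 2), round(tpr, 2))
```

Repeating the call once per distinct event probability gives one (x, y) point per threshold.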

For example, suppose the following table summarizes a tree with 4 terminal nodes:

A: Terminal node	B: Number of events	C: Number of nonevents	D: Number of cases	E: Threshold (B/D)
4	18	12	30	0.60
1	25	42	67	0.37
3	12	44	56	0.21
2	4	32	36	0.11
Totals	59	130	189

Then the following are the corresponding 4 tables with their respective false positive rates and true positive rates to 2 decimal places:

Table 1. Threshold = 0.60.
False positive rate = 12 / (12 + 118) = 0.09

True positive rate = 18 / (18 + 41) = 0.31
		Predicted
		event	nonevent
Observed	event	18	41
Observed	nonevent	12	118

Table 2. Threshold = 0.37.
False positive rate = (12 + 42) / 130 = 0.42

True positive rate = (18 + 25) / 59 = 0.73
		Predicted
		event	nonevent
Observed	event	43	16
Observed	nonevent	54	76

Table 3. Threshold = 0.21.
False positive rate = (12 + 42 + 44) / 130 = 0.75

True positive rate = (18 + 25 + 12) / 59 = 0.93
		Predicted
		event	nonevent
Observed	event	55	4
Observed	nonevent	98	32

Table 4. Threshold = 0.11.
False positive rate = (12 + 42 + 44 + 32) / 130 = 1

True positive rate = (18 + 25 + 12 + 4) / 59 = 1
		Predicted
		event	nonevent
Observed	event	59	0
Observed	nonevent	130	0
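The four tables above can be reproduced with a short Python sketch that accumulates events and nonevents over the ranked terminal nodes. The function name `roc_points_from_nodes` is hypothetical, and the sketch assumes that every terminal node has a distinct event probability, as in this example; nodes with tied probabilities would share a threshold and would need to be merged into a single point.

```python
# (terminal node, number of events, number of nonevents) from the example table.
nodes = [(4, 18, 12), (1, 25, 42), (3, 12, 44), (2, 4, 32)]


def roc_points_from_nodes(nodes):
    """Return (node, threshold, FPR, TPR) tuples, highest threshold first."""
    total_events = sum(events for _, events, _ in nodes)
    total_nonevents = sum(nonevents for _, _, nonevents in nodes)

    # Rank nodes from highest to lowest event probability.
    ranked = sorted(nodes, key=lambda t: t[1] / (t[1] + t[2]), reverse=True)

    points = []
    cum_events = cum_nonevents = 0
    for node, events, nonevents in ranked:
        # All cases in this node and in higher-ranked nodes are predicted as events.
        cum_events += events
        cum_nonevents += nonevents
        threshold = events / (events + nonevents)
        points.append((node, threshold,
                       cum_nonevents / total_nonevents,
                       cum_events / total_events))
    return points


for node, threshold, fpr, tpr in roc_points_from_nodes(nodes):
    print(f"node {node}: threshold={threshold:.2f} FPR={fpr:.2f} TPR={tpr:.2f}")
# node 4: threshold=0.60 FPR=0.09 TPR=0.31
# node 1: threshold=0.37 FPR=0.42 TPR=0.73
# node 3: threshold=0.21 FPR=0.75 TPR=0.93
# node 2: threshold=0.11 FPR=1.00 TPR=1.00
```

The cumulative sums work because lowering the threshold to the next node's event probability only adds that node's cases to the predicted-event column.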

Separate test data set

Use the same steps as the training data set procedure, but calculate the event probability from the cases for the test data set.
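As a rough illustration, the event probabilities for a test data set could be tallied as in the Python sketch below. It assumes that each test case has already been assigned to a terminal node of the tree and that observed classes are coded 1 for event and 0 for nonevent; the function name and the data are hypothetical.

```python
from collections import Counter


def node_event_probabilities(node_ids, observed):
    """Return {terminal node: events / cases}, using only the given cases."""
    cases = Counter(node_ids)
    events = Counter(node for node, y in zip(node_ids, observed) if y == 1)
    return {node: events[node] / cases[node] for node in cases}


# Hypothetical test cases: terminal node of each case and its observed class.
test_nodes = [1, 1, 2, 2, 2, 3]
test_observed = [1, 0, 0, 0, 1, 1]

print(node_event_probabilities(test_nodes, test_observed))
```

The resulting probabilities then serve as the thresholds in the ranking and thresholding steps of the training data set procedure.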

Test with k-fold cross-validation

The procedure to define the x- and y-coordinates on the ROC curve chart with k-fold cross-validation has an additional step, which creates many distinct event probabilities. For example, suppose the tree diagram contains 4 terminal nodes and the analysis uses 10-fold cross-validation. Then, for the i-th fold, the other 9/10 of the data are used to estimate the event probabilities for the cases in fold i. When this process repeats for each fold, the maximum number of distinct event probabilities is 4 × 10 = 40. Next, sort all the distinct event probabilities in decreasing order and use each one as a threshold to assign predicted classes for the cases in the entire data set. After this step, the training data set procedure applies from step 3 onward to find the x- and y-coordinates.
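The sort-and-threshold part of the cross-validation procedure can be sketched in Python as follows. The sketch assumes that `cv_probs` already holds, for each case, the event probability estimated by the tree that was fit without that case's fold; fitting the trees is outside the scope of the example. The function name and the data are hypothetical, and observed classes are coded 1 for event and 0 for nonevent.

```python
def roc_points(probs, observed):
    """Return (threshold, FPR, TPR) for each distinct probability, highest first."""
    total_events = sum(observed)
    total_nonevents = len(observed) - total_events
    points = []
    # Every distinct cross-validated probability serves as a threshold
    # for the cases in the entire data set.
    for threshold in sorted(set(probs), reverse=True):
        tp = sum(1 for p, y in zip(probs, observed) if p >= threshold and y == 1)
        fp = sum(1 for p, y in zip(probs, observed) if p >= threshold and y == 0)
        points.append((threshold, fp / total_nonevents, tp / total_events))
    return points


# Hypothetical cross-validated event probabilities and observed classes.
cv_probs = [0.9, 0.7, 0.7, 0.4, 0.3, 0.3]
observed = [1, 1, 0, 1, 0, 0]

for threshold, fpr, tpr in roc_points(cv_probs, observed):
    print(f"threshold={threshold:.2f} FPR={fpr:.2f} TPR={tpr:.2f}")
```

With 4 terminal nodes and 10 folds, `set(probs)` would contain at most 40 values, which matches the maximum number of distinct event probabilities described above.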
