Use the following process to find the x- and y-coordinates for the chart.
For example, suppose the following table summarizes a tree with 4 terminal nodes:
A: Terminal node | B: Number of events | C: Number of nonevents | D: Number of cases | E: Threshold (B/D) |
---|---|---|---|---|
4 | 18 | 12 | 30 | 0.60 |
1 | 25 | 42 | 67 | 0.37 |
3 | 12 | 44 | 56 | 0.21 |
2 | 4 | 32 | 36 | 0.11 |
Totals | 59 | 130 | 189 | |
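The threshold column can be reproduced with a few lines of Python. This is a minimal sketch: the `nodes` dictionary and the variable names are illustrative assumptions, and the counts are copied from the table above.

```python
# Terminal node -> (number of events, number of nonevents), from the table above.
nodes = {4: (18, 12), 1: (25, 42), 3: (12, 44), 2: (4, 32)}

# Threshold for a node = events / cases in that node (column B / column D).
thresholds = {
    node: events / (events + nonevents)
    for node, (events, nonevents) in nodes.items()
}

# Order the terminal nodes by decreasing threshold, as in the table.
ordered = sorted(thresholds, key=thresholds.get, reverse=True)

for node in ordered:
    print(node, round(thresholds[node], 2))
```

The loop prints the terminal nodes in the same order as the rows of the table, each with its threshold rounded to 2 decimal places.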
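Each threshold yields a table of observed classes against predicted classes. The following are the 4 corresponding tables, with their respective false positive rates and true positive rates to 2 decimal places:

Threshold = 0.60: false positive rate = 12 / 130 = 0.09, true positive rate = 18 / 59 = 0.31

Observed | Predicted event | Predicted nonevent |
---|---|---|
event | 18 | 41 |
nonevent | 12 | 118 |

Threshold = 0.37: false positive rate = 54 / 130 = 0.42, true positive rate = 43 / 59 = 0.73

Observed | Predicted event | Predicted nonevent |
---|---|---|
event | 43 | 16 |
nonevent | 54 | 76 |

Threshold = 0.21: false positive rate = 98 / 130 = 0.75, true positive rate = 55 / 59 = 0.93

Observed | Predicted event | Predicted nonevent |
---|---|---|
event | 55 | 4 |
nonevent | 98 | 32 |

Threshold = 0.11: false positive rate = 130 / 130 = 1.00, true positive rate = 59 / 59 = 1.00

Observed | Predicted event | Predicted nonevent |
---|---|---|
event | 59 | 0 |
nonevent | 130 | 0 |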
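The counts in these tables accumulate from one threshold to the next, so the coordinates can be computed in a single pass over the terminal nodes. The following is a minimal sketch under that reading. The function name `roc_points` and the list layout are assumptions made for illustration.

```python
# (events, nonevents) per terminal node, already sorted by decreasing threshold.
nodes = [(18, 12), (25, 42), (12, 44), (4, 32)]

def roc_points(nodes):
    """Return one (false positive rate, true positive rate) pair per threshold."""
    total_events = sum(events for events, _ in nodes)
    total_nonevents = sum(nonevents for _, nonevents in nodes)
    points = []
    true_pos = false_pos = 0
    for events, nonevents in nodes:
        # Lowering the threshold to this node's value predicts "event"
        # for every case in this node and in all nodes above it.
        true_pos += events
        false_pos += nonevents
        points.append((false_pos / total_nonevents, true_pos / total_events))
    return points

points = roc_points(nodes)
for fpr, tpr in points:
    print(round(fpr, 2), round(tpr, 2))
```

Each pair is an (x, y) coordinate on the chart: the false positive rate is the x-coordinate and the true positive rate is the y-coordinate.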
Use the same steps as the training data set procedure, but calculate the event probability from the cases for the test data set.
The procedure to define the x- and y-coordinates on the ROC curve chart with k-fold cross-validation has an additional step, which can create many distinct event probabilities. For example, suppose the tree diagram contains 4 terminal nodes and the analysis uses 10-fold cross-validation. For fold i, the other 9/10 of the data are used to estimate the event probabilities for the cases in fold i. After this process repeats for every fold, the maximum number of distinct event probabilities is 4 × 10 = 40. Sort the distinct event probabilities in decreasing order, then use each one as a threshold to assign predicted classes to the cases in the entire data set. From that point, step 3 through the last step of the training data set procedure apply to find the x- and y-coordinates.
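The thresholding part of the cross-validation procedure can be sketched as follows. The sketch assumes that each case already has an event probability estimated from the folds that exclude it; fitting the trees is not shown. It also assumes that a case is predicted to be an event when its probability is greater than or equal to the threshold, which matches the worked example for the training data. The function name and the 6 cases are hypothetical.

```python
def roc_from_probabilities(probs, labels):
    """probs: event probability per case; labels: 1 = event, 0 = nonevent."""
    n_events = sum(labels)
    n_nonevents = len(labels) - n_events
    points = []
    # Each distinct event probability, in decreasing order, is a threshold.
    for threshold in sorted(set(probs), reverse=True):
        # Assumed rule: predict "event" when probability >= threshold.
        true_pos = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
        false_pos = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
        points.append((false_pos / n_nonevents, true_pos / n_events))
    return points

# Hypothetical cross-validated probabilities and observed classes for 6 cases.
probs = [0.8, 0.8, 0.55, 0.55, 0.3, 0.1]
labels = [1, 0, 1, 0, 1, 0]

points = roc_from_probabilities(probs, labels)
```

With 4 distinct probabilities in the hypothetical data, the function returns 4 coordinate pairs, and the last pair is (1.0, 1.0) because the lowest threshold predicts every case to be an event.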