Training data or no validation
For a chart of a training data set, each point represents a distinct fitted
event probability. The point for the highest event probability is the first
point on the chart and appears leftmost. The other points follow in order of
decreasing event probability.
Use the following process to find the x- and y-coordinates for the chart.
- Use every distinct event probability as a threshold. For a specific
threshold, cases with an estimated event probability greater than or equal to
the threshold get 1 (event) as the predicted class. All other cases get 0
(nonevent).
- For each threshold, form a 2x2 table of all cases, with observed classes as
rows and predicted classes as columns. From the table, calculate the false
positive rate (the proportion of observed nonevents that are predicted as
events) and the true positive rate (the proportion of observed events that are
predicted as events).
- The false positive rates are the x-coordinates for the chart. The true
positive rates are the y-coordinates.
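The process above can be sketched in Python as follows. This is a minimal illustration, not the software's own implementation: the function name `roc_points` and its arguments are hypothetical, the observed classes are assumed to be coded as 1 (event) and 0 (nonevent), and the data are assumed to contain at least one event and one nonevent.

```python
def roc_points(probabilities, observed):
    """Return (threshold, false positive rate, true positive rate) tuples,
    one per distinct event probability, from highest threshold to lowest."""
    n_events = sum(observed)
    n_nonevents = len(observed) - n_events
    points = []
    # Every distinct event probability serves as a threshold.
    for threshold in sorted(set(probabilities), reverse=True):
        # Predicted class is 1 when the probability is >= the threshold.
        predicted = [1 if p >= threshold else 0 for p in probabilities]
        # Cells of the 2x2 table that the two rates need.
        tp = sum(1 for pred, obs in zip(predicted, observed)
                 if pred == 1 and obs == 1)
        fp = sum(1 for pred, obs in zip(predicted, observed)
                 if pred == 1 and obs == 0)
        points.append((threshold, fp / n_nonevents, tp / n_events))
    return points


# Small made-up data set: five cases with three distinct probabilities.
example = roc_points([0.9, 0.9, 0.4, 0.4, 0.1], [1, 0, 1, 0, 0])
for threshold, fpr, tpr in example:
    print(threshold, fpr, tpr)
```

The second element of each tuple is the x-coordinate and the third element is the y-coordinate of one point on the chart.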
For example, suppose the following table summarizes a model with two 2-level
categorical predictors. These predictors give four distinct event
probabilities, which are rounded to 2 decimal places:
| A: Order | B: Predictor 1 | C: Predictor 2 | D: Number of events | E: Number of nonevents | F: Number of trials | G: Threshold (D/F) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 | 1 | 18 | 12 | 30 | 0.60 |
| 2 | 1 | 2 | 25 | 42 | 67 | 0.37 |
| 3 | 2 | 1 | 12 | 44 | 56 | 0.21 |
| 4 | 2 | 2 | 4 | 32 | 36 | 0.11 |
| Totals | | | 59 | 130 | 189 | |
The following are the corresponding four tables with their respective
false positive rates and true positive rates rounded to 2 decimal places:
Table 1. Threshold = 0.60.
False positive rate = 12 / (12 + 118) = 0.09
True positive rate = 18 / (18 + 41) = 0.31
| | Predicted event | Predicted nonevent |
| --- | --- | --- |
| Observed event | 18 | 41 |
| Observed nonevent | 12 | 118 |
Table 2. Threshold = 0.37.
False positive rate = (12 + 42) / 130 = 0.42
True positive rate = (18 + 25) / 59 = 0.73
| | Predicted event | Predicted nonevent |
| --- | --- | --- |
| Observed event | 43 | 16 |
| Observed nonevent | 54 | 76 |
Table 3. Threshold = 0.21.
False positive rate = (12 + 42 + 44) / 130 = 0.75
True positive rate = (18 + 25 + 12) / 59 = 0.93
| | Predicted event | Predicted nonevent |
| --- | --- | --- |
| Observed event | 55 | 4 |
| Observed nonevent | 98 | 32 |
Table 4. Threshold = 0.11.
False positive rate = (12 + 42 + 44 + 32) / 130 = 1
True positive rate = (18 + 25 + 12 + 4) / 59 = 1
| | Predicted event | Predicted nonevent |
| --- | --- | --- |
| Observed event | 59 | 0 |
| Observed nonevent | 130 | 0 |
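The four tables above can be reproduced from the grouped counts in the example with a short Python sketch. The function name `grouped_roc` and the dictionary keys are hypothetical and are chosen only for illustration. The sketch relies on the fact that lowering the threshold to the next event probability moves one more group into the predicted-event column, so the cell counts are running totals.

```python
# (number of events, number of nonevents) for each distinct event
# probability in the example.
groups = [(18, 12), (25, 42), (12, 44), (4, 32)]


def grouped_roc(groups):
    """Build the 2x2 table cells and both rates for each threshold."""
    # Order the groups by decreasing event probability (events / trials).
    ordered = sorted(groups, key=lambda g: g[0] / (g[0] + g[1]), reverse=True)
    total_events = sum(events for events, _ in ordered)
    total_nonevents = sum(nonevents for _, nonevents in ordered)
    rows = []
    tp = fp = 0
    for events, nonevents in ordered:
        # All groups at or above the current threshold are predicted events.
        tp += events
        fp += nonevents
        rows.append({
            "threshold": round(events / (events + nonevents), 2),
            "tp": tp,
            "fn": total_events - tp,
            "fp": fp,
            "tn": total_nonevents - fp,
            "fpr": round(fp / total_nonevents, 2),
            "tpr": round(tp / total_events, 2),
        })
    return rows


rows = grouped_roc(groups)
for row in rows:
    print(row)
```

Plotting the `fpr` values on the x-axis against the `tpr` values on the y-axis gives the four points of the chart for this example.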
Separate test data set
Use the same steps as the training data set procedure, but calculate the
event probabilities from the cases in the test data set.
Test with k-fold cross-validation
Use the same steps as the training data set procedure, but calculate the
event probabilities from the cases for the cross-validated data.