Methods and Formulas for the Receiver Operating Characteristic (ROC) curve chart for Fit Model and Discover Key Predictors with TreeNet® Classification

Note

This command is available with the Predictive Analytics Module.

The procedure for the points on the ROC curve depends on the validation method. For a multinomial response variable, Minitab displays multiple charts that treat each class as the event in turn.
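
As an illustration of treating each class as the event in turn, the following Python sketch recodes a multinomial response into one binary view per class. It is a minimal sketch under stated assumptions, not Minitab's implementation: the function name `one_vs_rest`, the class labels, and the probabilities are all hypothetical.

```python
def one_vs_rest(labels, class_probs):
    """For each class, return (event indicators, event probabilities).

    labels      -- observed class for each case
    class_probs -- one dict per case mapping class -> predicted probability
    """
    classes = sorted(class_probs[0])
    views = {}
    for cls in classes:
        # The current class is the event (1); every other class is the nonevent (0).
        indicators = [1 if label == cls else 0 for label in labels]
        # The event probability is the predicted probability of the current class.
        probs = [p[cls] for p in class_probs]
        views[cls] = (indicators, probs)
    return views


# Hypothetical data: three cases and three classes.
labels = ["low", "high", "medium"]
class_probs = [
    {"low": 0.7, "medium": 0.2, "high": 0.1},
    {"low": 0.1, "medium": 0.3, "high": 0.6},
    {"low": 0.2, "medium": 0.5, "high": 0.3},
]

views = one_vs_rest(labels, class_probs)
for cls, (indicators, probs) in views.items():
    print(cls, indicators, probs)
```

Each binary view can then be charted with the same threshold procedure that is used for a binary response.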

No validation

With no validation, the chart is for the training data set, and each point on the chart represents a distinct fitted event probability. The highest event probability gives the first point, which appears leftmost. The remaining points follow from left to right in order of decreasing event probability.

Use the following process to find the x- and y-coordinates for the chart.

  1. Use every distinct event probability as a threshold. For a specific threshold, cases with an estimated event probability greater than or equal to the threshold receive 1 (event) as the predicted class; all other cases receive 0 (nonevent). Then form a 2x2 table for all cases, with observed classes as rows and predicted classes as columns, and calculate the false positive rate and the true positive rate for that threshold. The false positive rate is the proportion of observed nonevents that are predicted as events. The true positive rate is the proportion of observed events that are predicted as events. The false positive rates are the x-coordinates for the chart. The true positive rates are the y-coordinates.

    For example, suppose the following table summarizes a simple model with two 2-level categorical predictors. These predictors give four distinct event probabilities, which are rounded to 2 decimal places:

    Order  Predictor 1  Predictor 2  Number of events  Number of nonevents  Number of trials  Threshold (fitted event probability)
    1      1            1            18                12                   30                0.60
    2      1            2            25                42                   67                0.37
    3      2            1            12                44                   56                0.21
    4      2            2            4                 32                   36                0.11
    Totals                           59                130                  189

    The following are the corresponding four tables with their respective false positive rates and true positive rates rounded to 2 decimal places:

    Table 1. Threshold = 0.60.

    False positive rate = 12 / (12 + 118) = 0.09

    True positive rate = 18 / (18 + 41) = 0.31

                       Predicted event  Predicted nonevent
    Observed event     18               41
    Observed nonevent  12               118

    Table 2. Threshold = 0.37.

    False positive rate = (12 + 42) / 130 = 0.42

    True positive rate = (18 + 25) / 59 = 0.73

                       Predicted event  Predicted nonevent
    Observed event     43               16
    Observed nonevent  54               76

    Table 3. Threshold = 0.21.

    False positive rate = (12 + 42 + 44) / 130 = 0.75

    True positive rate = (18 + 25 + 12) / 59 = 0.93

                       Predicted event  Predicted nonevent
    Observed event     55               4
    Observed nonevent  98               32

    Table 4. Threshold = 0.11.

    False positive rate = (12 + 42 + 44 + 32) / 130 = 1

    True positive rate = (18 + 25 + 12 + 4) / 59 = 1

                       Predicted event  Predicted nonevent
    Observed event     59               0
    Observed nonevent  130              0
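
The worked example can be reproduced with a short script. The following Python sketch is an illustration only, not Minitab's implementation. The names `rows` and `roc_points` are hypothetical, and the input is the fitted event probabilities and counts from the example table.

```python
# Rows from the example table: (fitted event probability, events, nonevents).
rows = [(0.60, 18, 12), (0.37, 25, 42), (0.21, 12, 44), (0.11, 4, 32)]


def roc_points(rows):
    """Return (threshold, false positive rate, true positive rate) tuples,
    one per distinct fitted event probability, highest threshold first."""
    # Merge rows that share a fitted probability so each threshold is distinct.
    by_prob = {}
    for prob, events, nonevents in rows:
        e, n = by_prob.get(prob, (0, 0))
        by_prob[prob] = (e + events, n + nonevents)

    total_events = sum(e for e, _ in by_prob.values())
    total_nonevents = sum(n for _, n in by_prob.values())

    points = []
    true_pos = false_pos = 0
    for prob in sorted(by_prob, reverse=True):
        events, nonevents = by_prob[prob]
        # Every case with probability >= the threshold is predicted as the
        # event, so the counts accumulate as the threshold decreases.
        true_pos += events
        false_pos += nonevents
        fpr = round(false_pos / total_nonevents, 2)  # x-coordinate
        tpr = round(true_pos / total_events, 2)      # y-coordinate
        points.append((prob, fpr, tpr))
    return points


for threshold, fpr, tpr in roc_points(rows):
    print(f"threshold={threshold:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Each printed line corresponds to one of Tables 1 through 4, with the false positive rate as the x-coordinate and the true positive rate as the y-coordinate.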

Separate test set

Use the same steps as the training data set procedure, but calculate the event probabilities for the cases in the test data set.

Test with k-fold cross-validation

Use the same steps as the training data set procedure, but calculate the event probabilities from the cases for the cross-validated data.
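
This page does not spell out how the cross-validated event probabilities are produced. The following Python sketch assumes a common reading: each case is scored by a model that is fit without that case's fold, and the pooled out-of-fold probabilities then feed the same threshold procedure. Everything in the sketch is hypothetical, including the function names, the fold assignment, and the stand-in model `fit_group_rates`, which is a simple event-rate lookup and not a TreeNet® model.

```python
import random


def fit_group_rates(train):
    """Hypothetical stand-in model: event rate per predictor combination."""
    counts = {}
    for x, y in train:
        events, trials = counts.get(x, (0, 0))
        counts[x] = (events + y, trials + 1)
    overall = sum(y for _, y in train) / len(train)
    rates = {x: e / t for x, (e, t) in counts.items()}
    # Fall back to the overall event rate for unseen predictor combinations.
    return lambda x: rates.get(x, overall)


def out_of_fold_probabilities(cases, k, seed=0):
    """Score each case with a model fit on the other k - 1 folds."""
    order = list(range(len(cases)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # assumed random fold assignment
    probs = [None] * len(cases)
    for held_out in folds:
        held = set(held_out)
        train = [cases[i] for i in order if i not in held]
        predict = fit_group_rates(train)
        for i in held_out:
            probs[i] = predict(cases[i][0])
    return probs


# Expand the example table into one (predictors, observed class) pair per case.
rows = [((1, 1), 18, 12), ((1, 2), 25, 42), ((2, 1), 12, 44), ((2, 2), 4, 32)]
cases = []
for x, events, nonevents in rows:
    cases += [(x, 1)] * events + [(x, 0)] * nonevents

probs = out_of_fold_probabilities(cases, k=5)
print(len(probs), min(probs), max(probs))
```

Under this assumption, every case receives exactly one cross-validated event probability, and the distinct values serve as the thresholds for the chart.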