Minitab® Support

Methods and formulas for the model summary in Fit Model and Discover Key Predictors with TreeNet® Classification

Note

This command is available with the Predictive Analytics Module.

Choose the method or formula of your choice.

In This Topic

Important predictors
Average –loglikelihood for a binary response
Average –loglikelihood for a multinomial response
Area under ROC curve
95% CI for the area under the ROC curve
Lift
Misclassification rate

Important predictors

The number of predictors with positive relative importance.

A TreeNet® Classification model comes from a sequence of small regression trees that use generalized residuals as the response variable. The calculation of the model improvement score for a predictor from a single tree has two steps:

Find the reduction in mean squared error when the predictor splits a node.
Add the reductions from all the nodes where the predictor is the node splitter.

Then, the importance score for the predictor equals the sum of the model improvement scores across all the trees.
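
The two steps and the sum across trees can be sketched in Python. This is a minimal illustration, not Minitab's implementation. The `trees` structure, the predictor names, and the reduction values are hypothetical. The sketch also assumes that relative importance is a positive rescaling of the raw score, so a predictor has positive relative importance exactly when its raw score is positive.

```python
from collections import defaultdict

def importance_scores(trees):
    """Sum the split improvements for each predictor across all trees.

    `trees` is a hypothetical structure: a list of trees, where each tree
    is a list of (splitter_name, mse_reduction) pairs, one per split node.
    """
    scores = defaultdict(float)
    for tree in trees:
        for splitter, reduction in tree:
            # Adding per node gives the per-tree improvement score;
            # continuing across trees gives the importance score.
            scores[splitter] += reduction
    return dict(scores)

def count_important(scores, all_predictors):
    """Count predictors whose importance score is positive."""
    return sum(1 for name in all_predictors if scores.get(name, 0.0) > 0)

# Hypothetical two-tree model; "region" never splits a node.
trees = [
    [("age", 4.0), ("income", 1.5), ("age", 0.5)],
    [("income", 2.0), ("age", 1.0)],
]
scores = importance_scores(trees)
n_important = count_important(scores, ["age", "income", "region"])
```

In this hypothetical model, "age" and "income" count as important predictors and "region" does not, because it is never a node splitter.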

Average –loglikelihood for a binary response

The calculations depend on the validation method.

Training data or no validation

where

and

Notation for training data or no validation

Term	Description
N	sample size of the full or training data set
w_i	weight for the i^th observation in the full or training data set
y_i	i^th response value that is 1 for the event and 0 otherwise for the full or training data set
	predicted probability of the event for the i^th row in the full or training data set
	fitted value from the model
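
As a rough illustration of the terms in this table, the following Python sketch computes a weighted average negative loglikelihood for a binary response. It assumes the usual Bernoulli loglikelihood, natural logarithms, and division by N. The function name, the clipping constant `eps`, and the example data are assumptions. The predicted probabilities are taken as inputs, so the mapping from the fitted value to the probability is not modeled here.

```python
import math

def average_neg_loglik_binary(y, p, w=None, eps=1e-15):
    """Weighted average negative loglikelihood for a 0/1 response.

    y: response values (1 for the event, 0 otherwise)
    p: predicted probabilities of the event
    w: observation weights (assumed to default to 1)
    """
    n = len(y)
    if w is None:
        w = [1.0] * n
    total = 0.0
    for y_i, p_i, w_i in zip(y, p, w):
        # Clip to avoid log(0); eps is an illustrative choice.
        p_i = min(max(p_i, eps), 1.0 - eps)
        total += w_i * (y_i * math.log(p_i) + (1 - y_i) * math.log(1.0 - p_i))
    return -total / n  # assumption: normalize by the sample size N

# Hypothetical data
y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.4]
avg_nll = average_neg_loglik_binary(y, p)
```

Smaller values indicate predicted probabilities that agree more closely with the observed responses.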

K-fold cross-validation

where

and

Notation for k-fold cross-validation

Term	Description
N	sample size of the full or training data
n_k	sample size of fold k
w_{i, k}	weight for the i^th observation in fold k
y_{i, k}	binary response value of case i in fold k. y_{i, k} = 1 for event class, and 0 otherwise.
	predicted probability for case i in fold k. The predicted probability is from the model that does not use the data in fold k.
	fitted value for case i in fold k. The fitted value is from the model that does not use the data in fold k.
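
For k-fold cross-validation, one plausible reading is that the out-of-fold terms are pooled over all folds and divided by N, the sum of the fold sizes. The Python sketch below follows that reading. The pooling rule, the function name, and the data are assumptions, and each fold's probabilities are assumed to come from a model fit without that fold.

```python
import math

def average_neg_loglik_kfold(folds, eps=1e-15):
    """Pool weighted loglikelihood terms across folds.

    folds: list of (y_k, p_k, w_k) tuples, one per fold, where p_k holds
    probabilities from the model that does not use the data in fold k.
    """
    total = 0.0
    n = 0
    for y_k, p_k, w_k in folds:
        n += len(y_k)  # N is the sum of the fold sizes n_k
        for y_ik, p_ik, w_ik in zip(y_k, p_k, w_k):
            p_ik = min(max(p_ik, eps), 1.0 - eps)  # avoid log(0)
            total += w_ik * (
                y_ik * math.log(p_ik) + (1 - y_ik) * math.log(1.0 - p_ik)
            )
    return -total / n

# Hypothetical out-of-fold predictions for two folds
folds = [
    ([1, 0], [0.8, 0.3], [1.0, 1.0]),
    ([1, 0], [0.5, 0.5], [1.0, 1.0]),
]
cv_avg_nll = average_neg_loglik_kfold(folds)
```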

Test data set

where

and

Notation for test data set

Term	Description
n_Test	sample size of the test data set
w_{i, Test}	weight for the i^th observation in the test data set
y_{i, Test}	binary response value of case i in the test data set. y_{i, Test} = 1 for the event class, and 0 otherwise.
	predicted probability for case i in the test data set
	fitted value for case i in the test data set

Average –loglikelihood for a multinomial response

The calculations depend on the validation method. In the following sections,

is the number of levels in the response variable.

Training data or no validation

where

Notation for training data or no validation

Term	Description
N	sample size of the full or training data set
w_i	weight for the i^th observation in the full or training data set
y_{i, q}	indicator for the i^th response value that is 1 when the response is at the q^th level and 0 otherwise
	predicted probability of the q^th level of the response for the i^th row in the full or training data set
	fitted value from the q^th sequence of trees for the i^th row, which is used to calculate the predicted probability of the q^th level of the response
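
The Python sketch below illustrates how these terms could combine for a multinomial response. It assumes a softmax transformation from the fitted values of the sequences of trees to the predicted probabilities, natural logarithms, and division by N. None of these choices is confirmed here, and the function names and data are hypothetical.

```python
import math

def softmax(fitted):
    """Convert one row of fitted values to probabilities (assumed link)."""
    shift = max(fitted)  # subtract the max for numerical stability
    exps = [math.exp(f - shift) for f in fitted]
    total = sum(exps)
    return [e / total for e in exps]

def average_neg_loglik_multinomial(level_index, probs, w=None, eps=1e-15):
    """Weighted average negative loglikelihood for a multinomial response.

    level_index: observed level (0-based) for each row
    probs: per-row list of predicted probabilities, one per level
    """
    n = len(level_index)
    if w is None:
        w = [1.0] * n
    total = 0.0
    for q, p_row, w_i in zip(level_index, probs, w):
        # y_{i, q} is 1 only for the observed level, so one term survives.
        total += w_i * math.log(max(p_row[q], eps))
    return -total / n  # assumption: normalize by the sample size N

# Hypothetical fitted values for two rows and three response levels
fitted_values = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
observed_levels = [0, 1]
probabilities = [softmax(row) for row in fitted_values]
avg_nll = average_neg_loglik_multinomial(observed_levels, probabilities)
```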

K-fold cross-validation

where

Notation for k-fold cross-validation

Term	Description
N	sample size of the training data
n_k	sample size of fold k
w_{i, k}	weight for the i^th observation in fold k
y_{i, k, q}	indicator for case i in fold k that is 1 when the response is at the q^th level and 0 otherwise
	The predicted probability of the q^th level of the response for the i^th row in fold k. The predicted probability is from the model that does not use the data in fold k.
	The fitted value from the q^th sequence of trees for the i^th row in fold k, which is used to calculate the predicted probability of the q^th level of the response. The fitted value is from the model that does not use the data in fold k.

Test data set

where

Notation for test data set

Term	Description
n_Test	sample size of the test data
w_{i, Test}	weight for the i^th observation in the test data
y_{i, Test, q}	indicator for case i in the test data set that is 1 when the response is at the q^th level and 0 otherwise
	The predicted probability of the q^th level of the response for the i^th row in the test data. The predicted probability is from the model that does not use the test data.
	The fitted value from the q^th sequence of trees for the i^th row in the test data, which is used to calculate the predicted probability of the q^th level of the response. The fitted value is from the model that does not use the test data.

Area under ROC curve

The Model Summary table includes the area under the ROC curve when the response is binary. The ROC curve plots the true positive rate (TPR), also known as power, on the y-axis, and the false positive rate (FPR), also known as type I error, on the x-axis. Values of the area under the ROC curve typically range from 0.5 to 1.

Formula

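The area under the curve is a summation of areas of trapezoids:

Area = Σ_{i=1}^{k} (x_i − x_{i−1})(y_i + y_{i−1}) / 2

where k is the number of distinct event probabilities, (x_i, y_i) is the i^th point on the ROC curve in order of increasing false positive rate, and (x_0, y_0) is the point (0, 0).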

To compute the area for a curve from a test data set or from cross-validated data, use the points from the corresponding curve.

Notation

Term	Description
TPR	true positive rate
FPR	false positive rate
TP	true positive, events that were correctly assessed
FN	false negative, events that were incorrectly assessed
P	number of actual positive events
FP	false positive, nonevents that were incorrectly assessed
N	number of actual negative events
FNR	false negative rate
TNR	true negative rate

Example

For example, suppose your results have 4 distinct fitted values with the following coordinates on the ROC curve:

x (false positive rate)	y (true positive rate)
0.0923	0.3051
0.4154	0.7288
0.7538	0.9322
1	1

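Then the area under the ROC curve is given by the following calculation:

Area = (0.0923 − 0)(0.3051 + 0)/2 + (0.4154 − 0.0923)(0.7288 + 0.3051)/2 + (0.7538 − 0.4154)(0.9322 + 0.7288)/2 + (1 − 0.7538)(1 + 0.9322)/2 ≈ 0.0141 + 0.1670 + 0.2810 + 0.2379 ≈ 0.7000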
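
The trapezoid summation for these coordinates can also be sketched in Python. The function name is an assumption, and the sketch prepends the point (0, 0) and sorts the points by false positive rate before summing.

```python
def roc_auc_trapezoid(points):
    """Area under an ROC curve by summing trapezoids.

    points: (false positive rate, true positive rate) pairs; the origin
    (0, 0) is added as the starting point.
    """
    pts = [(0.0, 0.0)] + sorted(points)
    area = 0.0
    for (x_prev, y_prev), (x_cur, y_cur) in zip(pts, pts[1:]):
        # Width times the average of the two heights.
        area += (x_cur - x_prev) * (y_cur + y_prev) / 2.0
    return area

# Coordinates from the example table
roc_points = [(0.0923, 0.3051), (0.4154, 0.7288), (0.7538, 0.9322), (1.0, 1.0)]
auc = roc_auc_trapezoid(roc_points)
print(round(auc, 4))
```

For the example coordinates, the sum is approximately 0.7000.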

95% CI for the area under the ROC curve

Minitab calculates a confidence interval for the area under the Receiver Operating Characteristic curve when the response is binary.

The following interval gives the upper and lower bounds for the confidence interval:

A ± z_{0.975} × SE(A)

The computation of the standard error of the area under the ROC curve, SE(A), comes from Salford Predictive Modeler®. For general information about estimation of the variance of the area under the ROC curve, see the following references:

Engelmann, B. (2011). Measures of a rating's discriminative power: Applications and limitations. In B. Engelmann & R. Rauhmeier (Eds.), The Basel II Risk Parameters: Estimation, Validation, Stress Testing - With Applications to Loan Risk Management (2nd ed.). Heidelberg; New York: Springer. doi:10.1007/978-3-642-16114-8

Cortes, C., & Mohri, M. (2005). Confidence intervals for the area under the ROC curve. Advances in Neural Information Processing Systems, 305-312.

Feng, D., Cortese, G., & Baumgartner, R. (2017). A comparison of confidence/credible interval methods for the area under the ROC curve for continuous diagnostic tests with small sample size. Statistical Methods in Medical Research, 26(6), 2603-2621. doi:10.1177/0962280215602040

Notation

Term	Description
A	area under the ROC curve
z_{0.975}	97.5th percentile (0.975 quantile) of the standard normal distribution
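
Given an area and its standard error, a normal-approximation interval can be sketched in Python as follows. The function name and the example values are hypothetical. The standard error is taken as an input because its computation comes from Salford Predictive Modeler® and is not reproduced here.

```python
from statistics import NormalDist

def auc_confidence_interval(area, std_err, level=0.95):
    """Normal-approximation interval: area +/- z * standard error."""
    # For level = 0.95 this is the 0.975 quantile, about 1.96.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return area - z * std_err, area + z * std_err

# Hypothetical area and standard error
lower, upper = auc_confidence_interval(0.70, 0.05)
```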

Lift

Minitab displays lift in the model summary table when the response is binary. The lift in the model summary table is the cumulative lift for 10% of the data.

To see general calculations for cumulative lift, go to Methods and formulas for the lift chart for Fit Model and Discover Key Predictors with TreeNet® Classification.
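
As a rough illustration only, the Python sketch below uses a common definition of cumulative lift: the event rate among the 10% of rows with the highest predicted event probabilities, divided by the overall event rate. This definition, the rounding of the row count, and the unweighted, tie-free handling are assumptions, so the result may differ from the value that Minitab reports.

```python
def cumulative_lift(y, p, fraction=0.10):
    """Event rate in the top `fraction` of rows divided by the overall rate.

    y: 1 for the event, 0 otherwise
    p: predicted probabilities of the event
    """
    # Rank rows from the highest to the lowest predicted probability.
    order = sorted(range(len(y)), key=lambda i: p[i], reverse=True)
    n_top = max(1, round(fraction * len(y)))
    top_rate = sum(y[i] for i in order[:n_top]) / n_top
    overall_rate = sum(y) / len(y)
    return top_rate / overall_rate

# Hypothetical data: 20 rows, 5 events, probabilities already descending
y = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
p = [0.95 - 0.04 * i for i in range(20)]
lift = cumulative_lift(y, p)
```

In this hypothetical data, both rows in the top 10% are events and the overall event rate is 25%, so the lift is 4.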

Misclassification rate

Misclassification rate = misclassed count / total count

In the weighted case, use weighted counts in place of counts.

For k-fold cross-validation, the misclassed count is the sum of the misclassifications across the folds, counted when each fold serves as the test data set.

For validation with a test data set, the misclassed count is the sum of misclassifications in the test data set and the total count is for the test data set.
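
The ratio of the misclassed count to the total count, including the weighted case, can be sketched in Python. The function name and the data are hypothetical. For k-fold cross-validation, the same function could be applied to the out-of-fold predictions pooled across all folds.

```python
def misclassification_rate(actual, predicted, w=None):
    """Misclassed count divided by total count; weighted when w is given."""
    if w is None:
        w = [1.0] * len(actual)
    # A row is misclassed when the predicted class differs from the actual.
    misclassed = sum(w_i for a, p, w_i in zip(actual, predicted, w) if a != p)
    total = sum(w)
    return misclassed / total

# Hypothetical classes; the third row is misclassified
actual = ["Yes", "No", "Yes", "No", "Yes"]
predicted = ["Yes", "No", "No", "No", "Yes"]
rate = misclassification_rate(actual, predicted)
weighted_rate = misclassification_rate(actual, predicted, w=[1, 1, 2, 1, 1])
```

Here the unweighted rate is 1/5, and the weighted rate is 2/6 because the misclassified row has a weight of 2.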

Copyright © 2025 Minitab, LLC. All rights Reserved.