Methods and formulas for the model summary in Fit Binary Logistic Model and Binary Logistic Regression

In This Topic

Deviance R²
Adjusted Deviance R²
Akaike Information Criterion (AIC)
AICc (Akaike's Corrected Information Criterion)
BIC (Bayesian Information Criterion)
Test deviance R²
K-fold Deviance R²
Area under ROC curve

Deviance R²

The deviance R² is the proportion of the total deviance in the response that the model explains. The higher the deviance R², the better the model fits your data. The formula is:

R² = 1 − D_E / D_T

Notation

Term	Description
D_E	Error Deviance
D_T	Total Deviance
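As a minimal sketch of how D_E and D_T combine, the following Python computes both deviances for event/trial data and then the deviance R². It assumes a binomial model with an intercept, so the total deviance comes from a constant-only fit at the overall event proportion; the function names are hypothetical and this is not Minitab's implementation.

```python
import math

def binomial_deviance(events, trials, probs):
    """Binomial deviance: 2 * (saturated log-likelihood - fitted log-likelihood)."""
    total = 0.0
    for y, m, p in zip(events, trials, probs):
        fitted = m * p  # expected number of events in this row
        if y > 0:
            total += y * math.log(y / fitted)
        if y < m:
            total += (m - y) * math.log((m - y) / (m - fitted))
    return 2.0 * total

def deviance_r2(events, trials, probs):
    """1 - D_E / D_T, with D_T from a constant-only model (assumed)."""
    null_prob = sum(events) / sum(trials)
    d_e = binomial_deviance(events, trials, probs)
    d_t = binomial_deviance(events, trials, [null_prob] * len(events))
    return 1.0 - d_e / d_t

# Four binary rows with hypothetical fitted event probabilities.
r2 = deviance_r2([0, 0, 1, 1], [1, 1, 1, 1], [0.2, 0.2, 0.8, 0.8])
print(round(r2, 4))
```

Fitted probabilities equal to the overall event proportion give a deviance R² of 0, which is a quick sanity check on any implementation.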

Adjusted Deviance R²

The adjusted deviance R² penalizes the deviance R² for the number of predictors in your model, which makes it useful for comparing models with different numbers of predictors. The formula is:

R²(adj) = R² − (p × Φ) / D_T

Notation

Term	Description
R²	the deviance R²
p	the regression degrees of freedom
Φ	1, for binomial and Poisson models
D_T	the total deviance

While the calculations for adjusted deviance R² can produce negative values, Minitab displays zero for these cases.
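As a minimal sketch of the zero floor, assuming the adjustment subtracts pΦ / D_T from the deviance R² (an assumption built from the terms in the notation table, not a confirmed formula), the displayed value could be computed like this. The function name is hypothetical.

```python
def adjusted_deviance_r2(r2, p, d_t, phi=1.0):
    """Adjusted deviance R² under the assumed penalty p * phi / d_t.

    Negative results are reported as zero, matching the display rule
    described in the text.
    """
    adjusted = r2 - p * phi / d_t
    return max(adjusted, 0.0)

print(adjusted_deviance_r2(0.50, p=2, d_t=40.0))  # penalty of 0.05
print(adjusted_deviance_r2(0.01, p=3, d_t=20.0))  # raw value is negative
```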

Akaike Information Criterion (AIC)

Use this statistic to compare different models fit to the same data. The smaller the AIC, the better the model balances fit against the number of terms. The formula is:

AIC = −2 × L_c + 2p

The log-likelihood functions are parameterized in terms of the means. The general form of the functions follows:

L_c = Σ_i l_i

The general form of the individual contributions follows:

The specific form of the individual contributions depends on the model.

Model	l_i
Binomial
Poisson

Notation

Term	Description
p	the regression degrees of freedom
L_c	the log-likelihood of the current model
y_i	the number of events for the i^th row
m_i	the number of trials for the i^th row
μ̂_i	the estimated mean response of the i^th row
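The sketch below shows one way to turn the notation into a number. It assumes the textbook binomial log-likelihood, including the binomial coefficient, with the estimated mean response expressed as an event probability strictly between 0 and 1; the exact form of l_i that Minitab uses may differ, and the function names are hypothetical.

```python
import math

def binomial_log_likelihood(events, trials, probs):
    """Sum of per-row binomial log-likelihood contributions (textbook form)."""
    total = 0.0
    for y, m, p in zip(events, trials, probs):
        total += (
            math.log(math.comb(m, y))       # log of the binomial coefficient
            + y * math.log(p)               # contribution from events
            + (m - y) * math.log(1.0 - p)   # contribution from nonevents
        )
    return total

def aic(log_likelihood, p):
    """AIC = -2 * L_c + 2 * p."""
    return -2.0 * log_likelihood + 2.0 * p

l_c = binomial_log_likelihood([1, 0], [1, 1], [0.5, 0.5])
print(round(aic(l_c, p=1), 4))
```

Because the binomial coefficient does not depend on the model, dropping it shifts every candidate model's AIC by the same constant and leaves comparisons unchanged.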

AICc (Akaike's Corrected Information Criterion)

AICc = −2 × L_c + 2p + 2p(p + 1) / (n − p − 1)

AICc is not calculated when n − p − 1 ≤ 0.

Notation

Term	Description
p	the number of coefficients in the model, including the constant
n	the number of rows in the data with no missing data
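A minimal sketch of the guard, assuming the common small-sample correction 2p(p + 1) / (n − p − 1) added to −2 × L_c + 2p. Returning `None` when the denominator is not positive is a choice made for this illustration, and the function name is hypothetical.

```python
def aicc(log_likelihood, p, n):
    """Corrected AIC, or None when n - p - 1 is not positive."""
    denominator = n - p - 1
    if denominator <= 0:
        return None  # the correction term is undefined or negative
    correction = 2.0 * p * (p + 1) / denominator
    return -2.0 * log_likelihood + 2.0 * p + correction

print(aicc(-10.0, p=2, n=10))
print(aicc(-10.0, p=2, n=3))  # too few rows for the correction
```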

BIC (Bayesian Information Criterion)

BIC = −2 × L_c + p × ln(n)

Notation

Term	Description
p	the number of coefficients in the model, not counting the constant
n	the number of rows in the data with no missing data
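For illustration, BIC in its usual form, −2 × L_c + p × ln(n), takes only a few lines of Python. The function name and the example values are hypothetical.

```python
import math

def bic(log_likelihood, p, n):
    """BIC = -2 * L_c + p * ln(n)."""
    return -2.0 * log_likelihood + p * math.log(n)

# Same log-likelihood and p, different numbers of rows.
print(round(bic(-10.0, p=2, n=20), 4))
print(round(bic(-10.0, p=2, n=100), 4))
```

The penalty per coefficient grows with ln(n), so for data sets with more than about 7 rows (ln(n) > 2) BIC penalizes additional terms more heavily than AIC does.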

Test deviance R²

The test deviance R² indicates how much of the variation in the response of the test data set the model explains. The higher the value, the better the model fits the test data.

Formula

The following equation gives the formula for the test deviance R²:

R²(Test) = 1 − D_E(Test) / D_T(Test)

where the following equation represents the error deviance:

D_E(Test) = Σ_{i=1}^{N(Test)} d_i²

The formula for the total deviance, D_T(Test), depends on the form of the model.

Binary logistic

where for models with an intercept term,

has the following definition:

For models without an intercept term, use the inverse of the link function at 0. The values for the link functions in Minitab follow:

Logit link function: g⁻¹(0) = 0.5.
Normit link function: g⁻¹(0) = 0.5.
Gompit link function: g⁻¹(0) = 1 − e^(−1) ≈ 0.6321.
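The inverse-link values at 0 can be checked numerically. This sketch assumes the usual definitions of the three links: logit as the log odds, normit as the inverse standard normal CDF, and gompit as the complementary log-log. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def inverse_link(link, eta):
    """Map a linear predictor value eta back to an event probability."""
    if link == "logit":
        return 1.0 / (1.0 + math.exp(-eta))
    if link == "normit":
        return NormalDist().cdf(eta)
    if link == "gompit":
        return 1.0 - math.exp(-math.exp(eta))
    raise ValueError(f"unknown link: {link}")

for name in ("logit", "normit", "gompit"):
    print(name, round(inverse_link(name, 0.0), 4))
```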

Poisson

where for models with an intercept term

For models without an intercept term,

Notation

Term	Description
N(Test)	the number of rows in the test data set
d_i²	the squared deviance residual for the i^th row in the test data set
y_i	the number of events for the i^th row in the test data set
m_i	the number of trials for the i^th row in the test data set
D_E(Test)	the error deviance for the test data set
D_T(Test)	the total deviance for the test data set
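A minimal sketch of the test-set calculation for event/trial data follows. It assumes binomial deviance residuals and takes the null event probability as an argument, because how that value is defined for models with an intercept is not recoverable here; for models without an intercept, the inverse link at 0 would be passed instead. The function names are hypothetical.

```python
import math

def deviance_residual(y, m, prob):
    """Signed binomial deviance residual for y events in m trials."""
    fitted = m * prob
    term = 0.0
    if y > 0:
        term += y * math.log(y / fitted)
    if y < m:
        term += (m - y) * math.log((m - y) / (m - fitted))
    sign = 1.0 if y >= fitted else -1.0
    # Guard against tiny negative values from floating-point rounding.
    return sign * math.sqrt(2.0 * max(term, 0.0))

def deviance_r2_on_test_set(events, trials, probs, null_prob):
    """1 - D_E(Test) / D_T(Test), with the null probability supplied by the caller."""
    d_e = sum(deviance_residual(y, m, p) ** 2
              for y, m, p in zip(events, trials, probs))
    d_t = sum(deviance_residual(y, m, null_prob) ** 2
              for y, m in zip(events, trials))
    return 1.0 - d_e / d_t

# Hypothetical test rows scored with probabilities from a fitted model.
print(round(deviance_r2_on_test_set([0, 0, 1, 1], [1, 1, 1, 1],
                                    [0.2, 0.2, 0.8, 0.8], null_prob=0.5), 4))
```

Unlike the training statistic, this value can be negative when the model predicts the test rows worse than the null probability does.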

K-fold Deviance R²

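The k-fold deviance R² indicates how much of the variation in the response the model explains when each row is predicted by a model that was fit without the fold that contains that row. The higher the value, the better the model predicts held-out data. The formula is:

K-fold deviance R² = 1 − D_E(K-fold) / D_T

where

D_E(K-fold) = Σ_{j=1}^{K} Σ_{i=1}^{n_j} d_ij²

and D_T is the total deviance.

Notation

Term	Description
K	number of folds
n_j	sample size of fold j
d_ij	cross-validated deviance residual for the i^th row of fold j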
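As a sketch of the double sum, assuming the cross-validated deviance residuals have already been computed fold by fold and that the statistic takes the form 1 − (sum of squared residuals) / D_T, the calculation reduces to a few lines. The function name and the residual values are hypothetical.

```python
def kfold_deviance_r2(fold_residuals, total_deviance):
    """1 - (sum over folds and rows of squared residuals) / D_T."""
    error_deviance = sum(d * d for fold in fold_residuals for d in fold)
    return 1.0 - error_deviance / total_deviance

# Two folds of hypothetical cross-validated deviance residuals.
folds = [[0.5, -0.5], [1.0]]
print(kfold_deviance_r2(folds, total_deviance=6.0))
```

Folds may have different sizes; the sum runs over every row exactly once because each row belongs to one fold.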

Area under ROC curve

Formula

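The area under the curve is the summation of areas of trapezoids:

Area = Σ_{i=1}^{k} (x_i − x_{i−1}) × (y_i + y_{i−1}) / 2

where k is the number of distinct event probabilities, (x_i, y_i) is the i^th point on the ROC curve in order of increasing false positive rate, and (x₀, y₀) is the point (0, 0).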

To compute the area for a curve from a test data set or from cross-validated data, use the points from the corresponding curve.

For example, suppose we have four distinct event probabilities with the following coordinates on the ROC curve:

x (false positive rate)	y (true positive rate)
0.0923	0.3051
0.4154	0.7288
0.7538	0.9322
1	1

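Then the area under the ROC curve is given by the following calculation:

Area = (0.0923 − 0) × (0.3051 + 0) / 2
     + (0.4154 − 0.0923) × (0.7288 + 0.3051) / 2
     + (0.7538 − 0.4154) × (0.9322 + 0.7288) / 2
     + (1 − 0.7538) × (1 + 0.9322) / 2
     ≈ 0.0141 + 0.1670 + 0.2810 + 0.2379
     ≈ 0.7000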
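The same trapezoid sum can be sketched in Python using the four example points. The function name is hypothetical; the code prepends the point (0, 0) and assumes the final point supplied is (1, 1).

```python
def roc_auc(points):
    """Trapezoidal area under an ROC curve given (FPR, TPR) points."""
    pts = [(0.0, 0.0)] + sorted(points)  # start the curve at the origin
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y1 + y0) / 2.0  # one trapezoid per segment
    return area

example = [(0.0923, 0.3051), (0.4154, 0.7288), (0.7538, 0.9322), (1.0, 1.0)]
print(round(roc_auc(example), 4))  # prints 0.7
```

For a curve from a test data set or from cross-validated data, pass the points of that curve instead.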

Notation

Term	Description
TPR	true positive rate, TP / P
FPR	false positive rate, FP / N
TP	true positives, events that were correctly assessed
P	number of actual positive events
FP	false positives, nonevents that were incorrectly assessed as events
N	number of actual negative events
FNR	false negative rate, 1 − TPR
TNR	true negative rate, 1 − FPR
