Goodness-of-Fit Tests table for Fit Cox Model in a Counting Process Form

Minitab Statistical Software provides 3 goodness-of-fit tests: the global Wald test, the global likelihood ratio test, and the global score test. If there are no tied event times, then the score test is identical to the well-known log-rank test. In an analysis with clusters, Minitab does not provide the global likelihood ratio test because this test assumes that observations within clusters are independent. The interpretation of the statistics is the same for all 3 tests.

DF

The degrees of freedom for the goodness-of-fit tests are the sum of the degrees of freedom for the terms in the model. This sum equals the number of parameters in the model.
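The sum described above can be sketched in a few lines. This is only an illustration, not Minitab's implementation: it assumes a continuous predictor contributes 1 degree of freedom and a categorical predictor with k levels contributes k - 1 (one coefficient per non-reference level), and it covers main effects only. The function names and the predictors `Age` and `Stage` are hypothetical.

```python
def term_df(kind, levels=None):
    """Degrees of freedom for one main-effect term (illustrative assumption)."""
    if kind == "continuous":
        return 1
    if kind == "categorical":
        if levels is None or levels < 2:
            raise ValueError("a categorical term needs at least 2 levels")
        # One coefficient per level other than the reference level.
        return levels - 1
    raise ValueError(f"unknown term kind: {kind!r}")


def model_df(terms):
    """Sum the degrees of freedom over all terms in the model."""
    return sum(term_df(kind, levels) for _name, kind, levels in terms)


# Hypothetical model: one continuous predictor and one 4-level factor.
example_terms = [
    ("Age", "continuous", None),
    ("Stage", "categorical", 4),
]
print(model_df(example_terms))  # 1 + 3 = 4
```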

Chi-square

Each goodness-of-fit test has a chi-square statistic. The chi-square statistic is the test statistic that determines whether the model has an association with the response.

Minitab uses the chi-square statistic to calculate the p-value, which you use to make a decision about the statistical significance of the model as a whole. A sufficiently large chi-square statistic results in a small p-value, which indicates that the model fits the data.
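To make the link between the chi-square statistic and the p-value concrete, the following sketch computes the upper-tail probability of a chi-square distribution using only the Python standard library. It is an illustration under stated assumptions, not Minitab's algorithm: the function name `chi_square_p_value` is hypothetical, and the calculation supports only positive integer degrees of freedom.

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail probability P(X >= statistic) for X ~ chi-square(df)."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if statistic < 0:
        raise ValueError("statistic must be non-negative")

    y = statistic / 2.0
    half_df = df / 2.0

    # Closed-form starting values for the regularized upper incomplete
    # gamma function Q(a, y): Q(1, y) = exp(-y), Q(1/2, y) = erfc(sqrt(y)).
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-y)
    else:
        a, tail = 0.5, math.erfc(math.sqrt(y))

    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < half_df:
        if y > 0:
            tail += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0

    return min(tail, 1.0)


# Hypothetical statistic and degrees of freedom, for illustration only.
p_value = chi_square_p_value(12.5, 3)
print(f"p-value = {p_value:.4f}")
```

For fixed degrees of freedom, a larger statistic always gives a smaller p-value, which is the relationship the paragraph above describes.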

P-value

The p-value is a probability that measures the evidence against the null hypothesis. Lower probabilities provide stronger evidence against the null hypothesis.

Interpretation

Use the goodness-of-fit tests to determine how well the model fits your data. The null hypothesis is that the model does not fit the data well, which for these global tests means that all of the coefficients in the model equal zero. Usually, a significance level (denoted as α or alpha) of 0.05 works well. A significance level of 0.05 indicates a 5% risk of concluding that the model fits the data well when it does not.

Under the null hypothesis, the test statistic for each test has an asymptotic chi-square distribution with degrees of freedom equal to the number of coefficients in the model. The asymptotic distribution is valid when the number of observed events is large compared to the number of estimated parameters. For categorical predictors, the number of events in each level must be large enough for the asymptotic distribution to be valid.
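
For reference, the textbook forms of the three global statistics are sketched below. The notation is an assumption introduced here, not taken from Minitab's documentation: \hat{\beta} is the vector of estimated coefficients, \ell is the log partial likelihood, U is the score vector, I is the observed information matrix, and p is the number of coefficients. The exact computations in Minitab may differ; in particular, an analysis with clusters typically replaces the model-based variance with a robust estimate.

```latex
% Global tests of H_0: \beta = 0 (standard forms; notation assumed)
\begin{align*}
\chi^2_{\mathrm{LR}}    &= 2\left[\ell(\hat{\beta}) - \ell(0)\right] \\
\chi^2_{\mathrm{Wald}}  &= \hat{\beta}^{\top} I(\hat{\beta})\, \hat{\beta} \\
\chi^2_{\mathrm{score}} &= U(0)^{\top} I(0)^{-1}\, U(0)
\end{align*}
% Under H_0, each statistic is asymptotically \chi^2 with p degrees of freedom.
```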

P-value ≤ α: The model fits the data well

If the p-value is less than or equal to the significance level, you can conclude that the model fits the data well. You should examine whether any of the terms are statistically significant and also ensure that the model satisfies the proportional hazards assumption.

P-value > α: There is not enough evidence to conclude that the model fits the data well

If the p-value is greater than the significance level, you cannot conclude that the model fits the data well. You may want to refit the model with different terms.
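
The decision rule above can be written as a small helper. This is a minimal sketch with a hypothetical function name, `interpret_global_test`; it only compares a p-value that you supply against the significance level and returns the corresponding conclusion.

```python
def interpret_global_test(p_value, alpha=0.05):
    """Apply the decision rule for a global goodness-of-fit test."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")

    if p_value <= alpha:
        return "The model fits the data well"
    return ("There is not enough evidence to conclude that "
            "the model fits the data well")


# Hypothetical p-values, for illustration only.
for p in (0.003, 0.05, 0.21):
    print(f"p = {p}: {interpret_global_test(p)}")
```

A p-value exactly equal to α falls in the first case, because the rule uses "less than or equal to".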