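For each subject $i$, let $N_i(t)$ be the step function that represents the
number of events that subject $i$ experiences up to time $t$. Then
$\{N_i(t),\ t \ge 0\}$ represents a counting process for subject $i$. Let
$Y_i(t)$ be an indicator variable that has the value 1 if subject $i$ is at
risk at time $t$ and 0 otherwise, which is equivalent to $Y_i(t) = 1$ if
$T_i \ge t$ and $Y_i(t) = 0$ otherwise, where $T_i$ is the observed event or
censoring time of subject $i$.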
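As a rough illustration of these two definitions, the sketch below evaluates the event count and the at-risk indicator for one subject. The function names, the event times, and the follow-up time are hypothetical, and the sketch assumes the simple case of a subject observed from time 0 until a single last observed time.

```python
from bisect import bisect_right

def events_up_to(event_times, t):
    """N_i(t): number of events at or before time t (a right-continuous step function)."""
    return bisect_right(sorted(event_times), t)

def at_risk(last_observed_time, t):
    """Y_i(t): 1 while the subject is still under observation at time t, else 0."""
    return 1 if last_observed_time >= t else 0

# Hypothetical subject: events at times 2.0 and 5.5, followed until time 7.0.
event_times = [2.0, 5.5]
path = [events_up_to(event_times, t) for t in (1.0, 2.0, 5.0, 6.0)]
risk = [at_risk(7.0, t) for t in (6.9, 7.0, 7.1)]
```

Here `path` traces the step function at four time points and `risk` shows the indicator dropping to 0 just after the last observed time.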
The Cox proportional hazards model assumes that the hazard rate at time $t$
for an individual $i$ with a vector of predictor values $\mathbf{x}_i(t)$
has the following form:

$$\lambda_i(t) = \lambda_0(t)\exp\{\boldsymbol{\beta}'\mathbf{x}_i(t)\}$$

where $\lambda_0(t)$ is the baseline hazard rate that characterizes the
unspecified distribution of survival time and $\boldsymbol{\beta}$ is a
$p$-component vector of unknown regression coefficients.
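One consequence of this form is that the baseline hazard cancels when the hazards of two subjects are compared at the same time, so the ratio depends only on the coefficients and the predictor values. The sketch below checks this numerically; the coefficients, predictor values, and baseline values are hypothetical.

```python
import math

def hazard(baseline_hazard, beta, x):
    """Baseline hazard times exp(beta' x) for one subject at one time point."""
    linear_predictor = sum(b * v for b, v in zip(beta, x))
    return baseline_hazard * math.exp(linear_predictor)

beta = [0.5, -1.0]                 # hypothetical coefficients
x_a, x_b = [1.0, 0.0], [0.0, 0.0]  # hypothetical predictor values

# The hazard ratio is the same for every value of the baseline hazard.
ratios = [hazard(h0, beta, x_a) / hazard(h0, beta, x_b) for h0 in (0.01, 0.2, 3.0)]
```

Every entry of `ratios` equals `exp(0.5)` up to rounding, which is why the baseline hazard can remain unspecified when the coefficients are estimated.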
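For example, a formulation of the Cox proportional hazards model as a
counting process based on Andersen et al. (1993)¹ and Fleming and Harrington
(1991)², assuming no tied event times, has a log partial likelihood with the
following form:

$$l(\boldsymbol{\beta}) = \sum_{i=1}^{n}\int_0^\infty \left[\boldsymbol{\beta}'\mathbf{x}_i(t) - \log\left(\sum_{j=1}^{n} Y_j(t)\exp\{\boldsymbol{\beta}'\mathbf{x}_j(t)\}\right)\right] dN_i(t)$$

The vector of partial derivatives with respect to the components of
$\boldsymbol{\beta}$ has the following form:

$$U(\boldsymbol{\beta}) = \sum_{i=1}^{n}\int_0^\infty \left[\mathbf{x}_i(t) - \bar{\mathbf{x}}(\boldsymbol{\beta}, t)\right] dN_i(t)$$

The $p$ by $p$ information matrix has the following form:

$$\mathcal{I}(\boldsymbol{\beta}) = \sum_{i=1}^{n}\int_0^\infty \frac{\sum_{j=1}^{n} Y_j(t)\exp\{\boldsymbol{\beta}'\mathbf{x}_j(t)\}\left[\mathbf{x}_j(t) - \bar{\mathbf{x}}(\boldsymbol{\beta}, t)\right]\left[\mathbf{x}_j(t) - \bar{\mathbf{x}}(\boldsymbol{\beta}, t)\right]'}{\sum_{j=1}^{n} Y_j(t)\exp\{\boldsymbol{\beta}'\mathbf{x}_j(t)\}}\, dN_i(t)$$

where the weighted mean of the predictor values of the subjects at risk at
time $t$ has the following form:

$$\bar{\mathbf{x}}(\boldsymbol{\beta}, t) = \frac{\sum_{j=1}^{n} Y_j(t)\exp\{\boldsymbol{\beta}'\mathbf{x}_j(t)\}\,\mathbf{x}_j(t)}{\sum_{j=1}^{n} Y_j(t)\exp\{\boldsymbol{\beta}'\mathbf{x}_j(t)\}}$$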
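Because each integral against the counting process reduces to a sum over the observed event times, the log partial likelihood, its gradient, and the information matrix can be accumulated in one pass over the events. The following NumPy sketch does this under the same no-ties assumption. The function name, the argument layout (one row per interval with predictors constant within the row), and the toy data are assumptions made for illustration, not the implementation of any particular package.

```python
import numpy as np

def cox_partial_likelihood(beta, start, stop, event, X):
    """Return (log partial likelihood, score vector, information matrix).

    Each row is at risk on the interval (start, stop]; event is 1 when the
    row ends in an event. Assumes no tied event times.
    """
    beta = np.asarray(beta, dtype=float)
    start = np.asarray(start, dtype=float)
    stop = np.asarray(stop, dtype=float)
    event = np.asarray(event, dtype=int)
    X = np.asarray(X, dtype=float)
    p = X.shape[1]

    risk = np.exp(X @ beta)                      # exp(beta' x_j) for every row
    loglik, score, info = 0.0, np.zeros(p), np.zeros((p, p))
    for i in np.flatnonzero(event):              # one term per event time
        t = stop[i]
        at_risk = (start < t) & (t <= stop)      # rows with Y_j(t) = 1
        w, Xr = risk[at_risk], X[at_risk]
        denom = w.sum()
        xbar = (w[:, None] * Xr).sum(axis=0) / denom   # weighted mean at time t
        centered = Xr - xbar
        loglik += X[i] @ beta - np.log(denom)
        score += X[i] - xbar
        info += (w[:, None] * centered).T @ centered / denom
    return loglik, score, info

# Hypothetical data: three subjects, one predictor, events at times 1 and 2.
start = [0.0, 0.0, 0.0]
stop = [1.0, 2.0, 3.0]
event = [1, 1, 0]
X = [[1.0], [0.0], [1.0]]
loglik, score, info = cox_partial_likelihood([0.0], start, stop, event, X)
```

A Newton-Raphson step would then update the coefficients by adding `np.linalg.solve(info, score)` to the current value of `beta`.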
This formulation of the Cox proportional hazards model is the multiplicative
hazards model. The multiplicative hazards model has the following
characteristics:
- The subject can experience more than one event of interest.
- The subject can experience an event multiple times. This statement means
  that the indicator variable that identifies whether the subject is at risk,
  $Y_i(t)$, can change states from 1 to 0 and back again multiple times.
- The subject can enter the study after time 0. This statement is equivalent
  to the idea that a subject can enter the risk set after time 0. A time is
  left-truncated when the subject enters after time 0.
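A small sketch can make the last two characteristics concrete. It assumes that the periods when a subject is under observation are stored as half-open intervals that exclude the start time and include the end time. The function name and the example intervals are hypothetical.

```python
def at_risk_in_intervals(intervals, t):
    """At-risk indicator at time t for a subject observed over intervals (start, stop]."""
    return int(any(start < t <= stop for start, stop in intervals))

# Hypothetical subject: enters at time 3 (left-truncated), leaves the risk set
# at time 5, and returns from time 8 to time 12.
intervals = [(3, 5), (8, 12)]
states = [at_risk_in_intervals(intervals, t) for t in (2, 4, 5, 6, 10, 13)]
```

The resulting `states` list starts at 0 because of the late entry, then switches between 1 and 0 as the subject leaves and re-enters the risk set.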
Therneau (1999)³ provides details on the counting process input form of data.
The counting process input form of data provides a technique to fit the
multiplicative hazards model with the same algorithms that fit the Cox
proportional hazards model.
The counting process input form
In the counting process input form, multiple rows represent each subject.
Each row describes a time interval when the values of all the variables are
constant. Time-dependent predictors change between rows. The intervals begin
just after the start time and include the end time. The start time for the
interval is the entry time for the subject. The end time is the response
variable for the subject. The censoring column indicates any row where the end
time is not an event time.
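To illustrate the layout, the sketch below splits the follow-up of one subject into rows whenever a time-dependent predictor changes. The function name, the column names (`start`, `stop`, `censored`, `dose`), and the example values are hypothetical. The sketch also assumes that only the final row of a subject can end in an event.

```python
def to_counting_process_rows(subject_id, entry, end, event, changes):
    """Split one subject's follow-up into rows with constant predictor values.

    changes: (time, value) pairs; each value holds from just after `time`
    until the next change. The first change time must equal `entry`.
    """
    changes = sorted(changes)
    if not changes or changes[0][0] != entry:
        raise ValueError("the first predictor value must start at the entry time")
    if changes[-1][0] >= end:
        raise ValueError("every change must occur before the end time")
    rows = []
    for k, (start, value) in enumerate(changes):
        is_last = k == len(changes) - 1
        stop = end if is_last else changes[k + 1][0]
        rows.append({
            "id": subject_id,
            "start": start,
            "stop": stop,
            # 1 = censored row (stop is not an event time), 0 = event row
            "censored": 0 if (event and is_last) else 1,
            "dose": value,
        })
    return rows

# Hypothetical subject "A": enters at time 0, dose changes from 10 to 20 at
# time 4, and the event occurs at time 9.
rows = to_counting_process_rows("A", entry=0, end=9, event=True,
                                changes=[(0, 10), (4, 20)])
```

The subject contributes two rows, covering (0, 4] and (4, 9]. Only the second row is marked as an event, and the first row is marked as censored because its end time is a change point, not an event time.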
Correlated observations and the robust covariance estimator
Although multiple rows represent each subject in the counting process input
form, only one row of the per-subject observations contributes to the
likelihood at each time unless correlation exists among the observations in a
subgroup that pertain to each subject. For example, the subject observations
are correlated in models that include repeated or recurrent events. Lin and Wei
(1989)⁴ propose an adjustment of the covariance matrix to account for the
correlation among within-subject observations. Let $\mathbf{R}$
be the matrix of score residuals. Then, the robust variance-covariance matrix
has the following form:

$$\mathbf{V}_R = \tilde{\mathbf{D}}'\tilde{\mathbf{D}}$$

where $\tilde{\mathbf{D}} = \tilde{\mathbf{R}}\,\mathcal{I}^{-1}$,
$\mathcal{I}$ is the information matrix, and $\tilde{\mathbf{R}}$
is the collapsed score residual matrix. To obtain the collapsed score residual
matrix, replace each cluster of score residual rows by the sum of those
residual rows.
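The collapsing step and the resulting sandwich-type estimate can be sketched in a few lines of NumPy. The function name, the cluster labels, and the toy residuals are hypothetical. The sketch also assumes that the score residuals and the information matrix have already been computed at the fitted coefficients.

```python
import numpy as np

def robust_covariance(score_residuals, cluster_ids, information):
    """Sum score residual rows within each cluster, then form D'D."""
    R = np.asarray(score_residuals, dtype=float)
    ids = np.asarray(cluster_ids)
    clusters, inverse = np.unique(ids, return_inverse=True)
    R_collapsed = np.zeros((len(clusters), R.shape[1]))
    np.add.at(R_collapsed, inverse, R)           # one summed row per cluster
    D = R_collapsed @ np.linalg.inv(np.asarray(information, dtype=float))
    return D.T @ D

# Hypothetical residuals: rows 1 and 2 belong to the same subject.
R = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
cluster_ids = [1, 1, 2]
information = [[2.0, 0.0], [0.0, 2.0]]
V_robust = robust_covariance(R, cluster_ids, information)
```

In this toy case, summing the two rows of subject 1 before squaring gives a larger variance for the first coefficient than treating the rows as independent would, which reflects the positive within-subject correlation.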
An analysis that uses the robust variance-covariance matrix has the
following characteristics:
- Calculations for inferences use the robust variance-covariance matrix.
- The Wald and score tests in the Goodness-of-Fit table use the robust
  variance-covariance matrix. The likelihood ratio test in the Goodness-of-Fit
  table is missing because the likelihood ratio test assumes that the
  observations within a cluster are independent.
- The ANOVA table can use only the Wald test.
¹ Andersen, P. K., Borgan, Ø., Gill, R. D., and Keiding, N. (1993).
Statistical models based on counting processes. Springer-Verlag.
² Fleming, T. R., and Harrington, D. P. (1991). Counting processes and
survival analysis. Wiley.
³ Therneau, T. M. (1999). Technical report series No. 53: A package for
survival analysis in S.
⁴ Lin, D. Y., and Wei, L. J. (1989). The robust inference for the Cox
proportional hazards model. Journal of the American Statistical Association,
84(408), 1074-1078. https://doi.org/10.1080/01621459.1989.10478874