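suppose we have a single sample of observation times $t_1, \dots, t_n$ for $n$ subjects
survivor function: $S(t) = P(T > t)$
empirical survivor function: $\hat{S}(t) = \dfrac{\text{number of subjects with observation time} > t}{n}$
the estimated survivor function is a step function with steps at the observed failure times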
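as a minimal sketch of the empirical survivor function for uncensored data: the sample values and the function name `empirical_survivor` are hypothetical, and the convention that the estimate is the proportion of observation times strictly greater than $t$ is an assumption made here

```python
from fractions import Fraction

def empirical_survivor(times, t):
    """Proportion of observation times strictly greater than t (assumed convention)."""
    n = len(times)
    return Fraction(sum(1 for x in times if x > t), n)

# hypothetical uncensored sample, for illustration only
sample = [2, 4, 4, 7, 10]

# the estimate only changes at observed failure times, so it is a step function
for t in [0, 2, 4, 7, 10]:
    print(t, float(empirical_survivor(sample, t)))
```

exact fractions are used so that the step heights can be compared without rounding error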
a problem arises in the above methodology due to censoring
could simply exclude all censored observations:
wasteful of information
biases estimation
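the bias from dropping censored observations can be sketched on a small hypothetical data set (the values, the `(time, censored)` layout and both function names are assumptions for illustration): a censored subject is still known to have survived up to the censoring time, and discarding that subject throws this information away

```python
# hypothetical data: (observed time, censored?) pairs
data = [(1, False), (3, False), (4, True), (9, False), (9, False), (10, True)]

def naive_survivor(data, t):
    """Empirical survivor estimate after discarding every censored subject."""
    kept = [time for time, censored in data if not censored]
    return sum(1 for time in kept if time > t) / len(kept)

def known_to_survive(data, t):
    """Proportion of all subjects observed beyond t, censored or not.

    This equals the true sample proportion only when nobody was censored
    at or before t; otherwise it is a lower bound.
    """
    return sum(1 for time, _ in data if time > t) / len(data)

# nobody is censored at or before time 3, so the second value is exact here
print(naive_survivor(data, 3), known_to_survive(data, 3))
```

in this sketch the naive estimate at time 3 is lower than the proportion actually observed to survive beyond time 3, because the two censored subjects (both of whom outlived time 3) were removed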
typically the first step in the presence of censored data is to obtain the Kaplan-Meier estimate of the survivor function
some values may be right censored and there may also be individuals with the same observed survival time
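example data: six subjects, with observed failures at times 1, 3, 9 and 9 and two right-censored observations (one at a time in $[3, 9)$, one at or after time 9), consistent with the table below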
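suppose that there are $r$ distinct observed failure times among the $n$ subjects, with $r \le n$
further suppose the event times are in ascending order, the $j$th of which is denoted $t_{(j)}$ for $j = 1, \dots, r$
the ordered event times are thus: $t_{(1)} < t_{(2)} < \dots < t_{(r)}$
the number in the sample who are at risk just prior to time $t_{(j)}$ is given by $n_j$ for $j = 1, \dots, r$, and $d_j$ is the number who fail at this time
the estimated probability of survival through the interval $[t_{(j)}, t_{(j+1)})$ is then: $\dfrac{n_j - d_j}{n_j}$
assuming that the event times occur independently of one another leads to the Kaplan-Meier estimator: $\hat{S}(t) = \prod_{j=1}^{k} \dfrac{n_j - d_j}{n_j}$
for sub-intervals $t_{(k)} \le t < t_{(k+1)}$, $k = 1, \dots, r$, taking $t_{(r+1)} = \infty$, and with $\hat{S}(t) = 1$ for $t < t_{(1)}$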
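the product form of the estimator can be sketched in a few lines: at each distinct failure time, count the number at risk and the number of failures, then multiply the running estimate by the proportion surviving. the data below are hypothetical (the two censoring times 4 and 10 are assumptions chosen to be consistent with the worked table), and the function name `kaplan_meier` is illustrative

```python
def kaplan_meier(times, events):
    """Return one row (time, failures, at risk, factor, estimate) per distinct failure time.

    times  : observed times (failure or censoring)
    events : True for an observed failure, False for a right-censored time
    """
    failure_times = sorted({t for t, e in zip(times, events) if e})
    rows = []
    s_hat = 1.0
    for tj in failure_times:
        # at risk just prior to tj: everyone whose observed time is at least tj
        n_j = sum(1 for t in times if t >= tj)
        # failures at exactly tj (ties are allowed)
        d_j = sum(1 for t, e in zip(times, events) if e and t == tj)
        factor = (n_j - d_j) / n_j
        s_hat *= factor
        rows.append((tj, d_j, n_j, factor, s_hat))
    return rows

# hypothetical sample: failures at 1, 3, 9, 9; censored at 4 and 10
times = [1, 3, 4, 9, 9, 10]
events = [True, True, False, True, True, False]

for tj, d_j, n_j, factor, s_hat in kaplan_meier(times, events):
    print(f"{tj} | {d_j} | {n_j} | {factor:.3f} | {s_hat:.3f} |")
```

a subject censored at a failure time is counted as still at risk at that time, which is the usual convention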
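time | number of failures | number at risk | conditional survival probability | Kaplan-Meier estimate |
0 | 0 | 6 | 1.000 | 1.000 |
1 | 1 | 6 | 0.833 | 0.833 |
3 | 1 | 5 | 0.800 | 0.667 |
9 | 2 | 3 | 0.333 | 0.222 |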
the Kaplan-Meier estimator is also known as the product-limit estimator
note that if there is no censoring, it reduces to the empirical survivor function
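the reduction to the empirical survivor function can be checked numerically: with no censoring the product telescopes, since the number surviving one failure time is exactly the number at risk at the next. a minimal sketch, assuming the convention that the empirical estimate is the proportion of times strictly greater than $t$; the sample and the names `km_at` and `empirical_at` are hypothetical

```python
from fractions import Fraction

def km_at(times, events, t):
    """Kaplan-Meier estimate at time t, using exact fractions."""
    s_hat = Fraction(1)
    for tj in sorted({x for x, e in zip(times, events) if e}):
        if tj > t:
            break
        n_j = sum(1 for x in times if x >= tj)                        # at risk
        d_j = sum(1 for x, e in zip(times, events) if e and x == tj)  # failures
        s_hat *= Fraction(n_j - d_j, n_j)
    return s_hat

def empirical_at(times, t):
    """Proportion of observation times strictly greater than t."""
    return Fraction(sum(1 for x in times if x > t), len(times))

# hypothetical sample in which every time is an observed failure
times = [2, 4, 4, 7, 10]
events = [True] * len(times)

agree = all(km_at(times, events, t) == empirical_at(times, t) for t in range(12))
print(agree)
```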
Unnumbered Figure: Link
we also need to address the question of uncertainty in our estimate
various formulae exist for the estimated variance of the Kaplan-Meier estimate
Greenwood’s formula
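for reference, the standard form of Greenwood's formula is given here, writing $n_j$ for the number at risk just prior to the $j$th ordered failure time $t_{(j)}$, $d_j$ for the number of failures at that time, and $\hat{S}(t)$ for the Kaplan-Meier estimate (this notation is assumed here):

$$
\widehat{\operatorname{Var}}\{\hat{S}(t)\} \approx \hat{S}(t)^2 \sum_{j=1}^{k} \frac{d_j}{n_j (n_j - d_j)}, \qquad t_{(k)} \le t < t_{(k+1)}
$$

the square root of this quantity is the estimated standard error of $\hat{S}(t)$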
can give confidence bands
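one simple way to turn Greenwood's formula into an interval is sketched below: a pointwise normal-approximation interval, estimate plus or minus 1.96 standard errors, clipped to $[0, 1]$. this is an assumed choice for illustration (other constructions, such as intervals on a transformed scale, exist), the data are hypothetical, and the function name `km_greenwood` is illustrative

```python
import math

def km_greenwood(times, events, z=1.96):
    """Rows (time, estimate, standard error, lower, upper) per distinct failure time."""
    rows = []
    s_hat = 1.0
    greenwood_sum = 0.0
    for tj in sorted({t for t, e in zip(times, events) if e}):
        n_j = sum(1 for t in times if t >= tj)                        # at risk
        d_j = sum(1 for t, e in zip(times, events) if e and t == tj)  # failures
        s_hat *= (n_j - d_j) / n_j
        if d_j < n_j:
            greenwood_sum += d_j / (n_j * (n_j - d_j))
            se = s_hat * math.sqrt(greenwood_sum)
        else:
            # estimate has dropped to 0; the Greenwood term is undefined here
            se = 0.0
        lower = max(0.0, s_hat - z * se)
        upper = min(1.0, s_hat + z * se)
        rows.append((tj, s_hat, se, lower, upper))
    return rows

# hypothetical sample: failures at 1, 3, 9, 9; censored at 4 and 10
times = [1, 3, 4, 9, 9, 10]
events = [True, True, False, True, True, False]

for tj, s_hat, se, lower, upper in km_greenwood(times, events):
    print(f"t={tj}: estimate={s_hat:.3f}, se={se:.3f}, interval=({lower:.3f}, {upper:.3f})")
```

with only six subjects the intervals are very wide, which is why clipping to $[0, 1]$ is needed in this sketch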
Unnumbered Figure: Link
Unnumbered Figure: Link
note how the plot appears smoother with more data and more distinct failure times
the estimated survivor function starts at 1, and the estimated survival probability is close to 10% at about 2.5 years