3 Introduction to Survival Analysis

3.1 Estimating the survivor function: non-parametric estimation

  • suppose we have a single sample of observation times for subjects $i=1,\dots,n$

  • survivor function: $S(t)=\mathrm{Prob}(T>t)$

  • empirical survivor function:

    \[ \hat{S}(t)=\frac{\text{number of individuals with survival times}>t}{\text{total number of participants}} \]
  • the estimated survivor function $\hat{S}(t)$ is a step function with steps at the observed failure times

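  • this estimator breaks down when some observations are censored, since for a right-censored subject we only know that the survival time exceeds the observed time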

  • could simply exclude all censored observations:

    • wasteful of information

    • biases estimation
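The empirical survivor function above can be sketched in a few lines of Python. This is a minimal illustration that assumes every survival time is fully observed (no censoring); the function name `empirical_survivor` and the data values are hypothetical and chosen only for illustration.

```python
def empirical_survivor(times, t):
    """Proportion of subjects whose survival time exceeds t."""
    return sum(1 for x in times if x > t) / len(times)


# Hypothetical, fully observed survival times (assumption: no censoring).
times = [2, 4, 4, 5, 8]

# The estimate only changes at observed failure times, so it is a step function.
for t in [0, 2, 4, 7, 8]:
    print(t, empirical_survivor(times, t))
```

Evaluating the function between two failure times returns the same value as at the earlier one, which is the step behaviour described above.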

Kaplan-Meier

  • typically the first step in the presence of censored data is to obtain the Kaplan-Meier estimate of the survivor function

  • some values may be right censored and there may also be individuals with the same observed survival time

  • example data: 1, 3, 7*, 9, 9, 10* (* means censored)

  • suppose that there are $r$ distinct observed failure times, with $r\le n$

  • further suppose the event times are arranged in ascending order, the $j$th of which is denoted $t_{(j)}$ for $j=1,\dots,r$

Kaplan-Meier II

  • the ordered event times are thus: $t_{(1)}<t_{(2)}<\dots<t_{(r)}$

  • the number in the sample who are at risk just prior to time $t_{(j)}$ is denoted $n_j$, for $j=1,2,\dots,r$, and $d_j$ is the number who fail at this time

  • the estimated probability of survival through interval j is then:

    \[ p_j=\frac{n_j-d_j}{n_j} \]
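  • assuming that the event times occur independently of one another leads to the Kaplan-Meier estimator:

    \[ \hat{S}(t)=\prod_{j=1}^{k}p_j=\prod_{j=1}^{k}\left(\frac{n_j-d_j}{n_j}\right) \]

    for $t_{(k)}\le t<t_{(k+1)}$, $k=1,\dots,r$ (taking $t_{(r+1)}=\infty$), and with $\hat{S}(t)=1$ for $t<t_{(1)}$, giving $r+1$ sub-intervals in all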
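The estimator can be sketched directly from its definition. The following is a minimal illustration rather than a reference implementation: the function name `kaplan_meier` and the 0/1 event coding are assumptions made here, and the data are the six example observations with the values 7 and 10 taken to be the censored ones (an assumption, chosen because it matches the risk-set counts in the example table that follows).

```python
def kaplan_meier(times, events):
    """Return (t_j, d_j, n_j, S_hat) for each distinct failure time.

    times  : observed times (failure or censoring time)
    events : 1 if the time is an observed failure, 0 if right censored
    """
    failure_times = sorted({t for t, e in zip(times, events) if e == 1})
    s_hat = 1.0  # S_hat(t) = 1 before the first failure time
    rows = []
    for t_j in failure_times:
        # n_j: number still at risk just prior to t_j
        n_j = sum(1 for t in times if t >= t_j)
        # d_j: number of observed failures at t_j
        d_j = sum(1 for t, e in zip(times, events) if t == t_j and e == 1)
        s_hat *= (n_j - d_j) / n_j  # multiply by p_j
        rows.append((t_j, d_j, n_j, s_hat))
    return rows


times = [1, 3, 7, 9, 9, 10]
events = [1, 1, 0, 1, 1, 0]  # assumption: 7 and 10 are right censored

km_rows = kaplan_meier(times, events)
for t_j, d_j, n_j, s_hat in km_rows:
    print(f"{t_j:>3} {d_j:>3} {n_j:>3} {s_hat:.3f}")
```

Censored subjects never contribute to $d_j$, but they remain in the risk set $n_j$ up to their censoring time, which is how the estimator uses the partial information they carry.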

Kaplan-Meier Example

$t_{(j)}$   $d_j$   $n_j$   $1-d_j/n_j$   $\hat{S}(t)$
0           0       6       1.000         1.000
1           1       6       0.833         0.833
3           1       5       0.800         0.667
9           2       3       0.333         0.222
  • $\hat{S}(t)=\hat{P}(T>t)=p_1\times p_2\times\dots\times p_k$

  • the Kaplan-Meier estimator is also known as the product-limit estimator

  • note that if there is no censoring, this is simply the empirical survivor function
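The last point can be checked with a short derivation. With no censoring, everyone at risk just before $t_{(j)}$ who does not fail at $t_{(j)}$ is still at risk just before the next failure time, so $n_{j+1}=n_j-d_j$ (writing $n_{k+1}$ for the number who survive beyond $t_{(k)}$) and $n_1=n$. The product then telescopes, for $t_{(k)}\le t<t_{(k+1)}$:

\[
\hat{S}(t)=\prod_{j=1}^{k}\frac{n_j-d_j}{n_j}
=\prod_{j=1}^{k}\frac{n_{j+1}}{n_j}
=\frac{n_{k+1}}{n_1}
=\frac{\text{number of individuals with survival times}>t}{n}
\]

With censoring, $n_{j+1}$ can be smaller than $n_j-d_j$, the cancellation no longer happens, and the product form is needed.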

Example of the Kaplan-Meier estimator

[Figure: Kaplan-Meier estimate of the survivor function]

Confidence intervals for the Kaplan-Meier Estimator

  • we also need to address the question of uncertainty in our estimate

  • various formulae exist for the estimated variance

  • Greenwood’s formula

    \[ \widehat{\mathrm{Var}}\left(\hat{S}(t)\right)=\hat{S}(t)^2\sum_{j=1}^{k}\left(\frac{d_j}{n_j\{n_j-d_j\}}\right) \]
    • can give confidence bands that extend below 0

    • can give confidence bands that extend above 1
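Both problems show up in the small worked example. The sketch below applies Greenwood's formula to the $(t_{(j)}, d_j, n_j)$ values from the Kaplan-Meier example table and forms symmetric intervals $\hat{S}(t)\pm 1.96\,\mathrm{se}$. The normal-approximation interval and the name `greenwood_intervals` are assumptions made for illustration; the notes do not state how the bands are constructed.

```python
import math

# (t_j, d_j, n_j) taken from the Kaplan-Meier example table.
table = [(1, 1, 6), (3, 1, 5), (9, 2, 3)]


def greenwood_intervals(table, z=1.96):
    """Return (t_j, S_hat, se, lower, upper) at each failure time.

    Assumes n_j > d_j at every failure time, otherwise the Greenwood
    term d_j / (n_j * (n_j - d_j)) is undefined.
    """
    s_hat = 1.0
    cum_sum = 0.0  # running sum of d_j / (n_j * (n_j - d_j))
    out = []
    for t_j, d_j, n_j in table:
        s_hat *= (n_j - d_j) / n_j
        cum_sum += d_j / (n_j * (n_j - d_j))
        se = s_hat * math.sqrt(cum_sum)  # square root of Greenwood's variance
        # Symmetric normal-approximation limits (an assumption of this sketch).
        out.append((t_j, s_hat, se, s_hat - z * se, s_hat + z * se))
    return out


intervals = greenwood_intervals(table)
for t_j, s_hat, se, lower, upper in intervals:
    print(f"{t_j:>3} {s_hat:.3f} {se:.3f} {lower:.3f} {upper:.3f}")
```

With these six observations the upper limit at $t=1$ is about 1.13 and the lower limit at $t=9$ is about $-0.15$, so the unadjusted interval leaves the range $[0,1]$ at both ends.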

Example of the Kaplan-Meier estimator

[Figure: Kaplan-Meier estimate of the survivor function]

Lung cancer data: estimated survivor function

[Figure: estimated survivor function for the lung cancer data]

  • note how the plot appears smoother with more data and more distinct failure times

  • the estimated survivor function starts at 1, and the estimated survival is close to 10% at about 2.5 years