Chapter 3 Introduction to Survival Analysis

  • survival analysis: “analysis of data in the form of times from some well-defined time origin to occurrence of some event or endpoint” (Collett, 2003)

  • medical studies: the time origin may be the time of entry into a clinical trial; time of diagnosis of disease; time of commencement of treatment; time of surgery; date of birth, etc.

  • if the endpoint is death then the times to event are literally survival times, for example, time to death following kidney transplantation

  • more generally, time-to-event data take various forms, for example: time to onset of heart disease; time to failure of a prosthesis; time to treatment failure; time to relief of pain; time to recurrence of symptoms, etc.

Survival Analysis

  • the events may be:

    • positive, such as discharge from hospital or conception;

    • adverse, such as death or recurrence of disease;

    • neutral, such as cessation of breast feeding

  • regardless of the nature of the event, the convention is to refer to this type of data as survival data and to the analysis as survival analysis

  • the time-to-event is a random variable, T, often referred to as a lifetime random variable

Example, lung cancer

  • survival times of patients with advanced lung cancer

  • 228 patients, 165 deaths observed

Special features of survival data

  • time-to-event data are not amenable to standard methods of analysis since the event times are:

    • positive-continuous

    • typically skewed

    • subject to censoring

  • censoring occurs when the exact time of the event of interest (endpoint) is not observed:

    • right censoring: the event time exceeds the last follow-up time

    • left censoring: the event is known to have occurred before some observation time, but the exact event time is unknown

    • interval censoring: the event time falls in some specified interval

Right censoring

  • left and interval censoring occur less frequently than right censoring. In this module we will consider methods for right-censored data

  • right-censored observations: we do not know when, or if, the patient will experience the event, only that the event has not occurred at the end of the observation period (last follow-up)

  • right censoring can be due to:

    • the period of observation ending prior to the event occurring (e.g. five year study period)

    • loss to follow-up (e.g. moved away, did not return for scheduled follow-up)

    • a competing event which precludes further follow-up (e.g. a death occurs before a hip prosthesis fails)

  • note also that the event may not be inevitable (e.g. time to pregnancy)

  • censoring cannot be ignored: censored observations carry important information about survival

  • consider comparing two treatments: a more effective treatment will result in increased survival and hence increased censoring at the end of follow-up
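
To illustrate why censored observations cannot simply be discarded, the following is a minimal simulation sketch. The exponential event times, the rate, and the fixed follow-up cut-off are assumptions chosen purely for illustration; they are not taken from the lung cancer data.

```python
import random

random.seed(1)

RATE = 1.0 / 10.0      # assumed event rate: true mean survival is 10 time units
FOLLOW_UP = 8.0        # assumed fixed end of follow-up (administrative censoring)
N = 10_000             # number of simulated patients

# Latent event times; only those occurring before FOLLOW_UP are observed as events.
event_times = [random.expovariate(RATE) for _ in range(N)]
observed_events = [t for t in event_times if t <= FOLLOW_UP]

true_mean = sum(event_times) / N
naive_mean = sum(observed_events) / len(observed_events)
censored_fraction = 1 - len(observed_events) / N

print(f"mean of all latent event times:      {true_mean:.2f}")
print(f"mean after dropping censored cases:  {naive_mean:.2f}")
print(f"fraction censored:                   {censored_fraction:.2f}")
```

Dropping the censored patients keeps only the shorter survival times, so the naive mean understates survival. The censored observations still tell us that those patients survived at least as long as the follow-up period.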

Patient time and study time

  • patients are typically not recruited at the same time but are accrued sequentially over a period of time and then followed up to a fixed date; the period of observation thus varies between patients

  • assumptions:

    • a patient's prognosis does not depend upon the time of entry to the study (less of a problem in randomised trials)

    • patients lost to follow-up have the same prognosis as those remaining in the study (i.e. random censoring)

Lung cancer data example

 id  inst  time  status  age  sex  ph.ecog  ph.karno  pat.karno  meal.cal  wt.loss
  1     3   306       2   74    1        1        90        100      1175       NA
  2     3   455       2   68    1        0        90         90      1225       15
  3     3  1010       1   56    1        0        90         90        NA       15
  4     5   210       2   57    1        1        90         60      1150       11
  5     1   883       2   60    1        0       100         90        NA        0
  6    12  1022       1   74    1        1        50         80       513        0
  7     7   310       2   68    2        2        70         60       384       10
  8    11   361       2   71    2        2        60         80       538        1
  9     1   218       2   53    1        1        70         80       825       16
 10     7   166       2   61    1        2        70         70       271       34
...
228    22   177       1   58    2        1        80         90      1060        0

Lung cancer data example description


Format:

    inst:       Institution code

    time:       Survival time in days

    status:     censoring status 1=censored, 2=dead

    age:        Age in years

    sex:        Male=1 Female=2

    ph.ecog:    ECOG performance score (0=good, 5=dead)

    ph.karno:   Karnofsky performance score (0=bad, 100=good) rated by physician

    pat.karno:  Karnofsky performance score as rated by patient

    meal.cal:   Calories consumed at meals

    wt.loss:    Weight loss in last six months
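
Many analyses work with a 0/1 event indicator rather than the 1/2 coding used for status here. As a small sketch, the first ten rows of the table above can be recoded as follows; the variable names are illustrative only.

```python
# First ten rows of the lung cancer table: (time in days, status),
# where status 1 = censored and 2 = dead.
rows = [(306, 2), (455, 2), (1010, 1), (210, 2), (883, 2),
        (1022, 1), (310, 2), (361, 2), (218, 2), (166, 2)]

# Recode to an event indicator: 1 if the death was observed, 0 if censored.
recoded = [(time, 1 if status == 2 else 0) for time, status in rows]

n_deaths = sum(event for _, event in recoded)
n_censored = len(recoded) - n_deaths
print(f"deaths: {n_deaths}, censored: {n_censored}")
```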

Aims of survival analysis

  • model the survival times for a single group

  • compare survival distributions for two or more groups

  • assess the effects of covariates on survival

  • make predictions

  • usually the event times will be continuous measurements, but they are typically recorded in rounded form

  • thus, although the data are strictly continuous, our methods must allow for potential ties in the data caused by rounding

Notation

  • let $T$ denote the lifetime random variable

  • $t_i$ ($i=1,2,\ldots,n$): observed event times

  • $c_i$ ($i=1,2,\ldots,n$): censoring times

  • $\delta_i$: censoring/failure indicator, 0 if censored, 1 if failure

  • $x_i$: p-vector of covariates for individual $i$

  • $n_i$ ($i=1,2,\ldots,n$): size of the risk set, i.e. the number at risk just before $t_i$

  • $d_i$: the number of events (e.g. deaths) at time $t_i$
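
The risk set sizes and event counts can be tabulated directly from (time, indicator) pairs. The sketch below uses the first ten patients of the lung cancer table, with the status recoded so that 1 marks an observed death and 0 a censored time; treating a patient as at risk at time t whenever their observed time is at least t is the usual convention, stated here as an assumption.

```python
from collections import Counter

# (time, delta) pairs for the first ten patients in the lung cancer table,
# with delta = 1 for an observed death and 0 for a censored time.
data = [(306, 1), (455, 1), (1010, 0), (210, 1), (883, 1),
        (1022, 0), (310, 1), (361, 1), (218, 1), (166, 1)]

# d_i: number of events at each distinct event time t_i.
deaths = Counter(time for time, delta in data if delta == 1)

table = []
for t in sorted(deaths):
    # n_i: everyone whose observed time is at least t_i is at risk just before t_i.
    n_at_risk = sum(1 for time, _ in data if time >= t)
    table.append((t, n_at_risk, deaths[t]))

for t, n_i, d_i in table:
    print(f"t = {t:4d}   n = {n_i:2d}   d = {d_i}")
```

Censored patients contribute to the risk set up to their censoring time but never to the event count.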

Data frame for survival data

unit  time  cens  X1    X2
1     y_1   δ_1   x_11  x_12
2     y_2   δ_2   x_21  x_22
3     y_3   δ_3   x_31  x_32

where

    $$Y_i = \min(T_i, C_i), \qquad \delta_i = I(T_i \le C_i)$$

  • the censoring time $C_i$ is the time at which unit $i$ leaves the study, with realised value $c_i$

  • censoring times may be fixed (e.g. the end of a 5-year study period) or random

  • we do not observe both $T_i$ and $C_i$

  • we record $t_i$ if $T_i \le C_i$, or else we record $c_i$ if $T_i > C_i$

  • hence we record $Y_i = \min(T_i, C_i)$ and a censoring indicator $\delta_i = I(T_i \le C_i)$
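
The construction of the observed pair from the latent event and censoring times can be sketched as below. The helper name observe and both simulated distributions (exponential event times, uniform censoring times) are assumptions made for illustration only.

```python
import random

random.seed(42)

def observe(event_time, censor_time):
    """Return (y, delta) with y = min(T, C) and delta = I(T <= C)."""
    y = min(event_time, censor_time)
    delta = 1 if event_time <= censor_time else 0
    return y, delta

n = 8
# Assumed mechanisms: exponential event times with mean 5 years, and
# censoring times uniform on (0, 5) years, as might arise from staggered entry.
event_times = [random.expovariate(1 / 5) for _ in range(n)]
censor_times = [random.uniform(0, 5) for _ in range(n)]

sample = [observe(t, c) for t, c in zip(event_times, censor_times)]
for unit, (y, delta) in enumerate(sample, start=1):
    print(f"{unit}  y = {y:5.2f}  delta = {delta}")
```

Only the pairs printed here would be available to the analyst; for a censored unit the latent event time is never seen.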

Basic Functions I

  • in summarising survival data there are two functions of central interest: the survivor function and the hazard function

  • the survival time of individual i: a realisation of a non-negative random variable T

  • let $F(t)$ denote the distribution function of $T$ with corresponding probability density function $f(t)$, then

    $$F(t) = P(T \le t) = \int_0^t f(s)\,ds$$

  • by definition the pdf is $f(t) = dF(t)/dt$

  • the probability that an individual survives beyond time $t$ is given by the survivor function

    $$S(t) = P(T > t) = 1 - F(t) = \int_t^\infty f(s)\,ds$$

  • note that $S(t)$ is a monotone non-increasing function with $S(0)=1$ that tends to zero as $t$ approaches infinity

Basic Functions II

  • conversely we can express the pdf as:

    $$f(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t} = \frac{dF(t)}{dt} = -\frac{dS(t)}{dt}$$

  • the hazard function specifies the instantaneous rate of failure at $T=t$ given survival to time $t$ and is defined as:

    $$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}$$

  • the hazard is a rate, not a probability. It can assume values in $[0, \infty)$

  • the quantity $h(t)\Delta t$ approximates the probability that an individual who has survived to time $t$ will experience the event in the interval $[t, t+\Delta t)$
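
The approximation of the conditional failure probability by the hazard times the interval width can be checked numerically. The sketch below assumes a Weibull lifetime with arbitrarily chosen shape and scale; any continuous lifetime distribution would serve equally well.

```python
import math

SHAPE, SCALE = 1.5, 10.0   # assumed Weibull parameters, chosen only for illustration

def survivor(t):
    """Weibull survivor function S(t) = exp(-(t / SCALE) ** SHAPE)."""
    return math.exp(-((t / SCALE) ** SHAPE))

def hazard(t):
    """Weibull hazard h(t) = (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)

def conditional_failure_prob(t, dt):
    """P(t <= T < t + dt | T >= t) = (S(t) - S(t + dt)) / S(t)."""
    return (survivor(t) - survivor(t + dt)) / survivor(t)

t = 5.0
for dt in (1.0, 0.1, 0.01):
    exact = conditional_failure_prob(t, dt)
    approx = hazard(t) * dt
    print(f"dt = {dt:<5}  exact = {exact:.6f}  h(t)*dt = {approx:.6f}")
```

The two columns agree more closely as the interval shrinks, which is the sense in which the hazard is an instantaneous rate rather than a probability.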

Basic Functions III

  • the cumulative or integrated hazard function is by definition

    $$H(t) = \int_0^t h(u)\,du$$

  • other relationships follow:

    • $S(t) = 1 - \int_0^t f(u)\,du$

    • $f(t) = h(t) \times S(t)$

    • $h(t) = -\dfrac{d \log S(t)}{dt}$

    • $S(t) = \exp\{-H(t)\}$
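
The last two relationships follow from the definition of the hazard in a few lines. The derivation below is a sketch using only the definitions already given, followed by the constant-hazard (exponential) case as a worked example; the rate symbol $\lambda$ is introduced here for illustration.

```latex
\begin{align*}
h(t) &= \frac{f(t)}{S(t)} = \frac{-\,dS(t)/dt}{S(t)} = -\frac{d}{dt}\log S(t), \\
H(t) &= \int_0^t h(u)\,du = -\bigl[\log S(u)\bigr]_0^t = -\log S(t)
        \quad \text{since } S(0) = 1, \\
S(t) &= \exp\{-H(t)\}.
\end{align*}

% Worked example: constant hazard h(t) = \lambda > 0
\begin{align*}
H(t) &= \lambda t, &
S(t) &= e^{-\lambda t}, &
f(t) &= h(t)\,S(t) = \lambda e^{-\lambda t}, &
F(t) &= 1 - e^{-\lambda t}.
\end{align*}
```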

Relationships between basic functions

  • the density, distribution, survivor and hazard functions are different mechanisms for describing the distribution of survival times

  • these functions capture the essential features of lifetime variables

  • specifying one function completely determines the others

  • consequently one may interchange between them but a model may be better specified in terms of one rather than another
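
As a sketch of how one function determines the others, the code below starts from a hazard alone and recovers the cumulative hazard, survivor, density and distribution functions. The linear hazard and the trapezoidal integration are assumptions made for illustration; for this particular hazard the closed forms are H(t) = 0.01 t^2 and S(t) = exp(-0.01 t^2).

```python
import math

def hazard(t):
    """Assumed example hazard, increasing linearly in time: h(t) = 0.02 * t."""
    return 0.02 * t

def cumulative_hazard(t, steps=10_000):
    """H(t): integral of h from 0 to t, approximated with the trapezoidal rule."""
    width = t / steps
    interior = sum(hazard(i * width) for i in range(1, steps))
    return width * (0.5 * hazard(0.0) + interior + 0.5 * hazard(t))

def survivor(t):
    return math.exp(-cumulative_hazard(t))    # S(t) = exp{-H(t)}

def density(t):
    return hazard(t) * survivor(t)            # f(t) = h(t) S(t)

def distribution(t):
    return 1.0 - survivor(t)                  # F(t) = 1 - S(t)

for t in (1.0, 5.0, 10.0):
    print(f"t = {t:4.1f}  H = {cumulative_hazard(t):.4f}  S = {survivor(t):.4f}  "
          f"f = {density(t):.4f}  F = {distribution(t):.4f}")
```

Starting from the hazard is often the natural choice in survival modelling, since assumptions about how risk changes over time are stated most directly in terms of h(t).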

Examples of survivor functions

[Figure: examples of survivor functions]

Examples of hazard functions

[Figure: examples of hazard functions]

Hazard functions

  • the hazard function describes how the instantaneous risk of failure changes over time

  • the hazard informs us of failure rates, for example, of patients of a certain age

  • there are many general shapes for the hazard function. Generic types are: increasing, decreasing, constant and bathtub:

    • an increasing hazard function is indicative of natural ageing (or wearing out)

    • a decreasing hazard function is less likely clinically but may fit, for example, risk following organ transplantation

    • a bathtub-shaped hazard fits, for example, population risk of death from birth: infant deaths give rise to increased events early on, and the process then stabilises before increasing with age
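
The generic shapes can be reproduced with simple parametric forms. The sketch below assumes a Weibull hazard, whose shape parameter gives decreasing, constant or increasing hazards, and builds a crude bathtub by adding a decreasing and an increasing term. The parameter values are arbitrary and not fitted to any data.

```python
def weibull_hazard(t, shape, scale=1.0):
    """Weibull hazard h(t) = (shape / scale) * (t / scale) ** (shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def bathtub_hazard(t):
    """Assumed bathtub shape: a decreasing early-failure term plus an increasing ageing term."""
    return weibull_hazard(t, shape=0.5) + weibull_hazard(t, shape=3.0)

times = [0.1, 0.25, 0.5, 1.0, 1.5, 2.0]
shapes = {
    "decreasing (shape 0.5)": lambda t: weibull_hazard(t, 0.5),
    "constant   (shape 1.0)": lambda t: weibull_hazard(t, 1.0),
    "increasing (shape 2.0)": lambda t: weibull_hazard(t, 2.0),
    "bathtub    (sum)      ": bathtub_hazard,
}

for label, h in shapes.items():
    values = "  ".join(f"{h(t):6.3f}" for t in times)
    print(f"{label}: {values}")
```

A shape parameter below 1 gives a decreasing hazard, exactly 1 gives the constant hazard of the exponential distribution, and above 1 gives an increasing hazard.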