survival analysis: “analysis of data in the form of times
from some well-defined time origin to occurrence of some event
or endpoint” (Collett, 2003)
medical studies: the time origin may be the time
of entry into a clinical trial; time of diagnosis of disease;
time of commencement of treatment; time of surgery; date of birth etc.
if the endpoint is death then the times to event are literally survival times, for example, time to death following
kidney transplantation
more generally, time-to-event data take various forms for
example: time to onset of heart disease; time to failure of a
prosthesis; time to treatment failure; time to relief of pain; time
to recurrence of symptoms etc
the events may be:
positive, such as discharge from hospital or
conception;
adverse, such as death or recurrence of disease
neutral, such as cessation of breast feeding
regardless of the nature of the event, the convention is to refer to this
type of data as survival data and the analysis as survival analysis
the time-to-event is a random variable, $T$, often referred to as a
lifetime random variable
survival times of patients with advanced lung cancer
228 patients 165 deaths observed
time-to-event data are not amenable to standard methods of
analysis since the event times are:
positive-continuous
typically skewed
subject to censoring
censoring occurs when the event of interest (end-point) is not observed:
right censoring: the event time exceeds the last follow-up
time
left censoring: the event is known to have occurred before a follow-up time
but the exact event time is unknown
interval censoring: the event time falls in some specified interval
left/interval censoring occurs less frequently than
right censoring. In this module we will consider methods for right
censored data
right censored observations: we do not know when, or if, the
patient will experience the event, only that the event has not
occurred at the end of the observation period (last follow-up)
right censoring can be due to:
the period of observation ending prior to the event occurring (e.g. five year study period)
loss to follow-up (e.g. moved away, did not return for scheduled follow-up)
a competing event which precludes further follow-up
(e.g. a death occurs before a hip prosthesis fails)
note also the event may not be inevitable (e.g. time to
pregnancy)
censoring cannot be ignored: censored observations
carry important information about survival
consider comparing two
treatments: a more effective treatment will result in increased
survival and hence increased censoring at the end of follow-up
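The point that censoring cannot be ignored can be illustrated with a small deterministic sketch. The five event times and the study end below are hypothetical values chosen only for illustration; they are not taken from the lung cancer data. Both naive summaries (dropping censored patients, or treating censoring times as event times) understate the true mean survival time.

```python
# Hypothetical true event times (years) for five patients: illustrative only.
true_times = [2, 4, 6, 8, 10]
study_end = 5  # assumed end of the observation period

# What is actually recorded: the event time if it occurs before the study ends,
# otherwise the time at which follow-up stopped.
observed = [min(t, study_end) for t in true_times]
event = [1 if t <= study_end else 0 for t in true_times]

def mean(values):
    return sum(values) / len(values)

true_mean = mean(true_times)
# Naive approach 1: discard censored patients altogether.
drop_censored = mean([y for y, d in zip(observed, event) if d == 1])
# Naive approach 2: pretend every censoring time is an event time.
censored_as_events = mean(observed)

print(true_mean, drop_censored, censored_as_events)
```

In this toy example the longest survivors are exactly the ones who are censored, which is why both shortcuts are biased downwards.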
Patients are typically not recruited at the same time but are accrued sequentially over
a period of time and are then followed up to a fixed
date; the period of observation
thus varies between patients
assumptions: a patient's prognosis does not depend upon time of entry to
the study (less of a problem in randomised trials)
patients lost to follow-up have the same prognosis as those
remaining in the study (i.e. random censoring)
id | inst | time | status | age | sex | ph.ecog | ph.karno | pat.karno | meal.cal | wt.loss
---|---|---|---|---|---|---|---|---|---|---
1 | 3 | 306 | 2 | 74 | 1 | 1 | 90 | 100 | 1175 | NA
2 | 3 | 455 | 2 | 68 | 1 | 0 | 90 | 90 | 1225 | 15
3 | 3 | 1010 | 1 | 56 | 1 | 0 | 90 | 90 | NA | 15
4 | 5 | 210 | 2 | 57 | 1 | 1 | 90 | 60 | 1150 | 11
5 | 1 | 883 | 2 | 60 | 1 | 0 | 100 | 90 | NA | 0
6 | 12 | 1022 | 1 | 74 | 1 | 1 | 50 | 80 | 513 | 0
7 | 7 | 310 | 2 | 68 | 2 | 2 | 70 | 60 | 384 | 10
8 | 11 | 361 | 2 | 71 | 2 | 2 | 60 | 80 | 538 | 1
9 | 1 | 218 | 2 | 53 | 1 | 1 | 70 | 80 | 825 | 16
10 | 7 | 166 | 2 | 61 | 1 | 2 | 70 | 70 | 271 | 34
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
228 | 22 | 177 | 1 | 58 | 2 | 1 | 80 | 90 | 1060 | 0
Format:
inst: Institution code
time: Survival time in days
status: censoring status 1=censored, 2=dead
age: Age in years
sex: Male=1 Female=2
ph.ecog: ECOG performance score (0=good 5=dead)
ph.karno: Karnofsky performance score (bad=0-good=100) rated by physician
pat.karno: Karnofsky performance score as rated by patient
meal.cal: Calories consumed at meals
wt.loss: Weight loss in last six months
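The status column uses the coding 1=censored, 2=dead, whereas many analyses work with a 0/1 indicator. A minimal sketch of the recoding, using only the time and status values of the ten rows listed above (the variable names are illustrative assumptions):

```python
# (time in days, status) for the first ten patients listed above;
# status is coded 1 = censored, 2 = dead.
rows = [(306, 2), (455, 2), (1010, 1), (210, 2), (883, 2),
        (1022, 1), (310, 2), (361, 2), (218, 2), (166, 2)]

# Recode status to an event indicator: 0 = censored, 1 = death observed.
recoded = [(time, status - 1) for time, status in rows]

n_events = sum(delta for _, delta in recoded)
n_censored = len(recoded) - n_events
print(n_events, n_censored)
```

Among these ten patients, two have censored survival times (1010 and 1022 days), so all that is known is that they survived at least that long.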
model the survival times for a single group
compare survival distributions for two or more groups
assess the effects of covariates on survival
make predictions
usually the event times will be continuous measurements, but they are
typically recorded in rounded form
thus, although the data are strictly continuous, our methods must
allow for potential ties in the data caused by rounding
let $T$ denote the lifetime random variable
$t_1, \dots, t_n$: observed event times
$c_i$: censoring times, $i = 1, \dots, n$
$\delta_i$: censoring/failure indicator - 0 if censored, 1 if failure
$x_i$: $p$-vector of covariates for individual $i$
$r_j$: risk set - number at risk
just before $t_{(j)}$, the $j$th ordered event time
$d_j$: the number of events (e.g. deaths) at time $t_{(j)}$
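The number at risk just before an event time and the number of events at that time can be tabulated directly from the data. The sketch below is one possible way to do this, using the first ten lung cancer patients with status already recoded to 0 = censored, 1 = death; the variable names are assumptions made for illustration.

```python
# (observed time in days, event indicator) for the first ten lung cancer patients;
# indicator: 0 = censored, 1 = death.
rows = [(306, 1), (455, 1), (1010, 0), (210, 1), (883, 1),
        (1022, 0), (310, 1), (361, 1), (218, 1), (166, 1)]

# Distinct times at which at least one event occurred, in increasing order.
event_times = sorted({time for time, delta in rows if delta == 1})

table = []
for t in event_times:
    # At risk just before t: everyone whose observed time is at least t.
    at_risk = sum(1 for time, _ in rows if time >= t)
    # Events at t: counting rather than assuming one event allows for ties.
    events = sum(1 for time, delta in rows if time == t and delta == 1)
    table.append((t, at_risk, events))

for row in table:
    print(row)
```

Censored patients contribute to the number at risk up to their censoring time and then drop out of the risk set without counting as events.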
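unit | time | cens | $y_i$ | $\delta_i$ | $x_i$
---|---|---|---|---|---
1 | $t_1$ | $c_1$ | $y_1$ | $\delta_1$ | $x_1$
2 | $t_2$ | $c_2$ | $y_2$ | $\delta_2$ | $x_2$
3 | $t_3$ | $c_3$ | $y_3$ | $\delta_3$ | $x_3$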
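where
the censoring time $C_i$ is the time at which unit $i$ leaves the study,
with realised value $c_i$
censoring times may be fixed (e.g. the end of a study period of fixed length) or random
we do not observe both $T_i$ and $C_i$
we record $Y_i = T_i$ if $T_i \le C_i$ or else we record $Y_i = C_i$ if $T_i > C_i$
hence we record $Y_i = \min(T_i, C_i)$ and a censoring indicator $\delta_i = I(T_i \le C_i)$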
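The recording rule above can be sketched in a few lines. The function name `observe` and the numerical values are hypothetical, and treating a tie between the event time and the censoring time as an observed event is an assumption of this sketch.

```python
def observe(event_time, censoring_time):
    """Return (recorded time, indicator) with indicator 1 = event, 0 = censored.

    Assumption: if the event and censoring times coincide, the event is
    taken to have been observed.
    """
    if event_time <= censoring_time:
        return (event_time, 1)
    return (censoring_time, 0)

# Hypothetical (event time, censoring time) pairs. In real data only the
# smaller value of each pair is ever seen.
latent = [(3.2, 5.0), (7.1, 5.0), (5.0, 5.0), (1.4, 0.9)]
data = [observe(t, c) for t, c in latent]
print(data)
```

The recorded time is always the smaller of the two latent times, and the indicator records which of them it was.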
in summarising survival data there are two functions of central
interest: the survivor function and the hazard function
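the survival time of individual $i$, $t_i$: a realisation of a
non-negative random variable $T$
let $F(t)$ denote the distribution function of $T$ with corresponding probability density function $f(t)$ then
$$F(t) = P(T \le t) = \int_0^t f(u)\,du$$
by definition the pdf is
$$f(t) = \frac{dF(t)}{dt} = \lim_{\delta t \to 0} \frac{P(t \le T < t + \delta t)}{\delta t}$$
the probability that an individual survives to time $t$ is given by the survivor function
$$S(t) = P(T > t) = 1 - F(t)$$
note that $S(t)$ is a monotone decreasing function with $S(0) = 1$ and tends to zero as $t$ approaches infinity
conversely we can express the pdf as:
$$f(t) = -\frac{dS(t)}{dt}$$
the hazard function specifies the instantaneous rate of
failure at time $t$ given survival to time $t$ and is defined:
$$h(t) = \lim_{\delta t \to 0} \frac{P(t \le T < t + \delta t \mid T \ge t)}{\delta t} = \frac{f(t)}{S(t)}$$
the hazard is a rate not a probability. It can assume values in $[0, \infty)$
the quantity $h(t)\,\delta t$ approximates the probability that
an individual who has survived to time $t$ will experience the event
in the interval $(t, t + \delta t)$
the cumulative or integrated hazard function is by definition
$$H(t) = \int_0^t h(u)\,du$$
other relationships follow:
$$h(t) = -\frac{d}{dt}\log S(t), \qquad S(t) = \exp\{-H(t)\}, \qquad f(t) = h(t)\,S(t) = h(t)\exp\{-H(t)\}$$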
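As a worked example of these relationships, assume a constant hazard $h(t) = \lambda > 0$ (the exponential distribution, used here purely as an illustration). Writing $H$ for the cumulative hazard, $S$ for the survivor function, $f$ for the density and $F$ for the distribution function, each follows from the hazard in turn:

$$H(t) = \int_0^t \lambda\,du = \lambda t, \qquad S(t) = \exp\{-H(t)\} = e^{-\lambda t},$$

$$f(t) = h(t)\,S(t) = \lambda e^{-\lambda t}, \qquad F(t) = 1 - S(t) = 1 - e^{-\lambda t}.$$

As a check, $-\frac{d}{dt}\log S(t) = -\frac{d}{dt}(-\lambda t) = \lambda$, which recovers the hazard.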
the density, distribution, survivor and hazard functions are
different mechanisms for describing the distribution of survival times
these functions capture the essential features of lifetime variables
specifying one function completely determines the others
consequently one may interchange between them but a model may be better specified in terms of one rather than another
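The claim that one function determines the others can be checked numerically. The sketch below starts from an assumed hazard that increases linearly with time (twice the elapsed time, an arbitrary illustrative choice), builds the cumulative hazard by numerical integration, then derives the survivor function and the density from it. For this hazard the cumulative hazard is the square of the elapsed time, which gives a closed form to compare against.

```python
import math

def hazard(t):
    """Assumed illustrative hazard: increases linearly with time."""
    return 2.0 * t

def cumulative_hazard(t, steps=1000):
    """Trapezoidal approximation to the integral of the hazard over [0, t]."""
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    total += sum(hazard(i * width) for i in range(1, steps))
    return total * width

def survivor(t):
    """Survivor function obtained as exp(-cumulative hazard)."""
    return math.exp(-cumulative_hazard(t))

def density(t):
    """Density obtained as hazard times survivor function."""
    return hazard(t) * survivor(t)

t = 1.5
closed_form_survivor = math.exp(-t ** 2)

# The density should also equal minus the derivative of the survivor function;
# approximate that derivative with a central difference.
eps = 1e-5
minus_dS_dt = (survivor(t - eps) - survivor(t + eps)) / (2 * eps)

print(survivor(t), closed_form_survivor)
print(density(t), minus_dS_dt)
```

Starting from the survivor function or the density instead would work equally well; the hazard is used as the starting point here only because it is often the most natural quantity to model.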
the hazard function tells us about the
effect of time on the probability of failure
the hazard informs us of failure rates, for example, of
patients of a certain age
there are many general shapes for the hazard function. Generic types are: increasing, decreasing, constant and bathtub:
an increasing hazard function is indicative of natural ageing (or wearing out)
a decreasing hazard function is less likely clinically but may fit, for example, risk following organ transplantation
a bathtub shaped hazard fits, for example, population risk of death from birth: infant deaths give rise to increased events early on, the process then stabilises prior to increasing with age
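These generic shapes can be reproduced with a simple parametric form. The sketch below uses the Weibull hazard, which is an assumption made for illustration: a shape parameter below one gives a decreasing hazard, equal to one a constant hazard, and above one an increasing hazard. A single Weibull hazard cannot produce a bathtub, so the bathtub here is built by adding a decreasing and an increasing hazard, which is only one of several possible constructions. All parameter values are arbitrary.

```python
def weibull_hazard(t, shape, scale=1.0):
    """Weibull hazard: (shape / scale) * (t / scale) ** (shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def bathtub_hazard(t):
    """Illustrative bathtub: early decreasing risk plus late increasing risk."""
    return weibull_hazard(t, shape=0.5) + weibull_hazard(t, shape=3.0)

times = [0.1, 0.3, 0.5, 1.0, 2.0]

decreasing = [weibull_hazard(t, shape=0.5) for t in times]
constant = [weibull_hazard(t, shape=1.0) for t in times]
increasing = [weibull_hazard(t, shape=2.0) for t in times]
bathtub = [bathtub_hazard(t) for t in times]

for label, values in [("decreasing", decreasing), ("constant", constant),
                      ("increasing", increasing), ("bathtub", bathtub)]:
    print(label, [round(v, 3) for v in values])
```

The bathtub values fall at first, reach a minimum, and then rise, mirroring the pattern of early excess risk followed by stabilisation and later ageing.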