3 Introduction to Survival Analysis

3.2 Comparing survival distributions between subgroups

•

often we are interested in comparing the survival of two (or more) groups, for example two treatment groups, males versus females, or smokers versus non-smokers
•

the usual two-group methods (e.g. t-tests to compare group means) are not valid because of censoring
•

separate Kaplan-Meier plots with confidence intervals can be used to investigate groups informally
•

example: lung cancer data, examining survival outcome by gender

Lung cancer survival by gender

•

males: red (solid) curve; females: blue (dashed) curve

[Figure: Kaplan-Meier survival curves by gender, lung cancer data]
•

comments: survival appears, on average, to be longer in females, but there is some overlap at the upper limit of the confidence interval
•

question: potential confounders?
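
A plot of this kind could be produced along the following lines. This is a minimal sketch, assuming the `lung` data set shipped with the R `survival` package (where `sex = 1` codes males and `sex = 2` codes females); the object name `km_by_sex`, the axis labels and the legend position are illustrative choices, not taken from the original figure.

```r
library(survival)  # provides Surv(), survfit() and the lung data set

# Kaplan-Meier estimate of the survivor function, separately for each sex
# (assumed coding in the lung data: sex = 1 male, sex = 2 female)
km_by_sex <- survfit(Surv(time, status) ~ sex, data = lung)

# One curve per group; conf.int = TRUE adds pointwise confidence limits
plot(km_by_sex, conf.int = TRUE,
     col = c("red", "blue"), lty = c(1, 2),
     xlab = "Time (days)", ylab = "Estimated survival probability")
legend("topright", legend = c("male", "female"),
       col = c("red", "blue"), lty = c(1, 2))
```

Setting `conf.int = FALSE` gives a less cluttered plot when the confidence limits are not needed.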

Comparing two groups: the log-rank test

•

a formal comparison can be made using the log-rank test
•

the null hypothesis is that the survival distributions are equal for the sub-groups (i.e. no difference in survival)
•

let $t_{j}$ denote the observed event times and $d_{j}$ the number of events at time $t_{j}$
•

further, let $n_{j}$ denote the number at risk at time $t_{j}$ (i.e. alive and uncensored just before time $t_{j}$), of which $n_{1j}$ are in group 1 and $n_{2j}$ are in group 2
•

if there is no difference between the groups, the expected number of events in group $k$ at time $t_{j}$ is $E_{jk}=d_{j}n_{jk}/n_{j}$
•

we actually observe $O_{jk}=d_{jk}$, the number of events in group $k$ at time $t_{j}$, for $k=1,2$
•

summing over the failure times for the two groups gives $E_{k}=\sum_{j}E_{jk}$ and $O_{k}=\sum_{j}O_{jk}$
•

the log-rank test statistic, which approximately follows a $\chi_{1}^{2}$ distribution under the null hypothesis:

$X^{2}=\sum_{k=1}^{2}(O_{k}-E_{k})^{2}/E_{k}\sim\chi_{1}^{2}$
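
The steps above can be traced by hand on a small data set. The sketch below uses base R only and a hypothetical toy sample (the `time`, `status` and `group` vectors are invented for illustration); it accumulates $O_{k}$ and $E_{k}$ over the event times and then forms the statistic as the sum of $(O_{k}-E_{k})^{2}/E_{k}$ over the two groups. Software such as `survdiff()` reports a closely related variance-based version, so its value may differ slightly from this simple form.

```r
# Hypothetical toy data: follow-up time, event indicator (1 = event, 0 = censored), group
time   <- c(2, 3, 4, 5, 6, 7, 8, 9)
status <- c(1, 1, 0, 1, 1, 1, 0, 1)
group  <- c(1, 1, 1, 2, 1, 2, 2, 2)

event_times <- sort(unique(time[status == 1]))  # the distinct event times t_j

O <- c(0, 0)  # observed events O_k, accumulated over the t_j
E <- c(0, 0)  # expected events E_k under the null hypothesis
for (tj in event_times) {
  at_risk <- time >= tj                     # still under observation just before t_j
  n_j <- sum(at_risk)                       # number at risk
  d_j <- sum(time == tj & status == 1)      # events at t_j
  for (k in 1:2) {
    n_jk <- sum(at_risk & group == k)
    d_jk <- sum(time == tj & status == 1 & group == k)
    O[k] <- O[k] + d_jk                     # O_jk = d_jk
    E[k] <- E[k] + d_j * n_jk / n_j         # E_jk = d_j * n_jk / n_j
  }
}

X2 <- sum((O - E)^2 / E)                            # simple log-rank statistic
p_value <- pchisq(X2, df = 1, lower.tail = FALSE)   # compare with chi-squared, 1 df
```

A useful check is that the observed and expected totals agree, `sum(O) == sum(E)` up to rounding, since at each event time the expected counts across the two groups sum to $d_{j}$.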

Lung cancer survival by gender: log-rank test using R

•

the function survdiff() in the survival package conducts the log-rank test in R

•

the command and output are shown below


> survdiff(Surv(time,status)~sex, data=lung)
Call:
survdiff(formula = Surv(time, status) ~ sex,data = lung)

        N Observed Expected (O-E)^2/E (O-E)^2/V
sex=1 138      112     91.6      4.55      10.3
sex=2  90       53     73.4      5.68      10.3

 Chisq= 10.3  on 1 degrees of freedom, p= 0.00131

•

the log-rank test result indicates a statistically significant difference in survival between male and female lung cancer patients (p = 0.0013)
•

comments?
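
The printed output can be related back to the formula by pulling the pieces out of the fitted object. This is a sketch, assuming the `obs`, `exp`, `chisq` and `n` components returned by `survdiff()`; the names `x2_simple`, `x2_report` and `p_value` are illustrative. It shows that summing the `(O-E)^2/E` column gives a value slightly below the reported `Chisq`, which is based on the `(O-E)^2/V` column (the variance-based form).

```r
library(survival)

fit <- survdiff(Surv(time, status) ~ sex, data = lung)

O <- fit$obs   # observed events per group (O_k)
E <- fit$exp   # expected events per group under the null (E_k)

x2_simple <- sum((O - E)^2 / E)   # sum of the (O-E)^2/E column
x2_report <- fit$chisq            # variance-based statistic printed as Chisq

# p-value from the chi-squared distribution with (number of groups - 1) df
df <- length(fit$n) - 1
p_value <- pchisq(x2_report, df = df, lower.tail = FALSE)
```

The two versions of the statistic lead to the same conclusion here; the simple form is the more conservative of the two.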

Comparing more than two groups

•

Kaplan-Meier curves can be obtained for more than two sub-groups and survival compared informally
•

it may be preferable not to add the confidence intervals since the plots can become confusing
•

the log-rank test extends to more than two groups: with $K$ groups the test statistic is compared with a $\chi_{K-1}^{2}$ distribution
•

more generally, a model can be fitted to the data so that potential confounders can be adjusted for
•

the Cox proportional hazards model is commonly used to flexibly model covariate effects on the hazard function
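
Both ideas can be sketched on the same lung cancer data. The choice of `ph.ecog` (ECOG performance score) as the grouping variable and of `age` as the adjusting covariate is an assumption made for illustration only; neither is suggested by the notes above.

```r
library(survival)

# Log-rank test across more than two groups: ECOG performance score
fit_ecog <- survdiff(Surv(time, status) ~ ph.ecog, data = lung)
df_ecog  <- length(fit_ecog$n) - 1   # number of groups minus one
p_ecog   <- pchisq(fit_ecog$chisq, df = df_ecog, lower.tail = FALSE)

# Cox proportional hazards model: effect of sex adjusted for age
cox_fit <- coxph(Surv(time, status) ~ sex + age, data = lung)
summary(cox_fit)     # coefficients, standard errors and tests
exp(coef(cox_fit))   # estimated hazard ratios
```

A hazard ratio below 1 for `sex` would point in the same direction as the Kaplan-Meier plot and the log-rank test, with the difference that the comparison is now adjusted for age.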