When performing statistical analysis of clinical trial data we need to consider (amongst other things):
the outcome (endpoint) type and comparison of interest (e.g. difference in treatment group means/group proportions; ratio of treatment group means);
distributional assumptions regarding the outcome measurements/parameters of interest (e.g. Normal/Bernoulli/Poisson etc.);
the structure in the data imposed by the design (e.g. paired designs, cross-over designs, randomised block designs etc.);
estimation versus testing issues (statistical significance versus clinical relevance);
the analysis set (ITT/per protocol);
analyses specified a priori (i.e. justified by design and specified in the protocol) versus those which are exploratory.
Continuous measures on trial participants are typically summarised in
terms of group means/medians, for example, mean diastolic blood
pressure/mean height by treatment group.
Clinically the outcome of interest is the difference, $\delta = \mu_1 - \mu_2$, between the group means: the ‘treatment effect’.
Not only are we interested in the estimated magnitude of the
difference, $\hat{\delta}$, we are also interested in the degree of precision of the estimate as measured by its standard error, $SE(\hat{\delta})$, or a confidence interval.
A p-value quantifying the play of chance under the null may be interesting, but it is
the size of the difference which is of interest in terms of clinical relevance.
A study may continue if $\hat{\delta}$ is sufficiently large even if it is not
significantly different from zero.
Larger samples yield increased precision.
Direct testing approach: two-sample t-test.
Let $\bar{y}_1$ denote the treatment group mean and $\bar{y}_2$ the control
group mean.
Research hypothesis: $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$.
Assuming a common underlying variance, the corresponding standard error is estimated by pooling the data:
$$SE(\hat{\delta}) = s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
with
$$s^2 = \frac{\sum_{i}(y_{1i} - \bar{y}_1)^2 + \sum_{j}(y_{2j} - \bar{y}_2)^2}{n - 2}$$
where $n = n_1 + n_2$ is the total number of patients recruited.
The random variable
$$T = \frac{\bar{y}_1 - \bar{y}_2}{SE(\hat{\delta})}$$
is compared to the $t$-distribution with $n - 2$ degrees of freedom and inference based upon the resulting p-value.
Alternatively, let $s_1$ and $s_2$ represent the sample standard deviations for the two groups and $n_1$ and $n_2$ represent the group sizes, and compute the pooled variance estimate:
$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
and as previously the standard error is given by:
$$SE(\hat{\delta}) = s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$
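The pooled calculation above can be checked numerically; the following is a sketch in Python (the data and variable names are invented for illustration):

```python
import math

# Two small illustrative samples (invented data)
y1 = [5.1, 4.8, 6.0, 5.5, 5.2]        # treatment group
y2 = [4.2, 4.9, 4.4, 4.0, 4.6, 4.3]   # control group

n1, n2 = len(y1), len(y2)
ybar1 = sum(y1) / n1
ybar2 = sum(y2) / n2

# Sample variances (divisor n - 1)
s1_sq = sum((y - ybar1) ** 2 for y in y1) / (n1 - 1)
s2_sq = sum((y - ybar2) ** 2 for y in y2) / (n2 - 1)

# Pooled variance and standard error of the difference
s_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
se = math.sqrt(s_sq) * math.sqrt(1 / n1 + 1 / n2)

# Test statistic, referred to t on n1 + n2 - 2 = 9 degrees of freedom
t_stat = (ybar1 - ybar2) / se
print(round(t_stat, 3))   # 3.955
```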
Assumptions underpinning the t-test?
Non-parametric alternative: the Mann–Whitney test.
An approximate test assuming non-constant variances can be performed (Welch’s) similarly, with
$$SE(\hat{\delta}) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
and an adjustment to the degrees of freedom.
The assumptions underlying the two-sample t-test are:
the data are interval scale
the data are independent
the sampling distributions are normal
a common underlying variance.
Comments:
Exploratory analyses can be used to investigate the adequacy of the underlying assumptions.
Transformation can be performed to yield approximate normality,
for example, a logarithmic transform for positively skewed data.
The t-test is robust to symmetric departures from Normality. If in doubt a non-parametric test can be used.
Using R to perform the t-test is straightforward using the following
code.
Standard test,
> test1 = t.test(y, x, var.equal = TRUE)
Welch’s test,
> test2 = t.test(y, x, var.equal = FALSE)
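For readers working in Python rather than R, the equivalent calls use scipy’s `ttest_ind`, whose `equal_var` flag mirrors `var.equal` (a sketch assuming scipy is installed; the data are invented):

```python
from scipy import stats

# Invented example data
y = [5.1, 4.8, 6.0, 5.5, 5.2]          # treatment group
x = [4.2, 4.9, 4.4, 4.0, 4.6, 4.3]     # control group

# Standard two-sample t-test (common variance), cf. var.equal = TRUE in R
t_pooled, p_pooled = stats.ttest_ind(y, x, equal_var=True)

# Welch's test (non-constant variances), cf. var.equal = FALSE
t_welch, p_welch = stats.ttest_ind(y, x, equal_var=False)

print(t_pooled, p_pooled)
print(t_welch, p_welch)
```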
Direct testing procedures (e.g. t-tests) yield p-values but do not allow for assessment of effect sizes. In practice an estimate of the difference and a corresponding confidence interval is preferable.
Parameter of interest: difference, $\delta = \mu_1 - \mu_2$, of treatment group expectations.
Estimator: observed difference of group means, $\hat{\delta} = \bar{y}_1 - \bar{y}_2$.
95% confidence interval:
$$\hat{\delta} \pm t_{0.975,\,n-2} \times SE(\hat{\delta})$$
with
$$SE(\hat{\delta}) = s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$
Inference: does the confidence interval for the true difference, $\delta$, span zero?
Interval interpretation: under repeated sampling we would expect 95% of such intervals to contain the true parameter value.
Intervals are more informative: one can assess the plausible range of the effect size.
Statistical significance versus clinical relevance: a significant p-value does not imply clinical relevance.
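The interval computation can be sketched numerically (Python for illustration; the data are invented and the 97.5% point of the t-distribution on $n_1 + n_2 - 2 = 9$ degrees of freedom, 2.262, is taken from tables):

```python
import math

y1 = [5.1, 4.8, 6.0, 5.5, 5.2]        # treatment group (invented)
y2 = [4.2, 4.9, 4.4, 4.0, 4.6, 4.3]   # control group (invented)
n1, n2 = len(y1), len(y2)
ybar1, ybar2 = sum(y1) / n1, sum(y2) / n2
d_hat = ybar1 - ybar2                  # estimated treatment effect

# Pooled variance and standard error of the difference
s_sq = (sum((v - ybar1) ** 2 for v in y1)
        + sum((v - ybar2) ** 2 for v in y2)) / (n1 + n2 - 2)
se = math.sqrt(s_sq * (1 / n1 + 1 / n2))

t_crit = 2.262                         # t_{0.975} on 9 df (from tables)
ci = (d_hat - t_crit * se, d_hat + t_crit * se)
print(d_hat, ci)   # an interval spanning zero would indicate no significant difference
```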
Good practice: CONSORT statement.
See: Exercises on Moodle.
Patient responses will often vary according to their respective values
at baseline (entry to trial).
A proper baseline measurement is a measure of the end-point of
interest taken prior to randomisation.
Two common approaches are to:
base estimation on the differences from baseline, $d_i = y_i - b_i$ say, comparing group mean change;
use an analysis of covariance (ANCOVA) regression model including baseline as a covariate.
Baseline measurements can increase precision!
ANCOVA is the preferred approach: the method ‘adjusts’ the estimate for baseline imbalance and regression to the mean.
Example: blood pressure after 12 weeks of treatment, or the difference from baseline.
For a detailed discussion of the relative merits see Senn S, 1999.
See Vickers, A. and Altman, D., Statistics notes: Analysing controlled trials with baseline and follow up measurements. BMJ, 2001; 323: 1123–1124.
Significance testing of baseline differences is not appropriate.
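The ANCOVA approach can be sketched as a linear model of follow-up on baseline plus a treatment indicator; the following is a hedged illustration with invented data (variable names such as `followup` are my own, and numpy’s least-squares routine stands in for a full regression fit):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
treat = np.repeat([0, 1], n // 2)              # 0 = control, 1 = treatment
baseline = rng.normal(90, 10, size=n)          # invented baseline measurements
# Invented data-generating model: follow-up depends on baseline,
# with a true treatment effect of -5 and residual sd 5
followup = 20 + 0.8 * baseline - 5 * treat + rng.normal(0, 5, size=n)

# ANCOVA as a linear model: follow-up ~ intercept + baseline + treatment
X = np.column_stack([np.ones(n), baseline, treat])
coef, *_ = np.linalg.lstsq(X, followup, rcond=None)
print("baseline-adjusted treatment effect:", coef[2])
```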
For paired designs (for example, a patient’s left eye and right eye) a one-sample t-test is performed by:
averaging the within-pair differences: $\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i$,
where $d_i$ is the difference for pair $i$;
computing the standard deviation of the differences, $s_d$;
calculating the standard error of the mean of the differences, $SE(\bar{d}) = s_d/\sqrt{n}$;
computing the test statistic:
$$T = \frac{\bar{d}}{SE(\bar{d})}$$
under $H_0: \mu_d = 0$ versus $H_1: \mu_d \neq 0$;
comparing $T$ to a t-distribution on $n - 1$ degrees of freedom.
Using R is again straightforward using:
> test3 = t.test(y, mu = 0)
where y is the observed vector of within-pair differences $d_i$.
Assumptions for the paired design?
Note for later: a paired t-test is used to perform a ‘simple analysis’ of cross-over trial data.
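The paired procedure can be sketched numerically (Python for illustration; the differences below are invented):

```python
import math

# Invented within-pair differences d_i (e.g. left eye minus right eye)
d = [1.2, 0.4, -0.3, 0.8, 1.5, 0.2, 0.9, -0.1]
n = len(d)

dbar = sum(d) / n                                          # mean difference
s_d = math.sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))  # sd of differences
se = s_d / math.sqrt(n)                                     # standard error of dbar
t_stat = dbar / se                   # compared with t on n - 1 = 7 df
print(round(t_stat, 3))   # 2.574
```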
Responses may be dichotomous, for example, cancer free at five
years/cancer recurred, died/survived, diseased/not diseased.
Clinically, interest then lies in comparing the treatment group proportions, $p_1$ and $p_2$, say.
Consider the following general tabular representation of results:

| Outcome observed | Treatment group 1 | Treatment group 2 | Total |
|---|---|---|---|
| Yes | a | b | a+b |
| No | c | d | c+d |
| Total | a+c | b+d | n |
Let $p_1$ and $p_2$ denote the respective risks (success probabilities) for the treatment and control groups.
The risk difference, $p_1 - p_2$, is estimated by:
$$\widehat{RD} = \frac{a}{a+c} - \frac{b}{b+d}$$
The relative risk, $p_1/p_2$, is estimated by:
$$\widehat{RR} = \frac{a/(a+c)}{b/(b+d)}$$
The disease odds, $p/(1-p)$, provides the ratio of success to failure. The odds ratio comparing the groups is estimated by:
$$\widehat{OR} = \frac{a/c}{b/d} = \frac{ad}{bc}$$
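The three estimates can be collected into a small helper; this is a sketch in Python (the function name and the example counts are invented):

```python
def two_by_two_measures(a, b, c, d):
    """Risk difference, relative risk and odds ratio from a 2x2 table
    with groups in columns (a, c = group 1; b, d = group 2) and the
    event of interest in the first row."""
    p1 = a / (a + c)          # estimated risk in group 1
    p2 = b / (b + d)          # estimated risk in group 2
    rd = p1 - p2              # risk difference
    rr = p1 / p2              # relative risk
    or_ = (a * d) / (b * c)   # odds ratio
    return rd, rr, or_

# Hypothetical counts for illustration
rd, rr, or_ = two_by_two_measures(30, 20, 70, 80)
print(rd, rr, or_)
```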
Aim: association between hay fever and eczema in 11-year-old children.

| Eczema | Hay fever: Yes | Hay fever: No | Total |
|---|---|---|---|
| Yes | 141 | 420 | 561 |
| No | 928 | 13 525 | 14 453 |
| Total | 1069 | 13 945 | 15 522 |
event: Eczema=“Yes”: risk with hay fever 141/1069 = 0.132; risk without 420/13 945 = 0.030; relative risk 4.38; odds ratio (141 × 13 525)/(420 × 928) = 4.89.
event: Eczema=“No” (swap rows): relative risk (928/1069)/(13 525/13 945) = 0.90; odds ratio 0.20 = 1/4.89.
The relative risk depends on which row is taken as the event!
Look at the table the other way around (hay fever as the event):
risk difference: 141/561 − 928/14 453 = 0.251 − 0.064 = 0.187
relative risk: 0.251/0.064 = 3.91
odds ratio: (141 × 13 525)/(420 × 928) = 4.89
The odds ratio is the same whichever way round one looks at the table!
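The invariance claim can be checked directly from the table’s counts (Python for illustration):

```python
# Eczema / hay fever counts from the table
a, b = 141, 420      # eczema yes: hay fever yes / no
c, d = 928, 13525    # eczema no:  hay fever yes / no

# Eczema as the event (columns = hay fever status)
rr_eczema = (a / (a + c)) / (b / (b + d))
or_eczema = (a * d) / (b * c)

# Transpose the table: hay fever as the event (columns = eczema status)
rr_hayfever = (a / (a + b)) / (c / (c + d))
or_hayfever = (a * d) / (c * b)

print(round(rr_eczema, 2), round(rr_hayfever, 2))   # relative risks differ
print(round(or_eczema, 2), round(or_hayfever, 2))   # odds ratios agree
```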
Intervals for the risk difference can be based upon the Normal approximation:
$$\hat{p}_1 - \hat{p}_2 \pm 1.96 \times SE(\hat{p}_1 - \hat{p}_2)$$
with
$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
where $\hat{p}_1 = a/(a+c)$, $\hat{p}_2 = b/(b+d)$, $n_1 = a+c$ and $n_2 = b+d$.
For the eczema example: $0.102 \pm 1.96 \times 0.0105$, giving a 95% CI of $(0.081, 0.122)$.
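The risk-difference interval for the eczema table can be verified as follows (Python for illustration):

```python
import math

a, b, c, d = 141, 420, 928, 13525   # eczema / hay fever counts
n1, n2 = a + c, b + d               # hay fever yes / no column totals

p1, p2 = a / n1, b / n2
rd = p1 - p2
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci = (rd - 1.96 * se, rd + 1.96 * se)
print(round(rd, 3), tuple(round(x, 3) for x in ci))
```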
Aim: to compare the odds of eczema amongst hay fever
sufferers to non-sufferers.
Compute the odds ratio:
$$\widehat{OR} = \frac{141 \times 13\,525}{420 \times 928} = 4.89$$
Compute the standard error of the log of the odds ratio:
$$SE(\log\widehat{OR}) = \sqrt{\frac{1}{141} + \frac{1}{420} + \frac{1}{928} + \frac{1}{13\,525}} = 0.103$$
then compute a confidence interval for the log odds ratio:
$$\log(4.89) \pm 1.96 \times 0.103 = (1.386, 1.790)$$
and back-transform.
95% CI, taking antilogs: $(e^{1.386}, e^{1.790}) = (4.00, 5.99)$
(asymmetric about the point estimate 4.89).
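The log odds ratio interval can be reproduced as follows (Python for illustration):

```python
import math

a, b, c, d = 141, 420, 928, 13525   # eczema / hay fever counts

or_hat = (a * d) / (b * c)
log_or = math.log(or_hat)
se = math.sqrt(1/a + 1/b + 1/c + 1/d)   # SE of the log odds ratio

lo, hi = log_or - 1.96 * se, log_or + 1.96 * se
ci = (math.exp(lo), math.exp(hi))        # back-transform to the OR scale
print(round(or_hat, 2), tuple(round(x, 2) for x in ci))
```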
Further examples on exercise sheet.
Case-control studies: risks, and hence the relative risk, cannot be estimated (subject selection depends on the outcome); the odds ratio remains usable.
Logistic regression: allows adjustment for covariates.
Odds ratios do not contain information as to the absolute risk difference.
If the outcome is rare, then $OR \approx RR$ (since $p/(1-p) \approx p$ for small $p$).
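A quick numerical illustration of the rare-outcome approximation (risks are invented):

```python
# Rare outcome: risks of 2% versus 1% (invented figures)
p1, p2 = 0.02, 0.01
rr_rare = p1 / p2
or_rare = (p1 / (1 - p1)) / (p2 / (1 - p2))

# Common outcome: risks of 60% versus 30% -- the approximation breaks down
q1, q2 = 0.6, 0.3
rr_common = q1 / q2
or_common = (q1 / (1 - q1)) / (q2 / (1 - q2))

print(round(or_rare, 3), round(or_common, 3))   # OR close to RR (2) only when rare
```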
Further reading: Bland JM, Altman DG (2000) BMJ 320: 1468.