When performing statistical analysis of clinical trial data we need to consider (amongst other things):
the outcome (endpoint) type and comparison of interest (e.g. difference in treatment group means/group proportions; ratio of treatment group means);
distributional assumptions regarding the outcome measurements/parameters of interest (e.g. Normal/Bernoulli/Poisson etc.);
the structure in the data imposed by the design (e.g. paired designs, cross-over designs, randomised block designs etc.);
estimation versus testing issues (statistical significance versus clinical relevance);
the analysis set (ITT/per protocol);
analyses specified a priori (i.e. justified by design and specified in the protocol) versus those which are exploratory.
Continuous measures on trial participants are typically summarised in
terms of group means/medians, for example, mean diastolic blood
pressure/mean height by treatment group.
Clinically the outcome of interest is the difference, $\delta = \mu_1 - \mu_2$, between the group means: the ‘treatment effect’.
Not only are we interested in the estimated magnitude of the
difference, $\hat{\delta}$, we are also interested in the degree of precision of the estimate as measured by its standard error, $SE(\hat{\delta})$, or a confidence interval.
A p-value quantifying the play of chance under the null may be interesting, but it is
the size of the difference which is of interest in terms of clinical relevance.
A study may continue if $\hat{\delta}$ is sufficiently large even if it is not
significantly different from zero.
Larger samples yield increased precision.
Direct testing approach: two-sample t-test.
Let $\bar{y}_1$ denote the treatment group mean and $\bar{y}_2$ the control
group mean.
Research hypothesis: $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$.
Assuming a common underlying variance, the corresponding standard error is estimated by pooling the data:
$$SE(\hat{\delta}) = s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
with
$$s^2 = \frac{\sum_{i}(y_{1i} - \bar{y}_1)^2 + \sum_{j}(y_{2j} - \bar{y}_2)^2}{n - 2}$$
where $n = n_1 + n_2$ is the total number of patients recruited.
The random variable
$$T = \frac{\bar{y}_1 - \bar{y}_2}{SE(\hat{\delta})}$$
is compared to the $t$-distribution with $n - 2$ degrees of freedom and inference based upon the resulting p-value.
Alternatively, let $s_1$ and $s_2$ represent the sample standard deviations for the two groups and $n_1$ and $n_2$ represent the group sizes, and compute the pooled variance estimate:
$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
and as previously the standard error is given by:
$$SE(\hat{\delta}) = s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$
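The pooled calculation above can be checked numerically; the following is a sketch in Python (the data and variable names are invented for illustration):

```python
import math

# Two small illustrative samples (invented data)
y1 = [5.1, 4.8, 6.0, 5.5, 5.2]        # treatment group
y2 = [4.2, 4.9, 4.4, 4.0, 4.6, 4.3]   # control group

n1, n2 = len(y1), len(y2)
ybar1 = sum(y1) / n1
ybar2 = sum(y2) / n2

# Sample variances (divisor n - 1)
s1_sq = sum((y - ybar1) ** 2 for y in y1) / (n1 - 1)
s2_sq = sum((y - ybar2) ** 2 for y in y2) / (n2 - 1)

# Pooled variance and standard error of the difference
s_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
se = math.sqrt(s_sq) * math.sqrt(1 / n1 + 1 / n2)

# Test statistic, referred to t on n1 + n2 - 2 = 9 degrees of freedom
t_stat = (ybar1 - ybar2) / se
print(round(t_stat, 3))   # 3.955
```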
Assumptions underpinning the t-test?
Non-parametric alternative: the Mann–Whitney test.
An approximate test assuming non-constant variances can be performed (Welch’s) similarly, with
$$SE(\hat{\delta}) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
and an adjustment to the degrees of freedom.
The assumptions underlying the two-sample t-test are:
the data are interval scale
the data are independent
the sampling distributions are normal
a common underlying variance.
Comments:
Exploratory analyses can be used to investigate the adequacy of the underlying assumptions.
Transformation can be performed to yield approximate normality,
for example, a logarithmic transform for positively skewed data.
The t-test is robust to symmetric departures from Normality. If in doubt a non-parametric test can be used.
Using R to perform the t-test is straightforward using the following
code.
Standard test,
> test1 = t.test(y, x, var.equal = TRUE)
Welch’s test,
> test2 = t.test(y, x, var.equal = FALSE)
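For readers working in Python rather than R, the equivalent calls use scipy’s `ttest_ind`, whose `equal_var` flag mirrors `var.equal` (a sketch assuming scipy is installed; the data are invented):

```python
from scipy import stats

# Invented example data
y = [5.1, 4.8, 6.0, 5.5, 5.2]          # treatment group
x = [4.2, 4.9, 4.4, 4.0, 4.6, 4.3]     # control group

# Standard two-sample t-test (common variance), cf. var.equal = TRUE in R
t_pooled, p_pooled = stats.ttest_ind(y, x, equal_var=True)

# Welch's test (non-constant variances), cf. var.equal = FALSE
t_welch, p_welch = stats.ttest_ind(y, x, equal_var=False)

print(t_pooled, p_pooled)
print(t_welch, p_welch)
```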
Direct testing procedures (e.g. t-tests) yield p-values but do not allow for assessment of effect sizes. In practice an estimate of the difference and a corresponding confidence interval is preferable.
Parameter of interest: difference, $\delta = \mu_1 - \mu_2$, of treatment group expectations.
Estimator: observed difference of group means, $\hat{\delta} = \bar{y}_1 - \bar{y}_2$.
95% confidence interval:
$$\hat{\delta} \pm t_{0.975,\,n-2} \times SE(\hat{\delta})$$
with
$$SE(\hat{\delta}) = s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$
Inference: does the confidence interval for the true difference, $\delta$, span zero?
Interval interpretation: under repeated sampling we would expect 95% of such intervals to contain the true parameter value.
Intervals are more informative: one can assess the plausible range of the effect size.
Statistical significance versus clinical relevance: a significant p-value does not imply clinical relevance.
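The interval computation can be sketched numerically (Python for illustration; the data are invented and the 97.5% point of the t-distribution on $n_1 + n_2 - 2 = 9$ degrees of freedom, 2.262, is taken from tables):

```python
import math

y1 = [5.1, 4.8, 6.0, 5.5, 5.2]        # treatment group (invented)
y2 = [4.2, 4.9, 4.4, 4.0, 4.6, 4.3]   # control group (invented)
n1, n2 = len(y1), len(y2)
ybar1, ybar2 = sum(y1) / n1, sum(y2) / n2
d_hat = ybar1 - ybar2                  # estimated treatment effect

# Pooled variance and standard error of the difference
s_sq = (sum((v - ybar1) ** 2 for v in y1)
        + sum((v - ybar2) ** 2 for v in y2)) / (n1 + n2 - 2)
se = math.sqrt(s_sq * (1 / n1 + 1 / n2))

t_crit = 2.262                         # t_{0.975} on 9 df (from tables)
ci = (d_hat - t_crit * se, d_hat + t_crit * se)
print(d_hat, ci)   # an interval spanning zero would indicate no significant difference
```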
Good practice: CONSORT statement.
See: Exercises on Moodle.
Patient responses will often vary according to their respective values
at baseline (entry to trial).
A proper baseline measurement is a measure of the end-point of
interest taken prior to randomisation.
Two common approaches are to:
base estimation on the differences from baseline, $d_i = y_i - b_i$ say, comparing group mean change;
use an analysis of covariance (ANCOVA) regression model including baseline as a covariate.
Baseline measurements can increase precision!
ANCOVA is the preferred approach: the method ‘adjusts’ the estimate for baseline imbalance and regression to the mean.
Example: blood pressure after 12 weeks of treatment, or the difference from baseline.
For a detailed discussion of the relative merits see Senn S, 1999.
See Vickers, A. and Altman, D., Statistics notes: Analysing controlled trials with baseline and follow up measurements. BMJ, 2001; 323: 1123–1124.
Significance testing of baseline differences is not appropriate.
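The ANCOVA approach can be sketched as a linear model of follow-up on baseline plus a treatment indicator; the following is a hedged illustration with invented data (variable names such as `followup` are my own, and numpy’s least-squares routine stands in for a full regression fit):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
treat = np.repeat([0, 1], n // 2)              # 0 = control, 1 = treatment
baseline = rng.normal(90, 10, size=n)          # invented baseline measurements
# Invented data-generating model: follow-up depends on baseline,
# with a true treatment effect of -5 and residual sd 5
followup = 20 + 0.8 * baseline - 5 * treat + rng.normal(0, 5, size=n)

# ANCOVA as a linear model: follow-up ~ intercept + baseline + treatment
X = np.column_stack([np.ones(n), baseline, treat])
coef, *_ = np.linalg.lstsq(X, followup, rcond=None)
print("baseline-adjusted treatment effect:", coef[2])
```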
For paired designs (for example, a patient’s left eye and right eye) a one-sample t-test is performed by:
averaging the within-pair differences: $\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i$,
where $d_i$ is the difference for pair $i$;
computing the standard deviation of the differences, $s_d$;
calculating the standard error of the mean of the differences, $SE(\bar{d}) = s_d/\sqrt{n}$;
computing the test statistic:
$$T = \frac{\bar{d}}{SE(\bar{d})}$$
under $H_0: \mu_d = 0$ versus $H_1: \mu_d \neq 0$;
comparing $T$ to a t-distribution on $n - 1$ degrees of freedom.
Using R is again straightforward using:
> test3 = t.test(y, mu = 0)
where y is the observed vector of within-pair differences $d_i$.
Assumptions for the paired design?
Note for later: a paired t-test is used to perform a ‘simple analysis’ of cross-over trial data.
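The paired procedure can be sketched numerically (Python for illustration; the differences below are invented):

```python
import math

# Invented within-pair differences d_i (e.g. left eye minus right eye)
d = [1.2, 0.4, -0.3, 0.8, 1.5, 0.2, 0.9, -0.1]
n = len(d)

dbar = sum(d) / n                                          # mean difference
s_d = math.sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))  # sd of differences
se = s_d / math.sqrt(n)                                     # standard error of dbar
t_stat = dbar / se                   # compared with t on n - 1 = 7 df
print(round(t_stat, 3))   # 2.574
```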
Responses may be dichotomous, for example, cancer free at five
years/cancer recurred, died/survived, diseased/not diseased.
Clinically, interest then lies in comparing the treatment group proportions, $p_1$ and $p_2$, say.
Consider the following general tabular representation of results:

| Outcome observed | Treatment group 1 | Treatment group 2 | Total |
|---|---|---|---|
| Yes | a | b | a+b |
| No | c | d | c+d |
| Total | a+c | b+d | n |
Let $p_1$ and $p_2$ denote the respective risks (success probabilities) for the treatment and control groups.
The risk difference, $p_1 - p_2$, is estimated by:
$$\widehat{RD} = \frac{a}{a+c} - \frac{b}{b+d}$$
The relative risk, $p_1/p_2$, is estimated by:
$$\widehat{RR} = \frac{a/(a+c)}{b/(b+d)}$$
The disease odds, $p/(1-p)$, provides the ratio of success to failure. The odds ratio comparing the groups is estimated by:
$$\widehat{OR} = \frac{a/c}{b/d} = \frac{ad}{bc}$$
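The three estimates can be collected into a small helper; this is a sketch in Python (the function name and the example counts are invented):

```python
def two_by_two_measures(a, b, c, d):
    """Risk difference, relative risk and odds ratio from a 2x2 table
    with groups in columns (a, c = group 1; b, d = group 2) and the
    event of interest in the first row."""
    p1 = a / (a + c)          # estimated risk in group 1
    p2 = b / (b + d)          # estimated risk in group 2
    rd = p1 - p2              # risk difference
    rr = p1 / p2              # relative risk
    or_ = (a * d) / (b * c)   # odds ratio
    return rd, rr, or_

# Hypothetical counts for illustration
rd, rr, or_ = two_by_two_measures(30, 20, 70, 80)
print(rd, rr, or_)
```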
Aim: association between hay fever and eczema in 11-year-old children.

| Eczema | Hay fever: Yes | Hay fever: No | Total |
|---|---|---|---|
| Yes | 141 | 420 | 561 |
| No | 928 | 13 525 | 14 453 |
| Total | 1069 | 13 945 | 15 522 |
event: Eczema=“Yes”: risk with hay fever 141/1069 = 0.132; risk without 420/13 945 = 0.030; relative risk 4.38; odds ratio (141 × 13 525)/(420 × 928) = 4.89.
event: Eczema=“No” (swap rows): relative risk (928/1069)/(13 525/13 945) = 0.90; odds ratio 0.20 = 1/4.89.
The relative risk depends on which row is taken as the event!
Look at the table the other way around (hay fever as the event):
risk difference: 141/561 − 928/14 453 = 0.251 − 0.064 = 0.187
relative risk: 0.251/0.064 = 3.91
odds ratio: (141 × 13 525)/(420 × 928) = 4.89
The odds ratio is the same whichever way round one looks at the table!
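The invariance claim can be checked directly from the table’s counts (Python for illustration):

```python
# Eczema / hay fever counts from the table
a, b = 141, 420      # eczema yes: hay fever yes / no
c, d = 928, 13525    # eczema no:  hay fever yes / no

# Eczema as the event (columns = hay fever status)
rr_eczema = (a / (a + c)) / (b / (b + d))
or_eczema = (a * d) / (b * c)

# Transpose the table: hay fever as the event (columns = eczema status)
rr_hayfever = (a / (a + b)) / (c / (c + d))
or_hayfever = (a * d) / (c * b)

print(round(rr_eczema, 2), round(rr_hayfever, 2))   # relative risks differ
print(round(or_eczema, 2), round(or_hayfever, 2))   # odds ratios agree
```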
Intervals for the risk difference can be based upon the Normal approximation:
$$\hat{p}_1 - \hat{p}_2 \pm 1.96 \times SE(\hat{p}_1 - \hat{p}_2)$$
with
$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
where $\hat{p}_1 = a/(a+c)$, $\hat{p}_2 = b/(b+d)$, $n_1 = a+c$ and $n_2 = b+d$.
For the eczema example: $0.102 \pm 1.96 \times 0.0105$, giving a 95% CI of $(0.081, 0.122)$.
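The risk-difference interval for the eczema table can be verified as follows (Python for illustration):

```python
import math

a, b, c, d = 141, 420, 928, 13525   # eczema / hay fever counts
n1, n2 = a + c, b + d               # hay fever yes / no column totals

p1, p2 = a / n1, b / n2
rd = p1 - p2
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci = (rd - 1.96 * se, rd + 1.96 * se)
print(round(rd, 3), tuple(round(x, 3) for x in ci))
```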
Aim: to compare the odds of eczema amongst hay fever
sufferers to non-sufferers.
Compute the odds ratio:
$$\widehat{OR} = \frac{141 \times 13\,525}{420 \times 928} = 4.89$$
Compute the standard error of the log of the odds ratio:
$$SE(\log\widehat{OR}) = \sqrt{\frac{1}{141} + \frac{1}{420} + \frac{1}{928} + \frac{1}{13\,525}} = 0.103$$
then compute a confidence interval for the log odds ratio:
$$\log(4.89) \pm 1.96 \times 0.103 = (1.386, 1.790)$$
and back-transform.
95% CI, taking antilogs: $(e^{1.386}, e^{1.790}) = (4.00, 5.99)$
(asymmetric about the point estimate 4.89).
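The log odds ratio interval can be reproduced as follows (Python for illustration):

```python
import math

a, b, c, d = 141, 420, 928, 13525   # eczema / hay fever counts

or_hat = (a * d) / (b * c)
log_or = math.log(or_hat)
se = math.sqrt(1/a + 1/b + 1/c + 1/d)   # SE of the log odds ratio

lo, hi = log_or - 1.96 * se, log_or + 1.96 * se
ci = (math.exp(lo), math.exp(hi))        # back-transform to the OR scale
print(round(or_hat, 2), tuple(round(x, 2) for x in ci))
```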
Further examples on exercise sheet.
Case-control studies: risks, and hence the relative risk, cannot be estimated (subject selection depends on the outcome); the odds ratio remains usable.
Logistic regression: allows adjustment for covariates.
Odds ratios do not contain information as to the absolute risk difference.
If the outcome is rare, then $OR \approx RR$ (since $p/(1-p) \approx p$ for small $p$).
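A quick numerical illustration of the rare-outcome approximation (risks are invented):

```python
# Rare outcome: risks of 2% versus 1% (invented figures)
p1, p2 = 0.02, 0.01
rr_rare = p1 / p2
or_rare = (p1 / (1 - p1)) / (p2 / (1 - p2))

# Common outcome: risks of 60% versus 30% -- the approximation breaks down
q1, q2 = 0.6, 0.3
rr_common = q1 / q2
or_common = (q1 / (1 - q1)) / (q2 / (1 - q2))

print(round(or_rare, 3), round(or_common, 3))   # OR close to RR (2) only when rare
```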
Further reading: Bland JM, Altman DG (2000) BMJ 320: 1468.