4 Defining and Estimating Treatment Effects

4.1 Treatment effects


When performing statistical analysis of clinical trial data we need to consider (amongst other things):

  • the outcome (endpoint) type and the comparison of interest (e.g. difference in treatment group means/group proportions; ratio of treatment group means);

  • distributional assumptions regarding the outcome measurements/parameters of interest (e.g. Normal/Bernoulli/Poisson etc.);

  • the structure in the data imposed by the design (e.g. paired designs, cross-over designs, randomised block designs etc.);

  • estimation versus testing issues (statistical significance versus clinical relevance);

  • the analysis set (ITT/per protocol);

  • analyses specified a priori (i.e. justified by design and specified in the protocol) versus those which are exploratory.

4.2 Two group comparisons: continuous response data


Continuous measures on trial participants are typically summarised in terms of group means/medians, for example, mean diastolic blood pressure/mean height by treatment group.

Clinically the outcome of interest is the difference, D, between the groups: the ‘treatment effect’.

Not only are we interested in the estimated magnitude of the difference, D^, we are also interested in the degree of precision of the estimate as measured by its standard error, se(D^), or a confidence interval.

A p-value quantifying the play of chance under the null may be interesting but it is the size of the difference which is of interest in terms of clinical relevance.

A study may continue if D^ is sufficiently large even if it is not significantly different from zero.

Larger samples increase precision.

4.3 Two independent group means


Direct testing approach: two-sample t-test.

Let μT denote the treatment group mean and μC the control group mean.

Research Hypothesis

$$H_0: D = \mu_T - \mu_C = 0 \quad \text{versus} \quad H_1: D = \Delta \neq 0.$$

Test Statistic computed under $H_0$:

$$T = \frac{\hat{D}}{SE(\hat{D})} = \frac{\bar{X}_T - \bar{X}_C}{SE(\hat{D})}.$$

Assuming a common underlying variance, the corresponding standard error is estimated by pooling the data:

$$SE(\hat{D}) = S\sqrt{\frac{1}{n_T} + \frac{1}{n_C}}$$

with

$$S^2 = \frac{1}{n-2}\sum_{i \in \{T,C\}} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2$$

where $n = n_T + n_C$ is the total number of patients recruited.

Inference

The random variable T is compared to the t-distribution with $n_T + n_C - 2$ degrees of freedom, rejecting $H_0$ at level $\alpha$ if $|T| \geq t_{n-2,1-\alpha/2}$.

Alternatively, let $S_T$ and $S_C$ denote the group sample standard deviations and $n_T$ and $n_C$ the group sizes, and compute the pooled variance estimate:

$$S^2 = \frac{(n_T - 1)S_T^2 + (n_C - 1)S_C^2}{n_T + n_C - 2}$$

and, as before, the standard error is given by:

$$SE(\hat{D}) = S\sqrt{\frac{1}{n_T} + \frac{1}{n_C}}.$$
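As a check on the algebra, the pooled statistic can be computed by hand and compared against R's built-in `t.test`; the data below are illustrative, not from the notes.

```r
# Pooled two-sample t-test computed by hand (illustrative data).
xT <- c(5.1, 4.8, 6.0, 5.5, 4.9)   # treatment group
xC <- c(4.2, 4.6, 3.9, 4.4, 4.1)   # control group
nT <- length(xT); nC <- length(xC)

D_hat <- mean(xT) - mean(xC)
S2    <- ((nT - 1) * var(xT) + (nC - 1) * var(xC)) / (nT + nC - 2)  # pooled variance
se    <- sqrt(S2) * sqrt(1 / nT + 1 / nC)
Tstat <- D_hat / se
pval  <- 2 * pt(-abs(Tstat), df = nT + nC - 2)

# agrees with the built-in pooled test
fit <- t.test(xT, xC, var.equal = TRUE)
all.equal(unname(fit$statistic), Tstat)  # TRUE
```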

What assumptions underpin the t-test?

A non-parametric alternative is the Mann–Whitney test.


The assumptions underlying the two-sample t-test are:

  • the data are interval scale

  • the data are independent

  • the sampling distributions are normal

  • a common underlying variance.

Comments:

Exploratory analyses can be used to investigate the adequacy of the underlying assumptions.

Transformation can be performed to yield approximate normality, for example, a logarithmic transform for positively skewed data.

The t-test is robust to symmetric departures from Normality. If in doubt, a non-parametric test can be used.

4.4 Relaxing the assumption of a common variance


An approximate test that does not assume a common variance (Welch's test) can be performed similarly with

$$T = \frac{\hat{D}}{SE(\hat{D})} = \frac{\bar{X}_T - \bar{X}_C}{\sqrt{\frac{S_T^2}{n_T} + \frac{S_C^2}{n_C}}}$$

and an adjustment to the degrees of freedom.

Using R, performing the t-test is straightforward.

Standard test:

> test1 <- t.test(y, x, var.equal = TRUE)

Welch's test:

> test2 <- t.test(y, x, var.equal = FALSE)
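The "adjustment to the degrees of freedom" is the Welch–Satterthwaite approximation; a sketch with illustrative data, checked against `t.test`:

```r
# Welch's statistic and approximate degrees of freedom by hand (illustrative data).
y <- c(10.1, 12.3, 11.7, 9.8, 13.0, 12.2)
x <- c(8.9, 9.4, 10.2, 9.1)
vy <- var(y) / length(y)   # S_T^2 / n_T
vx <- var(x) / length(x)   # S_C^2 / n_C

Tstat <- (mean(y) - mean(x)) / sqrt(vy + vx)
df    <- (vy + vx)^2 / (vy^2 / (length(y) - 1) + vx^2 / (length(x) - 1))

fit <- t.test(y, x, var.equal = FALSE)
all.equal(unname(fit$statistic), Tstat)  # TRUE
all.equal(unname(fit$parameter), df)     # TRUE
```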

4.5 Point estimation and confidence interval


Direct testing procedures (e.g. t-tests) yield p-values but do not allow assessment of effect sizes. In practice an estimate of the difference D and a corresponding confidence interval are preferable.

Parameter of interest: difference, D, of treatment group expectations.

Estimator: observed difference of group means

D^=X¯T-X¯C.

95% confidence interval:

$$\hat{D} \pm t_{n-2,1-\alpha/2}\, S\sqrt{\frac{1}{n_T} + \frac{1}{n_C}}$$

with

$$S^2 = \frac{1}{n-2}\sum_{i \in \{T,C\}} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2.$$
  • inference: Does the confidence interval for the true difference, D=μT-μC, span zero?

  • interval interpretation: Under repeated sampling we would expect 95% of such intervals to contain the true parameter value.

  • intervals are more informative: can assess plausible range of effect size.

  • statistical significance versus clinical relevance: a significant p-value does not imply clinical relevance.
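The interval can be verified against the `conf.int` component returned by `t.test` (illustrative data, not from the notes):

```r
# 95% CI for D = mu_T - mu_C by hand (illustrative data).
xT <- c(5.1, 4.8, 6.0, 5.5, 4.9)
xC <- c(4.2, 4.6, 3.9, 4.4, 4.1)
nT <- length(xT); nC <- length(xC); n <- nT + nC

D_hat <- mean(xT) - mean(xC)
S     <- sqrt(((nT - 1) * var(xT) + (nC - 1) * var(xC)) / (n - 2))
half  <- qt(0.975, df = n - 2) * S * sqrt(1 / nT + 1 / nC)
ci    <- c(D_hat - half, D_hat + half)

fit <- t.test(xT, xC, var.equal = TRUE)
all.equal(as.numeric(fit$conf.int), ci)  # TRUE
```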

Good practice: CONSORT statement.
See: Exercises on Moodle.

4.6 Utilising ‘baseline’ measurements


Patient responses will often vary according to their respective values at baseline (entry to trial).

A proper baseline measurement is a measure of the end-point of interest taken prior to randomisation.

Two common approaches are to:

  • base estimation on the differences from baseline, $y_i = y_{i1} - y_{i0}$, $i = 1, \dots, n$, comparing group mean change

  • analysis of covariance (ANCOVA) regression model including baseline as a covariate.

Baseline measurements can increase precision!

ANCOVA is the preferred approach: the method ‘adjusts’ the estimate for baseline imbalance and regression to the mean.

  • example: Blood pressure after 12 weeks of treatment or difference from baseline

  • For a detailed discussion of the relative merits see Senn S, 1999

  • See Vickers, A. and Altman, D., Statistics notes: Analysing controlled trials with baseline and follow up measurements. BMJ, 2001, 323, 1123–1124.

Significance testing of baselines is not appropriate.
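The two approaches above can be contrasted in a minimal R sketch on simulated data (all variable names and parameter values here are made up for illustration):

```r
# Change-score analysis versus ANCOVA on simulated trial data.
set.seed(1)
n    <- 100
trt  <- rep(0:1, each = n / 2)                         # 0 = control, 1 = treatment
base <- rnorm(n, mean = 90, sd = 10)                   # baseline measurement
post <- 45 + 0.5 * base - 5 * trt + rnorm(n, sd = 5)   # true treatment effect = -5

# (1) compare mean change from baseline between the groups
change <- post - base
t.test(change[trt == 1], change[trt == 0])

# (2) ANCOVA: regress the outcome on baseline and treatment
fit <- lm(post ~ base + trt)
coef(fit)["trt"]   # baseline-adjusted estimate of the treatment effect
```

Because the outcome is correlated with baseline, the ANCOVA estimate is typically more precise than the change-score comparison.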

4.7 Paired t-test


For paired designs (for example, a patient's left eye and right eye) a one-sample t-test is performed by:

  1. averaging the within-pair differences:

     $$\bar{D} = \frac{\sum_{i=1}^{n} d_i}{n}$$

     where $d_i = y_{i1} - y_{i2}$, $i = 1, \dots, n$;

  2. computing the standard deviation of the differences, $S_d$;

  3. calculating the standard error of the mean of the differences, $se(\bar{D}) = S_d/\sqrt{n}$;

  4. computing the test statistic

     $$T = \frac{\bar{D}}{se(\bar{D})}$$

     under $H_0: \mu_d = 0$ versus $H_1: \mu_d = \Delta \neq 0$;

  5. comparing T to a t-distribution on $n - 1$ degrees of freedom.

Using R is again straightforward:

> test3 <- t.test(y, mu = 0)

where y is the observed vector of within-pair differences $d_i$.
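Equivalently, `t.test` can be given the two paired samples directly; a quick check that the two forms agree (illustrative data):

```r
# Paired t-test equals a one-sample t-test on the within-pair differences.
y1 <- c(7.2, 6.8, 8.1, 7.5, 6.9, 7.8)   # e.g. left-eye measurements
y2 <- c(6.9, 6.5, 8.3, 7.1, 6.4, 7.6)   # e.g. right-eye measurements
d  <- y1 - y2

fit_paired <- t.test(y1, y2, paired = TRUE)
fit_onesmp <- t.test(d, mu = 0)
all.equal(fit_paired$statistic, fit_onesmp$statistic)  # TRUE
```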

Assumptions for the paired design?

Note for later: a paired t-test is used to perform a ‘simple analysis’ of cross-over trial data.

4.8 Binary response data


Responses may be dichotomous, for example, cancer free at five years/cancer recurred, died/survived, diseased/not diseased.

Clinically, interest then lies in comparing the treatment group proportions, $p_1, p_2$, say.

Consider the following general tabular representation of results:

                       Treatment Group
                       1       2       Total
  Outcome    Yes       a       b       a+b
  observed   No        c       d       c+d
             Total     a+c     b+d     n

4.9 Treatment effects for binary endpoints


Let $p_1$ and $p_2$ denote the respective risks (success probabilities) for treatment groups 1 and 2.

Risk Difference

The risk difference, $p_1 - p_2$, is estimated by:

$$RD = \frac{a}{a+c} - \frac{b}{b+d}.$$

Relative Risk

The relative risk, $p_1/p_2$, is estimated by:

$$RR = \frac{a/(a+c)}{b/(b+d)}.$$

Odds Ratio

The disease odds, $p/(1-p)$, give the ratio of the success probability to the failure probability. The odds ratio comparing the groups, $\frac{p_1/(1-p_1)}{p_2/(1-p_2)}$, is estimated by:

$$OR = \frac{ad}{bc}.$$
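The three estimators can be wrapped in a small R function (`effect_measures` is a made-up name), checked here against the hay fever/eczema counts analysed in the next section:

```r
# RD, RR and OR from a 2x2 table laid out as in the notes
# (a, b = outcome "yes" counts; c, d = outcome "no" counts).
effect_measures <- function(a, b, c, d) {
  p1 <- a / (a + c)
  p2 <- b / (b + d)
  c(RD = p1 - p2, RR = p1 / p2, OR = (a * d) / (b * c))
}

# hay fever / eczema counts from the example below
round(effect_measures(141, 420, 928, 13525), 2)
#   RD   RR   OR
# 0.10 4.38 4.89
```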

4.10 Differences between RD, RR and OR


Aim: investigate the association between hay fever and eczema in 11-year-old children.

                     Hay fever
              Yes      No        Total
  Eczema Yes  141      420       561
         No   928      13 525    14 453
  Total       1069     13 945    15 522

event: Eczema = “Yes”

  • $RD = \frac{141}{1069} - \frac{420}{13945} = 0.102$

  • $RR = \frac{141/1069}{420/13945} = 4.38$

  • $OR = \frac{141 \times 13525}{928 \times 420} = 4.89$

event: Eczema = “No” (swap rows)

  • $RD = \frac{928}{1069} - \frac{13525}{13945} = -0.102$

  • $RR = \frac{928/1069}{13525/13945} = 0.895 \neq 1/4.38$!

  • $OR = \frac{928 \times 420}{141 \times 13525} = 1/4.89 = 0.20$

Look at the table the other way around (event: Hay fever = “Yes”):

  • risk difference: $RD = \frac{141}{561} - \frac{928}{14453} = 0.187$

  • relative risk: $RR = \frac{141/561}{928/14453} = 3.91$

  • odds ratio: $OR = \frac{141 \times 13525}{928 \times 420} = 4.89$

The OR is the same whichever way round one looks at the table!
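The invariance can be checked numerically (`odds_ratio` is a throwaway helper):

```r
# The OR is unchanged by transposing the table and inverts when the rows swap.
odds_ratio <- function(a, b, c, d) (a * d) / (b * c)

or_yes  <- odds_ratio(141, 420, 928, 13525)   # event: eczema = "yes"
or_no   <- odds_ratio(928, 13525, 141, 420)   # rows swapped
or_flip <- odds_ratio(141, 928, 420, 13525)   # table transposed

round(c(or_yes, or_no, or_flip), 2)   # 4.89 0.20 4.89
```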

4.11 Binary data confidence intervals


Intervals are based upon the Normal approximation; a $1-\alpha$ CI is given by $\hat{\theta} \pm z_{1-\alpha/2}\, SE(\hat{\theta})$.

Risk Difference

$$SE(RD) = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

with $p_1 = \frac{a}{a+c}$, $p_2 = \frac{b}{b+d}$, $n_1 = a+c$ and $n_2 = b+d$.

Relative Risk

$$SE(\log RR) = \sqrt{\frac{1}{a} - \frac{1}{a+c} + \frac{1}{b} - \frac{1}{b+d}}.$$

Odds Ratio

$$SE(\log OR) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}.$$

4.12 Example constructing confidence intervals for the odds ratio


The aim is to compare the odds of eczema amongst hay fever sufferers to the odds amongst non-sufferers.

Compute the odds ratio:

OR=4.89

Compute the standard error of the log odds ratio, form a 95% confidence interval for the log odds ratio, then back-transform:

$$SE(\log OR) = \sqrt{\frac{1}{141} + \frac{1}{420} + \frac{1}{928} + \frac{1}{13525}} = 0.103.$$

95% CI: $\log(4.89) \pm 1.96 \times 0.103$; antilog: $[4.00; 5.99]$ (asymmetric).
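The same calculation in R, using the hay fever/eczema counts:

```r
# 95% CI for the OR via the log scale (hay fever / eczema counts).
a <- 141; b <- 420; c <- 928; d <- 13525
or        <- (a * d) / (b * c)
se_log_or <- sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci        <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se_log_or)

round(se_log_or, 3)   # 0.103
round(ci, 2)          # 4.00 5.99
```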

Further examples on exercise sheet.

4.13 Remarks on Odds Ratios


Case-control studies: the RR should not be used! (Subject selection depends on the outcome.)

Logistic regression: adjustment for covariates.

Odds ratios do not convey information about the absolute risk difference.

If the outcome is rare, then $OR \approx RR$ (since $a/(a+c) \approx a/c$).
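A quick numerical illustration with made-up counts where the outcome risk is about 1%:

```r
# With a rare outcome, OR approximates RR (illustrative counts).
a <- 10; b <- 5; c <- 990; d <- 995   # risks 10/1000 and 5/1000
rr <- (a / (a + c)) / (b / (b + d))
or <- (a * d) / (b * c)
round(c(RR = rr, OR = or), 3)   # 2.000 2.010
```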

Further reading: Bland JM, Altman DG (2000) BMJ 320: 1468.