1 Clinical Trials

1.4 Comparing Two Groups: Continuous Response Data

Continuous endpoints (e.g. diastolic blood pressure, height, weight, serum cholesterol level) are typically summarised in terms of group means or medians.

Clinically the outcome of interest is often the difference, D, between the groups as opposed to the actual values

We are interested in the estimated size of the difference, D̂, but also in the precision of the estimate, as measured by its standard error, SE(D̂), or by constructing a confidence interval

A p-value quantifying the play of chance under the null may be interesting but it is the size of the difference which is of interest in terms of clinical relevance

A study may continue if D̂ is sufficiently large even if it is not significantly different from zero

Statistical significance does not imply a clinically relevant difference!

Larger samples give increased precision

Comparing two independent group means

Student’s t-test: direct testing procedure

Let μT denote the treatment group mean and μC the control group mean. We have:

  • Research Hypothesis:

    H_0: D = μ_T - μ_C = 0 versus H_1: D = Δ ≠ 0.

  • Estimate: D̂ = X̄_T - X̄_C

  • Test Statistic computed under H0:

    T = D̂ / SE(D̂) = (X̄_T - X̄_C - 0) / SE(D̂).
  • Assuming a common underlying variance, σ², the standard error is estimated by pooling the data:

    SE(D̂) = S √(1/n_T + 1/n_C)

    with

    S² = 1/(n-2) Σ_{i∈{T,C}} Σ_{j=1}^{n_i} (X_{ij} - X̄_i)²

    where n=nT+nC is the total number of patients recruited

  • Inference: The random variable T is compared to the t-distribution with n_T + n_C - 2 degrees of freedom and inference is based upon whether |T| ≥ t_{n-2, 1-α/2}

Two independent group means (continued)

The perhaps more familiar form of the pooled variance estimate is given below.

Let S_T and S_C represent the sample standard deviations for the two groups, and n_T and n_C the group sizes; compute the pooled variance estimate:

S² = ((n_T - 1)S_T² + (n_C - 1)S_C²) / (n_T + n_C - 2)

and as previously the standard error is then given by:

SE(D̂) = S √(1/n_T + 1/n_C)
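As an illustrative sketch (not part of the course materials), the pooled two-sample t-test can be computed directly from summary statistics; the group sizes, means and standard deviations below are hypothetical, and SciPy is assumed to be available for t-distribution probabilities:

```python
import math
from scipy import stats

# Hypothetical summary statistics (illustration only)
n_T, xbar_T, s_T = 20, 92.1, 8.4   # treatment group
n_C, xbar_C, s_C = 20, 97.5, 9.0   # control group

# Pooled variance estimate S^2
s2 = ((n_T - 1) * s_T**2 + (n_C - 1) * s_C**2) / (n_T + n_C - 2)

# Estimated difference and its standard error
d_hat = xbar_T - xbar_C
se = math.sqrt(s2) * math.sqrt(1 / n_T + 1 / n_C)

# Test statistic, referred to a t-distribution on n_T + n_C - 2 df
t_stat = d_hat / se
p_value = 2 * stats.t.sf(abs(t_stat), n_T + n_C - 2)
```

With these made-up numbers |T| falls just short of the 5% critical value, illustrating the point above: the estimate and its precision, not the p-value alone, should be reported.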

Assumptions underpinning this t-test?

An approximate test assuming non-constant variances can be performed (Welch’s) similarly with

T = D̂ / √(S_T²/n_T + S_C²/n_C)

and adjustment to the degrees of freedom.

Non-parametric alternative? Mann-Whitney test

Point estimate and corresponding confidence interval

Hypothesis tests yield p-values but do not allow direct assessment/quantification of effect sizes; an estimate of the difference and a corresponding confidence interval is preferable

Let D denote the true difference in treatment group means: D = μ_T - μ_C

  • estimate: observed difference in group means

    D̂ = X̄_T - X̄_C
  • corresponding 95% confidence interval:

    D̂ ± t_{n-2, 1-α/2} S √(1/n_T + 1/n_C)

    with

    S² = 1/(n-2) Σ_{i∈{T,C}} Σ_{j=1}^{n_i} (X_{ij} - X̄_i)²

    as previously

Point estimate and corresponding confidence interval (continued)

  • inference: Does the (1-α)% confidence interval for the true difference, D=μT-μC, span zero?

  • often α=0.05 is specified

  • interval interpretation: Under repeated sampling we would expect (1-α)% of such constructed intervals to contain the true parameter value

  • intervals are more informative: one can assess the plausible range for the effect size

  • statistical significance versus clinical relevance: a significant p-value does not imply clinical relevance

  • good reporting practice: CONSORT statement recommends reporting of both estimates and confidence intervals

Examining Treatment Differences in ’Paired’ Samples (paired t-test)

For paired designs (for example, a patient’s left eye and right eye, twins etc.) a one-sample t-test is performed based upon the observed within-pair differences, d_i = y_{i1} - y_{i2}, i = 1, …, n. The procedure:

1. Compute the mean of the within-pair differences:

D̄ = (1/n) Σ_{i=1}^{n} d_i

2. Compute the standard deviation of the differences Sd.

3. Calculate the standard error of the mean of the differences: SE(D̄) = S_d / √n.

4. Compute the test statistic under H0:

T = (D̄ - 0) / SE(D̄).

H_0: D = 0 versus H_1: D = Δ ≠ 0.

5. Compare to a t-distribution on n-1 degrees of freedom.

6. Preferable approach: compute a (1-α)% confidence interval for the true difference:

D̄ ± t_{n-1, 1-α/2} SE(D̄)
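The six steps above can be sketched in Python on hypothetical paired data (SciPy assumed available); the result can be checked against scipy.stats.ttest_rel:

```python
import math
from scipy import stats

# Hypothetical paired measurements (illustration only)
y1 = [12.1, 10.4, 11.8, 13.0, 9.7, 12.5, 11.1, 10.9]
y2 = [10.8, 10.1, 10.9, 11.6, 9.9, 11.2, 10.5, 10.0]

d = [a - b for a, b in zip(y1, y2)]             # within-pair differences
n = len(d)
d_bar = sum(d) / n                              # 1. mean of the differences
s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))  # 2. SD of differences
se = s_d / math.sqrt(n)                         # 3. standard error
t_stat = d_bar / se                             # 4. test statistic under H0
p_value = 2 * stats.t.sf(abs(t_stat), n - 1)    # 5. compare to t on n-1 df

t_crit = stats.t.ppf(0.975, n - 1)              # 6. 95% confidence interval
ci = (d_bar - t_crit * se, d_bar + t_crit * se)
```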

1.5 Comparing Two Groups: Binary Response Data

  • Responses may be dichotomous, binary, for example, cancer free at five years/cancer recurred within five years, died/survived, diseased/not diseased.

  • Clinically interest then lies in comparing the treatment group proportions, p1 and p2, say.

  • Consider the following general two-by-two tabular representation of observed results:

    Treatment Group
    1 2 Total
    Outcome Yes a b a+b
    observed No c d c+d
    Total a+c b+d n

Definitions of treatment effects


Let p1 and p2 denote the respective risks (success probabilities) for the treatment and control groups

  • the risk difference, p1-p2, is estimated:

    RD = a/(a+c) - b/(b+d)
  • the relative risk, p1/p2, is estimated:

    RR = [a/(a+c)] / [b/(b+d)].
  • The disease odds, p_i/(1-p_i), provides the ratio of success to failure. The odds ratio comparing groups is estimated:

    OR = [p_1/(1-p_1)] ÷ [p_2/(1-p_2)] = ad/(bc)

Risk difference, Relative Risk and Odds Ratio
– Example –

  • Study conducted to investigate the association between hay fever and eczema in 11-year-old children. Findings presented in tabular form:

    Hay fever
    Yes No Total
    Eczema Yes 141 420 561
    No 928 13 525 14 453
    Total 1069 13 945 15 522

    Event: Eczema=“Yes”

    • RD = 141/1069 - 420/13945 = 0.102

    • RR = (141/1069) / (420/13945) = 4.38

    • OR = (141 × 13525) / (420 × 928) = 4.89

  • What happens if we consider the event to be ’No Eczema’?

  • What happens if we consider the table the other way round and think of ’Hayfever’ as the event for the two eczema groups?
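The three estimates above can be reproduced directly from the table; a short plain-Python sketch (counts taken from the example) makes it easy to explore the two questions by editing the four cells:

```python
# 2x2 table from the hay fever / eczema example; event: eczema = "Yes"
a, b = 141, 420        # eczema yes: hay fever yes / hay fever no
c, d = 928, 13525      # eczema no:  hay fever yes / hay fever no

p1 = a / (a + c)       # risk of eczema among hay fever sufferers
p2 = b / (b + d)       # risk of eczema among non-sufferers

rd = p1 - p2                      # risk difference
rr = p1 / p2                      # relative risk
odds_ratio = (a * d) / (b * c)    # odds ratio ad/bc
```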

Estimating Treatment Effects: Binary Data
Confidence Intervals

  • Intervals are constructed based upon the Normal approximation. (1-α)% CI: θ̂ ± z_{1-α/2} SE(θ̂)

  • risk difference:

    SE(RD) = √( p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2 )

    with p_1 = a/(a+c), p_2 = b/(b+d), n_1 = a+c and n_2 = b+d

  • relative risk:

    SE(log_e(RR)) = √( 1/a - 1/(a+c) + 1/b - 1/(b+d) )
  • odds ratio:

    SE(log_e(OR)) = √( 1/a + 1/b + 1/c + 1/d )

Example: Constructing a confidence interval for the odds ratio

  • Aim: to compare the odds of eczema amongst hayfever sufferers to that of non-sufferers

  • Compute the odds ratio: OR=4.89

  • Compute the standard error of the natural logarithm of the odds ratio, then compute a 95% confidence interval for the logarithm of the odds ratio and then back-transform:

    SE(log_e(OR)) = √( 1/141 + 1/420 + 1/928 + 1/13525 ) = 0.103
  • 95% CI: log_e(4.89) ± 1.96 × 0.103; antilog: [4.0; 5.99] (asymmetric on the odds ratio scale)

  • Further examples on exercise sheet

  • More details on the utility of the ratio of odds will feature later in the course when we consider observational studies

  • further reading: Bland JM, Altman DG (2000) BMJ 320: 1468.
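The compute-then-back-transform steps above can be checked with a short sketch (plain Python; counts from the hay fever/eczema example, with 1.96 as the normal quantile):

```python
import math

# Hay fever / eczema table; event: eczema = "Yes"
a, b, c, d = 141, 420, 928, 13525

or_hat = (a * d) / (b * c)                        # estimated odds ratio
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)      # SE of log_e(OR)

z = 1.96                                          # z_{0.975}
lo = math.exp(math.log(or_hat) - z * se_log_or)   # back-transform the
hi = math.exp(math.log(or_hat) + z * se_log_or)   # interval limits
```

The interval is symmetric on the log scale but asymmetric about 4.89 on the odds ratio scale.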

1.6 Cross-over Trials in Clinical Research

So far in the module we have considered two study designs:

  • a parallel group design: different groups of patients are studied concurrently (in parallel). Patients receive a single therapy (or combination of therapies); the estimate of the treatment effect is based upon so-called ’between-subject’ comparisons. We used the two independent samples t-test for inference.

  • a paired design: patients receive both treatments, for example on matching parts of the anatomy (e.g. limbs, eyes, skin etc.); the estimate of the treatment effect is based upon a ’within-subject’ comparison. We used a paired t-test (a one-sample t-test on the within-subject differences). We noted asymmetry can be problematic!

An alternative design building upon the idea that a participant acts as their own control is the:

  • crossover design: patients receive a sequence of treatments, the order determined by randomisation; the estimate of the treatment effect is based upon ’within-subject’ comparisons.

Cross-Over Trials

  • Definition (Senn, 1993)

    “A cross-over trial is one in which subjects are given sequences of treatments with the object of studying differences between individual treatments (or sub-sequences of treatments).”

  • Randomisation: the order of the treatments is assigned at random

  • The times when treatments are administered are called treatment periods, or simply periods

Simple example (2 periods, 2 treatments)

Sequence Period 1 Period 2
Group 1 A B
Group 2 B A

Advantages of the cross-over design are:

  • within-subject comparisons: patients act as their own control, eliminating between-patient variation

  • sample size is smaller: same number of observations with fewer patients

  • precision increased: the same degree of precision in estimation can be achieved with fewer observations

  • Further reading (Senn 1993, Sec. 1.3)

Disadvantages of Cross-Over Trials

Disadvantages / issues relating to the use of a cross-over design are:

  • inconvenience to patients: several treatments, longer total time under observation (sometimes advantage!)

  • drop outs: patients may withdraw

  • they are only suitable for certain indications

  • period by treatment interaction: the treatment effect is not constant over time

  • carry-over effect: “Carry-over is the persistence […] of a treatment applied in one period in a subsequent period of treatment.”

  • analysis is more complex: measurements come in pairs and there may be systematic differences between periods

  • Further reading Senn 1993, Sec. 1.4

What May Be Done About Carry-Over?

Wash-out period:

“A wash-out period is a period in a trial during which the effect of a treatment given previously is believed to disappear. If no treatment is given during the wash-out period then the wash-out is passive. If a treatment is given during the wash-out period then the wash-out is active.”

(Senn, 1993)

When are cross-over trials useful?

  • chronic diseases which are relatively stable (e.g. asthma, rheumatism, migraine, moderate hypertension, epilepsy)

  • single-dose trials of bio-equivalence (PK/PD) rather than long-term trials

  • drugs with rapid, reversible effects rather than ones with persistent effects

Cross-over Trials: the AB/BA Design with Normal Response Data

Various types of cross-over designs exist but we shall focus upon the so-called 2×2 design.

  • two treatment, two period cross-over

  • two sequences: 1) AB and 2) BA

  • also called AB/BA design (more specific)

  • in the following, a normally distributed endpoint is considered

Motivating example, Asthma trial

  • objective: comparing the effects of formoterol (experimental) and salbutamol (standard)

  • patients: 13 children (aged 7 to 14 years) with moderate to severe asthma

  • single-dose trial: 200 μg salbutamol, 12 μg formoterol: bronchodilators

Asthma Example (continued)

  • primary endpoint

    • peak expiratory flow (PEF, [l/min]): a measure of lung function

    • several measurements during the first 12 hours after drug intake

    • measurements after 8 hours considered here

  • drop-outs:

    • NOTE patient 8 dropped out after first period

    • not mentioned by Graff-Lonnevig V, Browaldh L (1990)!

    • See also Senn 1993, Sec. 3.1

Asthma Example (continued)

  • design

    • randomised (randomisation procedure?): order of treatments assigned at random to form sequence groups

    • double-blind: double-dummy technique

    • two treatment, two period cross-over (AB/BA design)

    • wash-out period of at least one day

Seq. Period 1 Wash-Out Period 2
F/S formoterol no treatment salbutamol
S/F salbutamol no treatment formoterol

Data of the Asthma Trial


A Simple Analysis: Ignoring the Effect of Period
(Senn 1993, Sec.3.2, 3.3)

  • If no period effect then one can proceed as per the paired design considered previously using a paired t-test

  • method

    • calculate the treatment differences d_i = y_{iF} - y_{iS} (response on formoterol minus response on salbutamol) for each subject, i = 1, …, n

    • calculate the mean of the differences d̄ and SE(d̄)

    • perform a one-sample t-test for the differences (i.e. a paired t-test)

    • construct a confidence interval for the true difference

  • assumptions underlying the use of the paired test

    • normally distributed differences

    • unbiased: E(d_i) = true treatment effect, τ, say

Example: Simple Analysis for Asthma Trial: ignoring period effect

  • mean of the differences: d̄ = 45.4,
    standard deviation of the differences: σ̂_d = 40.6,
    degrees of freedom (df): n - 1 = 12

  • test statistic

    t = √n d̄ / σ̂_d = (√13 × 45.4) / 40.6 = 4.0
  • 95% confidence interval for the true difference

    [d̄ - t_{n-1, 1-α/2} σ̂_d/√n ; d̄ + t_{n-1, 1-α/2} σ̂_d/√n]
    = [45.4 - 2.2 × 11.3 ; 45.4 + 2.2 × 11.3] = [21; 70]
  • p-value: p=0.0017

  • Conclusion/comments?
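The reported figures can be reproduced from the summary statistics alone; a short sketch (SciPy assumed available):

```python
import math
from scipy import stats

# Reported summaries of the differences (formoterol minus salbutamol)
n, d_bar, s_d = 13, 45.4, 40.6

se = s_d / math.sqrt(n)                       # standard error of the mean
t_stat = d_bar / se                           # = sqrt(n) * d_bar / s_d
p_value = 2 * stats.t.sf(abs(t_stat), n - 1)  # two-sided, t on 12 df

t_crit = stats.t.ppf(0.975, n - 1)            # t_{12, 0.975} ~ 2.18
ci = (d_bar - t_crit * se, d_bar + t_crit * se)
```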

A Simple Analysis Ignoring the Effect of Period (continued)

  • “factors that might cause the differences not to be distributed at random about the true treatment effect”

    • period effect (e.g. hay fever: pollen count differs; learning effects etc.)

    • period by treatment interaction

    • carry-over

    • patient by treatment interaction: cannot be investigated in AB/BA design

    • patient by period interaction

    (Senn, 1993)

A cell means model: expected values in the AB/BA Cross-Over with Period Effect

  • Let μ denote the expectation for treatment B

  • τ denote the treatment effect (treatment A - treatment B)

  • π denote period effect (period 2 - period 1)

We can express the expected values for the AB/BA design in the cells of a 2×2 table:

Sequence Period 1 Period 2
AB μ+τ μ+π
BA μ μ+τ+π

Estimating Treatment Effects Using Sequence Group Period Differences

How can we obtain an unbiased estimate of the treatment effect in the presence of the period effect?

Recall: θ̂ is an unbiased estimator of θ if E(θ̂) = θ

1. Consider the expectation of the mean of the period differences (period 1 - period 2):

E(d̄_i), i = 1, 2,

for each sequence group (1:(AB), 2:(BA)).

E(d̄_1) = τ - π,  E(d̄_2) = -τ - π

2. subtracting the expected period differences:

E(d̄_1) - E(d̄_2) = (τ - π) - (-τ - π) = 2τ

3. and dividing by 2 to yield τ

Hence the estimator

τ̂ = (d̄_1 - d̄_2) / 2

is unbiased for τ

Estimating the Period Effect Using the Sequence Group Period Differences

How can we obtain an unbiased estimate of the period effect?

1. Consider again the expectation of the mean of the period differences for each sequence group:

E(d̄_1) = τ - π,  E(d̄_2) = -τ - π

2. summing the expectations for the two groups gives

E(d̄_1) + E(d̄_2) = (τ - π) + (-τ - π) = -2π

3. and dividing by -2 yields π

Hence the estimator

π̂ = (d̄_1 + d̄_2) / (-2)

is unbiased for π

Adjusting the Estimated Treatment Effect for the Effect of Period

Method:

  • calculate the period differences, d_{ij} = y_{ij1} - y_{ij2},
    (period 1 - period 2) for each individual

  • calculate the means d̄_i and standard deviations, S_i, for the two sequence groups, i = 1, 2

  • estimate the treatment effect: τ̂ = (d̄_1 - d̄_2)/2

  • compute the test statistic under H0:τ=0

    T = τ̂ / σ̂_d̄ ∼ t_{n-2}

    with

    σ̂_d̄ = √( (1/4) (1/n_1 + 1/n_2) S² )

    and

    S² = ((n_1 - 1)S_1² + (n_2 - 1)S_2²) / (n - 2)
  • construct a (1-α)% confidence interval for τ

    [τ̂ - t_{n-2, 1-α/2} σ̂_d̄ ; τ̂ + t_{n-2, 1-α/2} σ̂_d̄]

Adjusting for a Period Effect in the Asthma Trial

  • The table below gives the means and standard deviations of the period differences for the two sequence groups:

    Sequence n d̄_i s_i
    for/sal 7 30.7 33.0
    sal/for 6 -62.5 44.7

  • test statistic T=46.6/10.8=4.3

  • 95% confidence interval for the treatment effect: 46.6 ± 2.2 × 10.8 = [23; 70]

  • p-value p=0.001

  • Comments/conclusion?

  • How do the period-adjusted results compare with the simple analysis results?
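The period-adjusted method can be sketched directly from the sequence-group summaries in the table above, which makes the comparison with the simple analysis concrete (SciPy assumed available):

```python
import math
from scipy import stats

# Sequence-group summaries of the period differences (period 1 - period 2)
n1, d1_bar, s1 = 7, 30.7, 33.0      # for/sal sequence
n2, d2_bar, s2 = 6, -62.5, 44.7     # sal/for sequence
n = n1 + n2

tau_hat = (d1_bar - d2_bar) / 2                              # treatment effect
s2_pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n - 2)  # pooled S^2
se = math.sqrt(0.25 * (1 / n1 + 1 / n2) * s2_pooled)         # sigma-hat_dbar

t_stat = tau_hat / se                          # referred to t on n - 2 df
p_value = 2 * stats.t.sf(abs(t_stat), n - 2)
t_crit = stats.t.ppf(0.975, n - 2)
ci = (tau_hat - t_crit * se, tau_hat + t_crit * se)
```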

Estimating the Period Effect using Period Differences?

  • Add the sequence group means d̄_i (as opposed to subtracting them) and then divide by -2

  • Note the form of the standard error is the same for the treatment and period effect: why?

  • Exercise: estimate the period effect and construct a 95% confidence interval for the period effect.

Remarks on Carry-over

Fixed Effects in the AB/BA Cross-Over

Sequence Period 1 Period 2
AB μ+τ μ+π+λ1
BA μ μ+τ+π+λ2
  • Let λ_1 and λ_2 denote the expected carry-over effects (with μ, τ and π defined as previously)

  • How can you use the cell means model to yield an estimate of the carry-over effects?

  • only the difference between λ_1 and λ_2 is identifiable

  • The estimate is based upon differences between the expected sequence group totals

Remarks on Testing for Carry-over

  • testing for carry-over?

    • the estimate is based upon ’between-patient’ variation, giving the test low power

    • the carry over effect is confounded with period-treatment interaction in AB/BA design

    • the two-stage procedure yields a biased estimator of the treatment effect

  • do not test for carry-over !!!

  • conclusion (Senn 1993, p 69)

    “No help regarding this problem is to be expected from the data. The solution lies entirely in design.”

  • further reading: Senn (1993), Senn (1997)

References and Further Reading


  • Senn S (1993) Cross-over trials in clinical research. Wiley, Chichester.

  • Senn S (1997) Statistical issues in drug development. Wiley, Chichester.

  • Jones B, Kenward MG (1990) Design and analysis of cross-over trials. Chapman & Hall, London.

  • Senn S et al. An incomplete blocks cross-over in asthma. In: Vollmar J, Hothorn LA (eds). Cross-over clinical trials. Gustav Fischer Verlag, Stuttgart.

1.7 Introduction to ’Sample Size’ Determination

  • Sample size by definition: “number of subjects in a clinical trial”

  • Why adequate sample sizes?

    • ethics

    • budget constraints

    • time constraints

  • The trial should be sufficiently large to provide a reliable answer to the research question

  • Usually based upon the primary endpoint. Usually an efficacy measure as opposed to safety / tolerability endpoint

  • Guidelines:

    • ICH E9 - Statistical Principles for Clinical Trials (Section 3.5: Sample Size)

Hypothesis Tests, Error Rates and Power

Classical hypothesis testing uses p-values to decide between two competing hypotheses given the available data: H0 versus H1, say.

p-value: the probability, p, of obtaining a test result at least as extreme as that observed, assuming that the null hypothesis H0 is true

The so-called size of the test is given by the value α, which is typically chosen as α = 0.05.

If p < α we reject H0 and conclude the data are inconsistent with the null

Errors in testing: methods are based upon experimental data and hence carry some risk of drawing a false conclusion

                            Truth
                            H0 true       H1 true
Decision  Fail to reject H0  No Error      Type II Error
made      Reject H0          Type I Error  No Error

Hypothesis Tests and Error Rates


  • Type I error rate: α = P(reject H0 | H0 true)

  • Type II error rate:

    β = P(fail to reject H0 | H1 true)
  • Critical value: t_crit

  • Power = 1 - β = P(reject H0 | H1 true)

Power of a Test

Definition of power: “probability of concluding that the alternative hypothesis is true given that it is, in fact, true…” (Senn, 1997)

Power depends upon:

  • statistical test being used

  • the size of that test α

  • the nature and variability of the observations made σ2

  • the alternative hypothesis (e.g. the size of difference Δ)

Note that a priori we do not know the size of the difference between treatments; usually the alternative hypothesis is based upon a clinically relevant difference, Δ*, say

Δ* is a difference we would like to detect with reasonable power

Basic Principle of Power Calculation: Normally Distributed Test Statistic

  • consider an (approximately) normally distributed test statistic Z ∼ N(ϑ, 1)

  • ϑ = 0 under H0 and ϑ = ϑ* under H1

  • set power equal to target value:

    power = P(Z ≥ Φ⁻¹(1-α/2) | H1 true) = 1 - β
    1 - Φ(Φ⁻¹(1-α/2) - ϑ) = 1 - β
    ⇒ ϑ = Φ⁻¹(1-α/2) + Φ⁻¹(1-β)
  • note ϑ = ϑ(n); solve the equation for n

  • Let’s now consider a specific test

One-Sample Gauss Test (one-sided)

  • Assume data: X_i ∼ N(Δ, σ²), i = 1, …, n, iid with known variance σ²

  • hypotheses: H0:Δ=0 vs. H1:Δ>0

  • test statistic: Z = √n X̄/σ with X̄ = (1/n) Σ_{i=1}^{n} X_i

  • critical value: Φ⁻¹(1-α) = z_{1-α}

  • desired power: if Δ = Δ* (the smallest clinically relevant difference), the power should be 1 - β

  • set the probability for rejection of the null hypothesis to 1-β:

    power = P(Z ≥ Φ⁻¹(1-α) | Δ = Δ*) = 1 - β
  • Under H1: Z ∼ N(√n Δ/σ, 1)

  • non-centrality parameter: ϑ = √n Δ/σ

  • sample size:

    n = (Φ⁻¹(1-α) + Φ⁻¹(1-β))² σ² / Δ²
  • For a two-sided test we substitute Φ⁻¹(1-α/2) in the above (more on this later)
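The one-sample formula above can be turned into a small helper; a sketch (SciPy assumed available for the normal quantiles; n is rounded up to the next whole subject):

```python
import math
from scipy import stats

def sample_size_one_sample(delta, sigma, alpha=0.05, beta=0.2, two_sided=False):
    """Normal-approximation n for the one-sample Gauss test."""
    z_a = stats.norm.ppf(1 - alpha / 2) if two_sided else stats.norm.ppf(1 - alpha)
    z_b = stats.norm.ppf(1 - beta)
    return math.ceil((z_a + z_b) ** 2 * sigma**2 / delta**2)

# e.g. Delta* = 5, sigma = 10 (half a standard deviation), 80% power, one-sided
n = sample_size_one_sample(delta=5, sigma=10)
```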

Two-Sample Gauss Test (one sided)

  • Assume data: X_{ij} ∼ N(μ_i, σ²), i = 1, 2, j = 1, …, n_i, iid, with n_1 = rn and n_2 = (1-r)n.

  • Denote the true difference in treatment effects Δ=μ1-μ2.

  • hypotheses: H_0: Δ ≤ 0 vs. H_1: Δ = Δ* > 0

  • test statistic: T = √(r(1-r)n) (X̄_1 - X̄_2)/σ

  • variance σ2 known

  • Under H1: T ∼ N(√(r(1-r)n) Δ/σ, 1)

  • Non-centrality parameter?

  • Exercises this week: consider the form of the non-centrality parameter and derive the sample size formula for the two group Gauss test.

  • Note the required sample size formula is:

    n = [1/(r(1-r))] (Φ⁻¹(1-α) + Φ⁻¹(1-β))² σ² / Δ²
  • For the two-sided test we substitute Φ⁻¹(1-α/2)
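A corresponding sketch for the two-sample formula, with allocation proportion r (SciPy assumed available):

```python
import math
from scipy import stats

def sample_size_two_sample(delta, sigma, alpha=0.05, beta=0.2,
                           r=0.5, two_sided=True):
    """Normal-approximation *total* n for the two-sample Gauss test;
    r is the proportion of patients allocated to group 1."""
    z_a = stats.norm.ppf(1 - alpha / 2) if two_sided else stats.norm.ppf(1 - alpha)
    z_b = stats.norm.ppf(1 - beta)
    n = (z_a + z_b) ** 2 * sigma**2 / (r * (1 - r) * delta**2)
    return math.ceil(n)

# e.g. equal allocation, Delta* = 10, sigma = 20, 90% power, two-sided 5% level
n = sample_size_two_sample(delta=10, sigma=20, beta=0.1)
```

With r = 1/2 the factor 1/(r(1-r)) equals 4, giving the familiar form 4(Φ⁻¹(1-α/2) + Φ⁻¹(1-β))²σ²/Δ² for the total sample size.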

Approximate Sample Size for Two-Sample t-Test

  • Assume data and hypotheses as for 2-sample Gauss test, but unknown variance

  • test statistic:

    T = √(r(1-r)n) (X̄_1 - X̄_2)/S

    with

    S² = 1/(n-2) Σ_{i=1}^{2} Σ_{j=1}^{n_i} (X_{ij} - X̄_i)²
  • non-centrality parameter: ϑ = √(r(1-r)n) Δ/σ

  • approximate sample size is:

    n = [1/(r(1-r))] (Φ⁻¹(1-α) + Φ⁻¹(1-β))² σ² / Δ²
  • exact sample size is based upon the non-central t-distribution: T ∼ t(n-2, ϑ) with ϑ = √(r(1-r)n) Δ/σ

  • power equation cannot be solved for n explicitly

  • we will use RStudio to compute exact sample size
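Although the course uses RStudio for the exact calculation, the same search can be sketched in Python: compute the exact power from the non-central t-distribution and increase n until the target power is reached (SciPy assumed available):

```python
from scipy import stats

def power_t(n, delta, sigma, alpha=0.05, r=0.5):
    """Exact power of the two-sided two-sample t-test for total sample size n
    (the negligible opposite-tail rejection probability is ignored)."""
    df = n - 2
    ncp = (r * (1 - r) * n) ** 0.5 * delta / sigma   # non-centrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(t_crit, df, ncp)             # P(T > t_crit | H1)

def exact_sample_size(delta, sigma, alpha=0.05, power=0.8, r=0.5):
    """Smallest total n whose exact power reaches the target."""
    n = 4
    while power_t(n, delta, sigma, alpha, r) < power:
        n += 1
    return n

# e.g. standardised difference Delta*/sigma = 0.5, 80% power, two-sided 5% level
n_exact = exact_sample_size(delta=5, sigma=10)
```

The exact n comes out slightly larger than the normal-approximation value (about 126 here), reflecting the heavier tails of the t-distribution.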

2. Introduction to Epidemiology

  • “Epidemiology is the study of the distribution and determinants of disease in human populations.”

  • aim: “to inform health professionals and the public at large in order for improvements in general health status to be made” (Woodward, 1999)

  • Studies of distribution are largely descriptive:

    • examples include distributions by: geography, time, age, gender, social class, ethnicity and occupation

    • information is obtained regarding disease frequency in populations/sub-populations

    • descriptive studies can be used to generate research hypotheses, inform resource allocation etc.

  • Disease determinants are the factors that precipitate disease (aetiological/ causal agents)

    • examples: biological (cholesterol/blood pressure), environmental (atmospheric pollutants), social/behavioural (smoking and diet)

    • (potential) aetiological agents are referred to as risk factors

    • studies of determinants of disease: analytic epidemiology using individual level disease and exposure data

  • The epidemiological domain includes both observation and experiment, however, experimentation is usually limited for ethical reasons

Historical Example: Lung Cancer and Smoking

  • Following huge increases in the number of lung cancer deaths, research on smoking and lung cancer was conducted in various epidemiological studies and caused huge debate regarding the interpretation of the results: the studies found exposure-disease associations, but does smoking cause lung cancer?

  • Sir R. A. Fisher raised the issue of association versus causation that clouds interpretation of observational studies

  • Fisher proposed that the association could be explained by a confounder: a genotype predisposed to both smoking and lung cancer

  • In response, Cornfield argued that the existence of such a confounding factor seemed implausible because of the magnitude of the measure of association (relative risk between 10 and 20 for smokers versus non-smokers)

  • This led to pioneering work by Sir Richard Doll and Sir Austin Bradford Hill shortly after the second world war

The British Doctors Study

  • when: initiated in October 1951 by Doll & Hill

  • who: wrote to members of the medical profession in the UK

  • study group: more than 40,000 doctors replied (out of almost 60,000)

  • exposure evaluation: questionnaire about smoking habits (including current smoker, ex-smoker, never smoker)

  • outcome measure: number of subsequent deaths, cause of death (Registrars-General UK)

  • reprinted: BMJ 328:1529-1533 (26 Jun 2004), Doll et al. 328 (7455): 1519

Doll and Hill: The British Doctors Study


Populations and Samples

Epidemiology involves the collection, analysis and interpretation of data from human populations.

  • Population

    • target population: population we wish to draw inferences for (e.g. all males in Britain)

    • study population: population from which data are collected (e.g. British Doctors Study)

    • generalisability: can we use the study population results to draw accurate conclusions about the target?

  • Choice of study sample

    • generalisability of results (trade-off with availability, cost etc.)

    • optimal scheme: random sample of target population

    • doctors study: an opportunistic sample, readily identifiable and likely to be cooperative

Routine data

  • Epidemiological investigations use data from a variety of sources

  • Sources: routinely collected data (e.g. vital registrations: birth/death/cancer/infectious disease registers, census data, hospital databases etc.) or data purposely collected by the investigators (retrospectively or prospectively) through surveys, recruitment and follow-up

  • Routinely collected data:

    • vital for monitoring public health (e.g. cancer incidence), health planning (e.g. how to accommodate increasing life-expectancy)

    • may be of limited quality: subject to regional variation (e.g. variations in classification, coding etc.) and often do not contain the required individual-level information

    • vital statistics are gathered by the government: information is available on births, still-births, abortions, deaths, area populations, mortality, migration etc.

    • can be used to investigate high-level associations between routinely available attributes (area, gender, age and social class) and the rate of incidence of, or death from, a particular disease.

  • Diseases are presently classified by ICD-10: the international classification of disease.

Limitations of routinely collected data

Routinely collected data are useful but have inherent limitations that we should be mindful of:

  • Coverage: morbidity is inherently difficult to define and hence coverage cannot be complete

    • only hospital patients are covered for many illnesses

    • practitioners vary in their reporting of notifiable infectious diseases

    • sickness certification relates mainly to patients who need a certificate for their employers

    • cancer registers may miss cases who never present to hospital

    • diagnostic/operative data are more difficult to capture than administrative data and are hence often omitted

  • Accuracy: diagnosis of cause of death and illness can be incorrect

  • Availability: confidentiality safe-guards may limit data availability. However, providing research can be justified Research Ethics Committee approval can be obtained