2 Second Chapter

2.6 Confounding in epidemiological investigations

  • epidemiological studies: interest often lies in studying the effect of an exposure (E) on the risk of developing disease (D)

  • the study results may reflect the true effect of the exposure on the disease outcome; however, it should always be considered that the findings may instead be due to an alternative explanation (e.g. chance, bias or confounding)

  • a confounder (C) is a third variable:

    • associated with the exposure of interest

    • independently associated with the risk of disease

  • a confounder may partially or fully explain the observed relationship between E and D

  • a confounder should not lie on the causal pathway between exposure and disease

Consequences of Confounding

  • consequences of confounding:

    • the creation of an apparent relationship (i.e. creates a spurious relationship) between E and D

    • masking of a true relationship between E and D (i.e. conceals a true relationship)

    • an overestimation or underestimation of a true effect

  • example: study into alcohol consumption and CHD

    • findings: increased level of alcohol consumption associated with increased risk of coronary heart disease (CHD)

    • confounder: smoking is a risk factor for CHD and is associated with alcohol consumption

    • on controlling for smoking there may be no association between alcohol and CHD

      • confounder: age? overestimation of effect

      • confounder: gender? underestimation of effect

    • see handout for illustration of confounding

    Methods to Control Confounding

    • a number of approaches can be applied to control for potential confounders; methods may be design-based or analysis-based

    • design-based approaches:

      • randomisation: random allocation of the exposure tends to balance potential confounders (known and unknown) across the comparison groups

      • restriction: limits participation in the study to individuals who are similar in relation to the confounder (e.g. smokers only)

      • matching: selecting controls to be similar to cases in terms of confounders (e.g. age, gender, smoking habits)

    • analysis-based approaches:

      • stratification: examine exposure-disease associations within strata (e.g. age groups, smoking groups) and compute a pooled estimate of the association measure that adjusts for the confounding effect

      • standardisation: control confounding by using a standard (often external) population to adjust for age, gender etc., yielding standardised rates

      • multivariate analysis/regression models: include the confounding variable in the model (model adjustment)

    Stratification: Mantel-Haenszel Method for the Odds Ratio

    • Stratification: allows the association between exposure and outcome to be examined within strata of the confounding variable

    • assuming the stratum-specific association measures are reasonably uniform, they may then be pooled to yield an adjusted estimate

    • Mantel-Haenszel methods are the most widely used

    • history: method first described for use in stratified case-control studies

    • tables: for each of the k strata we can tabulate the results in the usual 2×2 format, with $n_i = a_i + b_i + c_i + d_i$ participants in stratum i

      Stratum i      Diseased   Non-diseased
      Exposed          a_i          b_i
      Not exposed      c_i          d_i

    • assumption: common odds ratio ψ across strata

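    • idea:

      • weighted average of the stratum-specific odds ratios $\hat{\psi}_i = a_i d_i / (b_i c_i)$

      • weights $w_i = b_i c_i / n_i$, which are approximately proportional to the precision (inverse of the variance) of each stratum's odds ratio when ψ is close to 1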

    Mantel-Haenszel Method for Combining Odds Ratios (cont.)

    • Mantel-Haenszel estimate of the common odds ratio

      $$\hat{\psi} = \frac{\sum_{i=1}^{k} a_i d_i / n_i}{\sum_{i=1}^{k} b_i c_i / n_i}$$

    • standard error for $\log(\hat{\psi})$: see, for example, Robins J et al. (1986) American Journal of Epidemiology 124: 719–723
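The Mantel-Haenszel estimate and its standard error can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a library routine: the function name `mantel_haenszel_or` and the (a, b, c, d) tuple layout per stratum are our own choices, and the standard error uses the Robins-Breslow-Greenland variance estimator for $\log(\hat{\psi})$, which is the estimator described in the Robins et al. reference.

```python
import math

def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio and the standard error of its log.

    `strata` is a list of (a, b, c, d) tuples, one per stratum:
    a = exposed diseased, b = exposed non-diseased,
    c = unexposed diseased, d = unexposed non-diseased.
    The SE uses the Robins-Breslow-Greenland variance estimator.
    """
    sum_r = sum_s = 0.0                # sums of a*d/n and b*c/n
    sum_pr = sum_ps_qr = sum_qs = 0.0  # variance components
    for a, b, c, d in strata:
        n = a + b + c + d
        p = (a + d) / n   # P_i
        q = (b + c) / n   # Q_i
        r = a * d / n     # R_i, numerator contribution
        s = b * c / n     # S_i, denominator contribution
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    psi = sum_r / sum_s
    var_log_psi = (sum_pr / (2 * sum_r ** 2)
                   + sum_ps_qr / (2 * sum_r * sum_s)
                   + sum_qs / (2 * sum_s ** 2))
    return psi, math.sqrt(var_log_psi)


# Usage: the two smoking strata of the housing tenure example below.
strata = [(33, 923, 48, 1722),   # non-smokers
          (52, 898, 29, 678)]    # smokers
psi, se = mantel_haenszel_or(strata)
print(f"MH odds ratio = {psi:.2f}, SE(log) = {se:.4f}")
```

With these strata the sketch should reproduce the values quoted in the worked example (ψ̂ ≈ 1.32, SE ≈ 0.1649).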

    Example for Stratification: Housing tenure and CHD

    • (Woodward 1999, Example 4.9)

    • Scottish Heart Health Study (SHHS), six-year follow-up of men

    • potential confounders: various lifestyle factors (in particular smoking)

    Housing tenure    CHD: Yes   CHD: No   Risk
    Rented                  85      1821   0.0446
    Owner-occupied          77      2400   0.0311
    Odds ratio: 1.45

    Example: Housing tenure and CHD (cont.)

    • stratified analysis by smoking status

    • k = 2 strata: non-smokers (1) and smokers (2)

    • $n_1 = 2726$ and $n_2 = 1657$

    • exposure: living in rented accommodation versus owner-occupied accommodation

    Non-smokers
    Housing tenure    CHD: Yes   CHD: No   Risk
    Rented                  33       923   0.0345
    Owner-occupied          48      1722   0.0271
    Odds ratio: 1.28

    Smokers
    Housing tenure    CHD: Yes   CHD: No   Risk
    Rented                  52       898   0.0547
    Owner-occupied          29       678   0.0410
    Odds ratio: 1.35
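As a check on these tables, the short Python sketch below recomputes each risk and odds ratio. The helper name `two_by_two` is hypothetical. It also builds the crude table as the cell-by-cell sum of the two smoking strata, which shows how the stratified tables relate to the unadjusted analysis.

```python
def two_by_two(a, b, c, d):
    """Risks and odds ratio for one 2x2 table.

    a, b: exposed (rented) with / without CHD
    c, d: unexposed (owner-occupied) with / without CHD
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    odds_ratio = (a * d) / (b * c)
    return risk_exposed, risk_unexposed, odds_ratio


non_smokers = (33, 923, 48, 1722)
smokers = (52, 898, 29, 678)
# The crude (unstratified) table is the cell-by-cell sum of the strata.
crude = tuple(x + y for x, y in zip(non_smokers, smokers))

for label, table in [("crude", crude), ("non-smokers", non_smokers), ("smokers", smokers)]:
    r1, r0, odds = two_by_two(*table)
    print(f"{label:12s} risk(rented)={r1:.4f} risk(owner)={r0:.4f} OR={odds:.2f}")
```

Both stratum-specific odds ratios lie below the crude value, which is the pattern discussed in the conclusion of this example.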

    Example: Housing tenure and CHD (cont.)

    • Mantel-Haenszel estimate of the common odds ratio

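      $$\hat{\psi} = \frac{\frac{33 \times 1722}{2726} + \frac{52 \times 678}{1657}}{\frac{923 \times 48}{2726} + \frac{898 \times 29}{1657}} = 1.32$$

    • standard error: $\mathrm{SE}(\log\hat{\psi}) = 0.1649$

    • approximate 95% confidence interval, using $\log(1.32) = 0.278$:

      $$[\exp(0.278 - 1.96 \times 0.1649),\ \exp(0.278 + 1.96 \times 0.1649)] = [0.96;\ 1.82]$$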
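The interval is built on the log scale and then back-transformed. A minimal Python sketch, assuming a hypothetical helper `log_scale_ci` and z = 1.96 for approximate 95% coverage:

```python
import math

def log_scale_ci(estimate, se_log, z=1.96):
    """Approximate CI for a ratio measure: exp(log(estimate) -/+ z * SE(log))."""
    log_est = math.log(estimate)
    return math.exp(log_est - z * se_log), math.exp(log_est + z * se_log)


lower, upper = log_scale_ci(1.32, 0.1649)
print(f"95% CI: [{lower:.2f}; {upper:.2f}]")
```

Using the unrounded log(1.32) rather than 0.278 gives essentially the same interval, approximately [0.96; 1.82].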

    • conclusion: by stratum, the odds ratios are 1.28 (non-smokers) and 1.35 (smokers). The adjusted estimate is 1.32 (a weighted average of the two). The 95% confidence interval for the true odds ratio includes one, hence the association between housing tenure and CHD is not significant at the 5% level

    • some confounding by smoking is evident here, since both stratum-specific odds ratios are lower than the crude odds ratio of 1.45. Since the reduction (from 1.45 to 1.32) is small, the degree of confounding is small

    Matching in case-control studies

    • case-control studies can be unstratified or stratified

    • unstratified: choose controls randomly

    • stratified: match controls to cases according to the confounding variables

      • group matching: a constant ratio of cases to controls within broad strata

      • individual matching: matching controls to each case (e.g. 1:1 matching)

    • the rationale in a matched case-control study is to eliminate confounding by design

    • matching is therefore a design-based approach to controlling confounding

    Case-Control Studies: Pros and Cons of Matching

    • advantages

      • control of confounders by elimination

      • gain in efficiency (depending on the strength of the confounder)

      • avoids or minimises selection bias (e.g. neighbourhood matching)

    • disadvantages

      • more complicated study design

      • the effect of a matching variable on the outcome cannot be studied: if you match on age, for example, you cannot study the effect of age on the disease outcome

      • overmatching (e.g. matching on a variable that is strongly related to the exposure but not to the disease)

    Odds Ratios in 1:1 Matched Case-Control Studies

    • we shall focus upon the analysis of matched pairs

    • for each case, a control is selected matched on the values of the confounders

    • in each matched pair we can classify the case and the control as exposed (E) or not exposed (Ē)

    • we can then tabulate the frequencies of case-control pairs:

                            History of control
      History of case     Exposed   Unexposed
      Exposed                a          b
      Unexposed              c          d

    • the counts in the table represent numbers of pairs, not individuals

    • this corresponds to a + b + c + d separate 2×2 tables, one for each pair

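    OR in 1:1 Matched Case-Control Studies (cont.)

    • the tabulated values arise from the four possible 2×2 tables for the case-control pairs

    (1) case and control both exposed (a pairs)
                  Case   Control   Total
    Exposed         1       1        2
    Unexposed       0       0        0
    Total           1       1        2

    (2) case exposed, control unexposed (b pairs)
                  Case   Control   Total
    Exposed         1       0        1
    Unexposed       0       1        1
    Total           1       1        2

    (3) case unexposed, control exposed (c pairs)
                  Case   Control   Total
    Exposed         0       1        1
    Unexposed       1       0        1
    Total           1       1        2

    (4) case and control both unexposed (d pairs)
                  Case   Control   Total
    Exposed         0       0        0
    Unexposed       1       1        2
    Total           1       1        2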

    Conditional Likelihood for 1:1 Matched Studies

    • condition each stratum (pair) table on both margins

    • conditional on the margins, tables (1) and (4) are deterministic and carry no information about the odds ratio

    • only tables (2) and (3) are informative

    • the values b and c in the pair frequency table are the so-called discordant pairs

    • although there are a + b + c + d pairs in total, only the pairs with discordant exposures (i.e. the b + c discordant pairs) contribute to the estimate

    Odds Ratio and Standard Error in 1:1 Matched Case-Control Studies

    • the (conditional) maximum likelihood estimate of the exposure odds ratio is $\hat{\psi} = b/c$

    • with standard error $\mathrm{SE}(\log\hat{\psi}) = \sqrt{1/b + 1/c}$

    • an approximate 100(1-α)% confidence interval can be computed for the true odds ratio and, since the exposure odds ratio equals the disease odds ratio, interpreted as a disease odds ratio
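The discordant-pair formulas translate directly into code. In this Python sketch the function name `matched_pairs_or` is hypothetical; it returns $\hat{\psi} = b/c$, $\mathrm{SE}(\log\hat{\psi})$ and a Wald-type interval on the log scale, with z = 1.96 assumed for 95% coverage.

```python
import math

def matched_pairs_or(b, c, z=1.96):
    """Conditional odds ratio for 1:1 matched pairs from the discordant counts.

    b: pairs with an exposed case and an unexposed control
    c: pairs with an unexposed case and an exposed control
    Returns (odds ratio, SE of log odds ratio, (lower, upper) CI).
    """
    psi = b / c
    se_log = math.sqrt(1 / b + 1 / c)
    log_psi = math.log(psi)
    ci = (math.exp(log_psi - z * se_log), math.exp(log_psi + z * se_log))
    return psi, se_log, ci


# Tonsillectomy example: b = 15, c = 7 discordant pairs.
psi, se, (lower, upper) = matched_pairs_or(b=15, c=7)
print(f"OR = {psi:.2f}, SE(log OR) = {se:.4f}, 95% CI = [{lower:.3f}; {upper:.3f}]")
```

For the tonsillectomy pairs this gives ψ̂ ≈ 2.14 and SE ≈ 0.4577. Because the sketch does not round the log odds ratio to 0.7608 first, its interval (roughly [0.87; 5.26]) differs slightly in the third decimal from the one quoted in the example.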

    Example: Tonsillectomy and Hodgkin’s Disease

    • objective: relationship between history of tonsillectomy and incidence of Hodgkin’s disease

    • case-control study: 85 pairs of cases and controls

    • odds ratio: $\hat{\psi} = 15/7 = 2.14$ ($\log_e 2.14 = 0.7608$)

    • standard error: $\mathrm{SE}(\log_e\hat{\psi}) = \sqrt{1/15 + 1/7} = 0.4577$

    • approx. 95% CI: [0.872; 5.249]

    • More examples in the workshop.

                        History of control
    History of case    Positive   Negative
    Positive              26         15
    Negative               7         37

    OR in Matched Case-Control Studies

    • individually matched studies

      • a standard analysis that ignores the 1:1 matching is misleading

      • the conditional analysis above (based on the discordant pairs) should be used

    • several controls per case: the arguments above may be extended

    • Mantel-Haenszel method: if each matched pair is treated as a separate stratum, the Mantel-Haenszel method gives the same estimator and standard error (exercise)

    • risk, risk difference, relative risk: cannot be estimated in matched case-control studies
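The Mantel-Haenszel exercise can be checked numerically (a check, not a proof). The sketch below uses hypothetical helper names `pair_strata` and `mh_or_with_se`: it expands the pair table into one 2×2 stratum per pair, using tables (1) to (4), and applies the Mantel-Haenszel estimator with the Robins-Breslow-Greenland variance. Concordant pairs contribute nothing to either sum, so the result reduces to b/c and $\sqrt{1/b + 1/c}$.

```python
import math

def mh_or_with_se(strata):
    """Mantel-Haenszel odds ratio and Robins-Breslow-Greenland SE of its log.

    Each stratum is (a, b, c, d) = (exposed cases, exposed controls,
    unexposed cases, unexposed controls).
    """
    sum_r = sum_s = sum_pr = sum_ps_qr = sum_qs = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        p, q = (a + d) / n, (b + c) / n
        r, s = a * d / n, b * c / n
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    var_log = (sum_pr / (2 * sum_r ** 2)
               + sum_ps_qr / (2 * sum_r * sum_s)
               + sum_qs / (2 * sum_s ** 2))
    return sum_r / sum_s, math.sqrt(var_log)


def pair_strata(a, b, c, d):
    """Hypothetical helper: one 2x2 stratum per matched pair, from the pair table."""
    return ([(1, 1, 0, 0)] * a      # (1) case and control exposed
            + [(1, 0, 0, 1)] * b    # (2) case exposed, control unexposed
            + [(0, 1, 1, 0)] * c    # (3) case unexposed, control exposed
            + [(0, 0, 1, 1)] * d)   # (4) case and control unexposed


# Tonsillectomy pairs: a = 26, b = 15, c = 7, d = 37 (85 pairs in total).
psi, se = mh_or_with_se(pair_strata(26, 15, 7, 37))
print(f"MH: {psi:.4f} vs b/c = {15 / 7:.4f}")
print(f"SE: {se:.4f} vs sqrt(1/b + 1/c) = {math.sqrt(1 / 15 + 1 / 7):.4f}")
```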

    Standardisation

    • a principal task in epidemiology is to compare the incidence of disease or mortality between two or more populations

    • comparing crude mortality rates can be misleading, since populations may differ with respect to confounders such as age and gender, which in turn affect mortality

    • one approach is simply to produce rates by strata of the confounder, such as mortality rates by age group

    • when comparing a large number of populations over various strata, the data can become unmanageable

    • an alternative approach is to combine the category-specific rates into a summary measure that is adjusted for the confounding factor: standardisation

    • standardisation is a process aimed at removing confounding by choosing a ‘standard population’ with a known distribution of the confounder (e.g. a known age structure)

    • most common: age standardisation, age/sex standardisation

    • example: comparison of mortality rates between a seaside resort and an industrialised town (a resort with many retired residents may have a higher crude mortality rate simply because its population is older)

    Types of standardisation

    • there are two methods of standardisation commonly used in epidemiology: direct and indirect

    • both methods require a ‘standard’

    • direct standardisation: the disease rates in the population of interest (study population(s)) are applied to the ‘standard’ population

    • indirect standardisation: the disease rates in the standard population are applied to the population of interest (study population(s))

    • both direct and indirect methods involve calculating an expected number of events E (e.g. deaths) and comparing it with the observed number of events O

    • the most common ‘standard’ is based upon age strata or age/gender strata

    Direct Standardisation of Event Rates

    • direct standardised event rate: the expected event rate in the ‘standard’ population if the age-specific event rates in the study population prevailed

    • so, if adjusting for age, the category-specific event rates for each population being compared are applied to a single standard population

    • standardised event rates (and their ratios) can then be calculated and regions compared

    • notation: standardising by age

      • $o_i$: observed number of events in the ith age group of the study population

      • $p_i$: size of the ith age group of the study population

      • $p_i^{(S)}$: size of the ith age group of the standard population

      • $p^{(S)} = \sum_i p_i^{(S)}$: total size of the standard population

    Direct Standardisation of Event Rates (cont.)

    • direct age-standardised event rate per 1,000:

      $$\frac{1000}{p^{(S)}} \sum_i \frac{o_i}{p_i}\, p_i^{(S)}$$

    • hypothetical example: see handout

    • note

      • strictly speaking it is a proportion, not a rate

      • however, the average population size multiplied by the study length (usually one year) approximates the person-years at risk, so it can be interpreted as a rate
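The direct standardisation formula can be sketched in Python as follows. The function names and all numbers are hypothetical, chosen for illustration (they are not the handout example): two towns share the same age-specific rates (5, 10 and 20 per 1,000) but have different age structures.

```python
def direct_standardised_rate(events, sizes, standard_sizes, per=1000):
    """Directly standardised event rate per `per` people.

    events[i]         -> o_i, events in age group i of the study population
    sizes[i]          -> p_i, size of age group i of the study population
    standard_sizes[i] -> p_i^(S), size of age group i of the standard population
    """
    p_standard = sum(standard_sizes)  # p^(S)
    # Expected events in the standard population under the study's age-specific rates.
    expected = sum(o / p * p_s for o, p, p_s in zip(events, sizes, standard_sizes))
    return per * expected / p_standard


def crude_rate(events, sizes, per=1000):
    """Crude (unstandardised) event rate per `per` people."""
    return per * sum(events) / sum(sizes)


# Hypothetical data: three age groups (young, middle, old).
standard_sizes = [5000, 3000, 2000]
town_a_events, town_a_sizes = [10, 30, 100], [2000, 3000, 5000]  # older town
town_b_events, town_b_sizes = [25, 20, 20], [5000, 2000, 1000]   # younger town

for name, ev, sz in [("A", town_a_events, town_a_sizes), ("B", town_b_events, town_b_sizes)]:
    print(name, crude_rate(ev, sz), direct_standardised_rate(ev, sz, standard_sizes))
```

The crude rates differ (14 versus about 8.1 per 1,000) only because of age structure, whereas both directly standardised rates equal 9.5 per 1,000.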

    Indirect Standardisation of Event Rates

    • indirect methods are commonly used when age-specific rates are not available for the study population(s)

    • the stratum-specific rates of the standard population are applied to the strata of the study population, and the expected number of events is calculated assuming the standard population rates prevail

    • these methods give rise to a standardised event ratio (SER) or standardised mortality ratio (SMR)

    • $\mathrm{SMR} = O/E$, with $O = \sum_i o_i$ and $E = \sum_i \frac{o_i^{(S)}}{p_i^{(S)}}\, p_i$, where $o_i^{(S)}$ is the number of events in the ith age group of the standard population

    • the SMR is typically multiplied by 100 and expressed as a percentage
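A minimal Python sketch of indirect standardisation follows. The function name and all data are hypothetical. Note that only the total observed count O is needed from the study population, together with its age structure.

```python
def standardised_mortality_ratio(total_observed, sizes, standard_events, standard_sizes):
    """Indirect standardisation: returns (SMR, expected events E).

    total_observed     -> O, total events in the study population
    sizes[i]           -> p_i, size of age group i of the study population
    standard_events[i] -> o_i^(S), events in age group i of the standard population
    standard_sizes[i]  -> p_i^(S), size of age group i of the standard population
    """
    # Expected events if the standard population's age-specific rates prevailed.
    expected = sum(o_s / p_s * p for o_s, p_s, p in zip(standard_events, standard_sizes, sizes))
    return total_observed / expected, expected


# Hypothetical data: standard rates of 5, 10 and 20 per 1,000 by age group.
smr, expected = standardised_mortality_ratio(
    total_observed=168,
    sizes=[2000, 3000, 5000],
    standard_events=[50, 60, 80],
    standard_sizes=[10000, 6000, 4000],
)
print(f"E = {expected:.1f}, SMR = {100 * smr:.0f}%")
```

Here E = 140 and O = 168, so the SMR is 120%: 20% more events than expected if the standard population's age-specific rates applied.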

    Discussion: Direct and Indirect Standardisation

    • direct standardisation requires age specific rates for all populations being studied

    • the indirect method requires only the total number of events O in the study population (together with its age structure $p_i$)

    • the SMR (or standardised incidence ratio, SIR, for incident cases) equals the ratio of the indirectly standardised rate to the crude rate in the standard population

    • indirect standardisation is more frequently used

    • indirect standardisation is more stable when the numbers of events are small

    • indirect standardisation requires age-specific rates for the standard population

    • the choice of standard population should be stated clearly; often the national population will be used as the standard when comparing sub-regions