2.5 Analysis of case-control studies: binary exposure

•

in a case-control study, sampling of cases and controls is based upon their disease status: diseased, $D$ , or non-diseased, $\bar{D}$
•

information regarding exposure is then obtained retrospectively: $E$ (exposed) and $\bar{E}$ (not exposed)
•

let $p_{1}=P(D\mid E)$ denote the risk of disease amongst those exposed and $p_{0}=P(D\mid\bar{E})$ denote the risk of disease amongst the non-exposed
•

we are interested in estimating $p_{1}$ and $p_{0}$ and, in particular, comparing the disease risk for the exposed and non-exposed groups
•

in a case-control study it is not possible to estimate disease risks, differences in disease risk or relative risks, since sampling is based upon disease status and the proportion of cases in the sample is therefore fixed by the design
•

in case-control studies we can, however, estimate the disease odds ratio:

$\frac{p_{1}/(1-p_{1})}{p_{0}/(1-p_{0})}=\frac{P(D\mid E)P(\bar{D}\mid\bar{E})}{P(\bar{D}\mid E)P(D\mid\bar{E})}$
•

note that if the disease is rare then

$\frac{p_{1}/(1-p_{1})}{p_{0}/(1-p_{0})}\approx\frac{p_{1}}{p_{0}}$

and the odds ratio approximates the relative risk
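•

the rare-disease approximation can be checked numerically. The sketch below uses made-up risks (not taken from any study), chosen so that the relative risk is 2 in both scenarios; the function names are illustrative only

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


def odds_ratio(p1, p0):
    """Disease odds ratio for risks p1 = P(D|E) and p0 = P(D|not E)."""
    return odds(p1) / odds(p0)


def relative_risk(p1, p0):
    """Relative risk p1 / p0."""
    return p1 / p0


# Hypothetical risks: both pairs have a relative risk of 2.
scenarios = {
    "rare disease": (0.002, 0.001),
    "common disease": (0.4, 0.2),
}

for label, (p1, p0) in scenarios.items():
    rr = relative_risk(p1, p0)
    or_ = odds_ratio(p1, p0)
    print(f"{label}: RR = {rr:.3f}, OR = {or_:.3f}")
```

for the rare disease the odds ratio is very close to 2, whereas for the common disease it is noticeably larger than the relative risk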

Exposure odds ratio and disease odds ratio

•

the outcome measure in a case-control study is the conditional exposure status of the cases and the controls: $P(E\mid D)$ and $P(E\mid\bar{D})$
•

using Bayes' theorem we have:

$P(E\mid D)=\frac{P(D\mid E)P(E)}{P(D\mid E)P(E)+P(D\mid\bar{E})P(\bar{E})}$

and

$P(E\mid\bar{D})=\frac{P(\bar{D}\mid E)P(E)}{P(\bar{D}\mid E)P(E)+P(\bar{D}\mid\bar{E})P(\bar{E})}$
•

the exposure odds for a case (diseased) take the form:

$\frac{P(E\mid D)}{P(\bar{E}\mid D)}=\frac{P(D\mid E)P(E)}{P(D\mid\bar{E})P(\bar{E})}$

and the exposure odds for a control (not diseased) take the form:

$\frac{P(E\mid\bar{D})}{P(\bar{E}\mid\bar{D})}=\frac{P(\bar{D}\mid E)P(E)}{P(\bar{D}\mid\bar{E})P(\bar{E})}$
•

the exposure odds ratio is the ratio of the two exposure odds:

$\frac{P(E\mid D)P(\bar{E}\mid\bar{D})}{P(\bar{E}\mid D)P(E\mid\bar{D})}=\frac{P(D\mid E)P(\bar{D}\mid\bar{E})}{P(\bar{D}\mid E)P(D\mid\bar{E})}$
•

fundamental relation: the exposure odds ratio is equal to the disease odds ratio
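•

the fundamental relation can be verified numerically. The following is a minimal sketch with hypothetical values for $p_{1}$, $p_{0}$ and the exposure prevalence $P(E)$ (none of these come from the notes); it applies Bayes' theorem as above and compares the two odds ratios

```python
import math


def disease_odds_ratio(p1, p0):
    """Disease odds ratio from p1 = P(D|E) and p0 = P(D|not E)."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


def exposure_odds_ratio(p1, p0, p_e):
    """Exposure odds ratio obtained via Bayes' theorem.

    p_e = P(E) is the exposure prevalence in the population.
    """
    p_d = p1 * p_e + p0 * (1 - p_e)               # P(D), total probability
    p_e_given_d = p1 * p_e / p_d                  # P(E | D)
    p_e_given_not_d = (1 - p1) * p_e / (1 - p_d)  # P(E | not D)
    odds_cases = p_e_given_d / (1 - p_e_given_d)
    odds_controls = p_e_given_not_d / (1 - p_e_given_not_d)
    return odds_cases / odds_controls


# Hypothetical disease risks; the exposure prevalence is varied.
p1, p0 = 0.10, 0.04
for p_e in (0.1, 0.3, 0.5, 0.9):
    d_or = disease_odds_ratio(p1, p0)
    e_or = exposure_odds_ratio(p1, p0, p_e)
    print(f"P(E) = {p_e}: disease OR = {d_or:.4f}, exposure OR = {e_or:.4f}")
```

the two odds ratios agree whatever value of $P(E)$ is used, because $P(E)$ and $P(\bar{E})$ cancel in the ratio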

•

suppose the exposure status of the cases (diseased) and controls (non-diseased) has been ascertained: exposed, $E$, or non-exposed, $\bar{E}$
•

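The results of the case-control study can then be illustrated using the following general tabular representation:

$\begin{array}{l|cc|c}
 & \text{Cases} & \text{Controls} & \text{Total}\\
\hline
\text{Exposed} & a & b & a+b\\
\text{Not exposed} & c & d & c+d\\
\hline
\text{Total} & a+c & b+d & n
\end{array}$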
•

rationale: measure the frequency of exposure in the case and control groups and calculate a measure of association
•

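exposure odds ratio (exposure odds $a/c$ amongst the cases, $b/d$ amongst the controls):

$\frac{a/c}{b/d}=\frac{ad}{bc}$
•

disease odds ratio (disease odds $a/b$ amongst the exposed, $c/d$ amongst the non-exposed):

$\frac{a/b}{c/d}=\frac{ad}{bc}$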
•

we can thus calculate the exposure odds ratio and a corresponding 95% confidence interval for the true exposure odds ratio, and interpret both in terms of the disease odds ratio, comparing exposed to non-exposed groups
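•

in terms of the cell counts $a$, $b$, $c$ and $d$, the large-sample, log-based interval (often referred to as Woolf's method) can be sketched as follows, taking 1.96 as the normal quantile for a 95% interval:

$SE(\log_{e}(OR))=\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}$

$\exp\left(\log_{e}\left(\frac{ad}{bc}\right)\pm 1.96\cdot SE(\log_{e}(OR))\right)$

this is a large-sample approximation: it is undefined if any cell count is zero and may be unreliable when cell counts are small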

•

study into alcohol consumption and laryngeal cancer (a relatively rare condition)

•

estimate the exposure odds ratio = disease odds ratio:

$OR=\frac{160\times 110}{90\times 40}=4.889$
•

compute the standard error of the natural logarithm of the odds ratio:

$SE(\log_{e}(OR))=\sqrt{\frac{1}{160}+\frac{1}{90}+\frac{1}{40}+\frac{1}{110}}=0.227$
•

compute a $95\%$ confidence interval for the logarithm of the odds ratio and then back-transform (exponentiate): $\log_{e}(4.889)\pm 1.96\cdot 0.227=[1.142;2.032]$, which exponentiates to give the 95% CI for the odds ratio: $[3.133;7.629]$
•

conclusion: the odds of laryngeal cancer were estimated to be 4.9 times as high amongst those exposed (i.e. consuming alcohol) as amongst those who abstain (no alcohol). We are 95% confident that the true disease odds ratio lies between 3.13 and 7.63, so the true odds ratio could be as low as 3.13 or as high as 7.63. The interval does not contain the value 1 and thus the increase is statistically significant (at the 5% level). Since laryngeal cancer is relatively rare, the odds ratio can also be read as an approximation to the relative risk
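•

the calculation above can be reproduced in a few lines. The sketch below assumes the cell counts $a=160$, $b=90$, $c=40$, $d=110$, read off from the odds ratio and standard error formulas of the worked example; the function name `odds_ratio_ci` is illustrative and not part of any library

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and log-based confidence interval from a 2x2 table.

    a, b: exposed cases, exposed controls
    c, d: non-exposed cases, non-exposed controls
    z:    normal quantile (1.96 gives a 95% interval)
    """
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(or_hat)
    # Interval on the log scale, then back-transformed by exponentiating.
    lower = math.exp(log_or - z * se_log)
    upper = math.exp(log_or + z * se_log)
    return or_hat, se_log, lower, upper


# Cell counts assumed from the worked formulas above.
or_hat, se_log, lower, upper = odds_ratio_ci(160, 90, 40, 110)
print(f"OR = {or_hat:.3f}, SE(log OR) = {se_log:.3f}")
print(f"95% CI: [{lower:.3f}; {upper:.3f}]")
```

because the standard error is carried through unrounded, the limits can differ from the hand calculation in the third decimal place; to two decimal places they agree (about 3.13 and 7.63)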

•

the number of available cases is often limited and so the cases are often a complete enumeration of the diseased (i.e. all known disease cases in a region in a specified time period)
•

controls (disease free) may be more readily available and thus we can usually choose the number of controls to include
•

we can increase the precision of our estimate for the disease odds ratio by increasing the number of controls (i.e. $b$ and $d$ in the standard error formula for the log odds ratio)
•

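increasing the ratio beyond about 5 controls per case yields little further benefit, because the terms for the cases ($a$ and $c$) then dominate the standard error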
•

controls may need to be screened to avoid inadvertent inclusion of cases
•

controls may be a simple random sample from the disease-free population at risk or may be matched to the cases
•

matching is used to handle known confounders
•

we shall consider 1:1 matching later this week
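•

the diminishing return from adding controls, noted above, can be illustrated with the standard error formula for the log odds ratio. The sketch below is an illustration under stated assumptions rather than a design rule: it fixes the cases at $a=160$ and $c=40$ (the counts assumed from the laryngeal cancer example), assumes 45% of controls are exposed (90 out of 200 in that example), and uses expected rather than observed control counts

```python
import math


def se_log_or(a, b, c, d):
    """Standard error of the log odds ratio for a 2x2 table."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def se_for_ratio(a, c, control_exposure_prop, k):
    """SE of the log odds ratio with k controls per case.

    a, c: exposed and non-exposed cases (held fixed)
    control_exposure_prop: assumed proportion of controls exposed
    k: number of controls per case
    """
    n_controls = k * (a + c)
    b = control_exposure_prop * n_controls        # expected exposed controls
    d = (1 - control_exposure_prop) * n_controls  # expected non-exposed controls
    return se_log_or(a, b, c, d)


a, c = 160, 40   # cases, assumed from the worked example
prop = 0.45      # assumed exposure proportion amongst controls

for k in (1, 2, 3, 4, 5, 10, 20):
    print(f"{k:>2} controls per case: SE = {se_for_ratio(a, c, prop, k):.4f}")

# Lower bound on the SE as the number of controls grows without limit.
print(f"limit: SE = {math.sqrt(1 / a + 1 / c):.4f}")
```

under these assumptions the standard error falls from about 0.227 with one control per case to about 0.188 with five, and it cannot drop below about 0.177 however many controls are added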