One type of study that is able to overcome this issue is the case-control design. In a case-control study, we identify patients (cases) with the disease and ascertain their past exposure to conjectured aetiological factors. We then compare the information from the collection of cases to that obtained from a sample from the population who do not have the disease (controls). We can estimate odds ratios from a case-control study, but not relative risks, because the study population has been artificially constructed: the ratio of cases to controls is fixed by the design, so disease incidence cannot be estimated from case-control data.
As in other types of study design, we ‘adjust’ our estimates of covariate effects for potential confounding factors by measuring them and including them in our analyses. This adjustment can be further improved by matching cases and controls either at the individual-level (for example by pairing each case with a control of the same age and sex) or at the group-level (for example, by choosing a control group with an overall age and sex distribution similar to that of the cases).
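As a rough illustration of group-level (frequency) matching, the sketch below draws controls so that their distribution across strata (for example age band and sex) mirrors that of the cases. The function name `frequency_match`, the `(id, stratum)` record layout and the toy data are assumptions made for this example; they do not come from any particular study or library.

```python
import random
from collections import Counter, defaultdict

def frequency_match(cases, candidates, ratio=1, seed=0):
    """Sample controls so each stratum holds `ratio` controls per case.

    `cases` and `candidates` are lists of (id, stratum) tuples; this layout
    is hypothetical and chosen only to keep the example short.
    """
    rng = random.Random(seed)
    needed = Counter(stratum for _, stratum in cases)

    # Index the candidate controls by stratum.
    pool = defaultdict(list)
    for ident, stratum in candidates:
        pool[stratum].append((ident, stratum))

    controls = []
    for stratum, n_cases in needed.items():
        k = ratio * n_cases
        if len(pool[stratum]) < k:
            raise ValueError(f"not enough candidate controls in stratum {stratum!r}")
        controls.extend(rng.sample(pool[stratum], k))
    return controls

# Toy data: strata are (age band, sex) pairs.
cases = [(1, ("60+", "F")), (2, ("60+", "F")), (3, ("40-59", "M"))]
candidates = [(100 + i, ("60+", "F")) for i in range(10)] + \
             [(200 + i, ("40-59", "M")) for i in range(10)]
controls = frequency_match(cases, candidates, ratio=2)
print(Counter(stratum for _, stratum in controls))
```

Individual matching would instead pair each case with its own control or controls, which changes the appropriate analysis; the sketch covers only the group-level variant.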
Selecting cases requires a suitably rigorous definition of what it means to be a ‘case’ and a careful plan of how cases will be acquired. In particular, one should seek to avoid bias in the selection procedure.
The key properties of the control group are (1) that it should be a representative sample, in terms of exposure to risk factors and confounders, from the population at risk of becoming cases; and (2) that we should be able to measure exposure in the control group with accuracy similar to that in the case group. Sometimes there will be more than one control per case, but there is a limit on how much is to be gained by adding more and more controls.
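The diminishing return from extra controls can be seen numerically. The sketch below uses the standard large-sample approximation for the variance of the log odds ratio (the sum of the reciprocals of the four cell counts), evaluated at expected cell counts. The number of cases and the exposure prevalences are made-up values, and `se_log_or` is a hypothetical helper written for this illustration.

```python
import math

def se_log_or(n_cases, m, p_case, p_control):
    """Approximate standard error of the log odds ratio.

    n_cases   : number of cases
    m         : number of controls per case
    p_case    : proportion of cases exposed
    p_control : proportion of controls exposed
    Expected cell counts are plugged into 1/a + 1/b + 1/c + 1/d.
    """
    n_controls = m * n_cases
    variance = (
        1 / (n_cases * p_case) + 1 / (n_cases * (1 - p_case))
        + 1 / (n_controls * p_control) + 1 / (n_controls * (1 - p_control))
    )
    return math.sqrt(variance)

# Hypothetical study: 200 cases, 30% exposure in both groups.
for m in (1, 2, 4, 10, 100):
    print(f"{m:>3} controls per case: SE(log OR) = {se_log_or(200, m, 0.3, 0.3):.4f}")
```

When exposure is equally common in cases and controls, the variance with m controls per case is (1 + 1/m) times the variance attainable with unlimited controls, so the relative efficiency is m/(m + 1). Under these assumptions, going from one to four controls per case raises the efficiency from 50% to 80%, and further controls add comparatively little.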
A simple way to present the results of a case-control study is the $2 \times 2$ table; an example is shown in Figure 3.1. Such tables give a breakdown of the number of individuals by presence/absence of a disease (or outcome) and of the exposure of interest. Since, by design, the individuals involved in a case-control study are not a random sample from the population, we can compute neither the incidence nor the risk of disease in this setting. However, we can legitimately estimate the odds ratio and produce a confidence interval for it.
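To see why the odds ratio survives the case-control design while the risk does not, consider the following sketch. It starts from a hypothetical population table, keeps every diseased individual, and retains only a fraction of the non-diseased individuals as controls. Expected counts are used rather than random draws so that the arithmetic is easy to follow. All numbers and function names are invented for illustration.

```python
import math

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2x2 table."""
    return (a * d) / (b * c)

def naive_relative_risk(a, b, c, d):
    """Risk ratio computed as if the table came from a cohort (not valid here)."""
    return (a / (a + b)) / (c / (c + d))

def case_control_table(population, control_fraction):
    """Keep all diseased individuals and a fraction of the non-diseased ones.

    Uses expected counts, so the result is deterministic.
    """
    a, b, c, d = population
    return a, b * control_fraction, c, d * control_fraction

# Hypothetical population counts:
# (exposed diseased, exposed healthy, unexposed diseased, unexposed healthy)
population = (100, 9_900, 100, 39_900)

for fraction in (1.0, 0.02, 0.01):
    table = case_control_table(population, fraction)
    print(f"control fraction {fraction}: "
          f"OR = {odds_ratio(*table):.3f}, "
          f"naive RR = {naive_relative_risk(*table):.3f}")
```

The odds ratio is the same whatever fraction of the non-diseased population is sampled, because the sampling fraction cancels in the cross-product. The naive risk ratio changes with the sampling fraction, so it reflects the design rather than the disease.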
Let $E$ be an event of interest in a probability space. The odds of the event is:
$$\text{odds}(E) = \frac{P(E)}{1 - P(E)}.$$
| Risk factor | Disease (outcome): yes | Disease (outcome): no | Totals |
|---|---|---|---|
| yes | $a$ | $b$ | $a + b$ |
| no | $c$ | $d$ | $c + d$ |
| Totals | $a + c$ | $b + d$ | $a + b + c + d$ |
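The odds ratio comparing exposure rates in the case group with those in the control group in a case-control study is:
$$\text{Odds Ratio} = \frac{a/c}{b/d} = \frac{ad}{bc},$$
where $a$ and $b$ are the numbers of exposed cases and exposed controls, and $c$ and $d$ are the numbers of unexposed cases and unexposed controls.
This is mathematically the same quantity as the odds of disease in the exposed group compared with the unexposed group ($(a/b)/(c/d) = ad/(bc)$).
We can compute a confidence interval for the odds ratio quite straightforwardly since, approximately for large samples,
$$\log\left(\widehat{\text{OR}}\right) \sim \mathrm{N}\left(\log(\text{OR}),\; \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}\right).$$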
The computation of odds ratios can be useful in the exploratory phase of an analysis.
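For exploratory work, the calculation can be sketched in a few lines. Here $a$ and $b$ denote the numbers of exposed cases and exposed controls, and $c$ and $d$ the numbers of unexposed cases and unexposed controls. The interval uses the usual large-sample normal approximation on the log scale, with variance $1/a + 1/b + 1/c + 1/d$. The function name `odds_ratio_ci` and the counts are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate confidence interval for a 2x2 table.

    a: exposed cases,   b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    or_hat = (a * d) / (b * c)
    # Standard error of log(OR) under the large-sample approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper

# Made-up counts, for illustration only.
or_hat, lower, upper = odds_ratio_ci(a=40, b=20, c=60, d=80)
print(f"OR = {or_hat:.2f}, approximate 95% CI ({lower:.2f}, {upper:.2f})")
```

Because the interval is built on the log scale and then exponentiated, it is asymmetric about the point estimate. The approximation is unreliable when any cell count is small, which is why the sketch rejects zero counts rather than attempting a correction.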
The data for a spatial case-control study consist of two point patterns:
- the locations of all known cases of a particular disease in a geographical region, over a defined period of time;
- the locations of a sample of controls, selected from the population at risk in one of three ways:
  - completely at random
  - group-matched (e.g. to preserve the sex ratio)
  - individually matched
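One possible in-memory layout for such data is sketched below: two lists of coordinates, plus a helper that pools them into a single labelled point pattern (1 for a case, 0 for a control). The class name `SpatialCaseControlData` and the coordinates are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) location in some planar coordinate system

@dataclass
class SpatialCaseControlData:
    cases: List[Point]     # locations of all known cases
    controls: List[Point]  # locations of the sampled controls

    def labelled_points(self) -> List[Tuple[float, float, int]]:
        """Pool the two patterns, labelling cases 1 and controls 0."""
        return [(x, y, 1) for x, y in self.cases] + \
               [(x, y, 0) for x, y in self.controls]

# Made-up coordinates, for illustration only.
data = SpatialCaseControlData(
    cases=[(0.1, 0.2), (0.4, 0.7)],
    controls=[(0.5, 0.5), (0.9, 0.1), (0.3, 0.8)],
)
for x, y, label in data.labelled_points():
    print(x, y, label)
```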
For each of the three substantive problems identified in Section 1, we consider the analysis of a completely random case-control study.
Group-matched studies can usually be analysed by pooling results from separate analyses within each group.
When spatial variation is of scientific interest, individual matching should be avoided if possible as it complicates the interpretation of estimated spatial variation.