3 Case-Control Methods

3.1 Introduction to Case-control studies

One type of study that is able to overcome this issue is the case-control design. In a case-control study, we identify patients (cases) with the disease and ascertain their past exposure to conjectured aetiological factors. We then compare the information from the collection of cases to that obtained from a sample of the population who do not have the disease (controls). We can estimate odds ratios from a case-control study, but not relative risks, since the study population has been artificially constructed (we cannot estimate disease incidence from case-control data).

As in other types of study design, we ‘adjust’ our estimates of covariate effects for potential confounding factors by measuring them and including them in our analyses. This adjustment can be further improved by matching cases and controls either at the individual-level (for example by pairing each case with a control of the same age and sex) or at the group-level (for example, by choosing a control group with an overall age and sex distribution similar to that of the cases).

3.1.1 Selection of cases

Selecting cases requires a suitably rigorous definition of what it means to be a ‘case’ and a careful plan for how cases will be acquired. In particular, one should seek to avoid bias in the selection procedure.

3.1.2 Selection of controls

The key properties of the control group are (1) that it should be a representative sample, in terms of exposure to risk factors and confounders, from the population at risk of becoming cases; and (2) that we should be able to measure exposure in the control group with an accuracy similar to that achieved in the case group. Sometimes there will be more than one control per case, but there is a limit on how much is to be gained by adding more and more controls.

3.1.3 Simple Analyses of Case-Control Studies

A simple way to present the results of a case-control study is the $2\times 2$ table; an example is shown in Table 3.1. Such tables give a breakdown of the number of individuals by presence/absence of the disease (or outcome) and of the exposure of interest. Since, by design, the individuals involved in a case-control study are not a random sample from the population, we can compute neither the incidence nor the risk of disease in this setting. However, we can legitimately estimate the odds ratio and produce a confidence interval for it.

Definition 3.1.

Let $E$ be an event of interest in a probability space. The odds of the event $E$ is:

$$\frac{\mathbb{P}(E)}{1-\mathbb{P}(E)}.$$

		Disease (outcome)
		yes	no	Totals
Risk factor	yes	$a$	$b$	$a+b$
	no	$c$	$d$	$c+d$
	Totals	$a+c$	$b+d$	$a+b+c+d$

Table 3.1: $2\times 2$ table typical in case-control studies.

Definition 3.2.

The odds ratio comparing exposure in the case group with that in the control group in a case-control study is:

$$\text{Odds Ratio}=\frac{\text{Odds of exposure in cases}}{\text{Odds of exposure in controls}}=\frac{a/c}{b/d}$$

This is mathematically the same quantity as the odds of disease in the exposed group compared with the odds of disease in the unexposed group, $(a/b)/(c/d)$.
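As a short check, using only the cell counts $a$, $b$, $c$, $d$ of Table 3.1, both orderings reduce to the same cross-product ratio:

$$\frac{a/c}{b/d}=\frac{ad}{bc}=\frac{a/b}{c/d}.$$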

We can compute a confidence interval for $\log(\text{odds ratio})$ quite straightforwardly, since

$$\mathrm{s.e.}[\log(\text{odds ratio})]=\sqrt{\frac{1}{a}+\frac{1}{b}+\frac{1}{c}+\frac{1}{d}}.$$

The computation of odds ratios can be useful in the exploratory phase of an analysis.
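The following is a minimal sketch of how the odds ratio and an approximate confidence interval might be computed from the cell counts of Table 3.1. It assumes the usual normal approximation on the log scale, so that an approximate 95% interval is $\log(\text{odds ratio})\pm 1.96\times\mathrm{s.e.}$, exponentiated back to the odds ratio scale. The function name `odds_ratio_ci` and the counts in the example are hypothetical and chosen for illustration only.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate CI from a 2x2 table.

    a: exposed cases,   b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    z is the normal quantile (1.96 gives an approximate 95% interval).
    """
    if min(a, b, c, d) <= 0:
        # The standard error formula is undefined if any cell is zero.
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Build the interval on the log scale, then transform back.
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return math.exp(log_or), lower, upper


# Hypothetical counts, not taken from any real study.
or_hat, lower, upper = odds_ratio_ci(a=30, b=20, c=70, d=80)
print(f"OR = {or_hat:.2f}, approx. 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 suggests an association between exposure and disease, although in practice any such finding would still need adjustment for confounders as described above.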

3.1.4 Spatial Case-Control Studies

The data for a spatial case-control study consist of two point patterns:

• the locations of all known cases of a particular disease in a geographical region $A$, over a defined period of time;
• the locations of a sample of controls, selected from the population at risk:
  – completely at random
  – group-matched (e.g. to preserve the sex ratio)
  – individually matched
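To make the data structure concrete, here is a small sketch of a completely random design using simulated locations. Everything in it is an assumption made for illustration: the unit square stands in for the region $A$, the population, the case labels and the sample sizes are invented, and `draw_controls` is a hypothetical helper rather than part of any library.

```python
import random


def draw_controls(at_risk, n_controls, rng):
    """Sample control locations completely at random, without replacement."""
    return rng.sample(at_risk, n_controls)


rng = random.Random(2024)  # fixed seed so the illustration is reproducible

# Simulated (x, y) locations in the unit square, standing in for region A.
population = [(rng.random(), rng.random()) for _ in range(1000)]

# For illustration only: mark 50 individuals as cases; the rest are at risk.
case_ids = set(rng.sample(range(len(population)), 50))
cases = [population[i] for i in sorted(case_ids)]
at_risk = [p for i, p in enumerate(population) if i not in case_ids]

controls = draw_controls(at_risk, n_controls=100, rng=rng)

# One labelled point pattern: label 1 for cases, 0 for controls.
points = [(x, y, 1) for x, y in cases] + [(x, y, 0) for x, y in controls]
print(len(cases), len(controls), len(points))
```

Storing both patterns in a single labelled list keeps the case/control indicator attached to each location, which is the form most analyses of such data start from.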

For each of the three substantive problems identified in Section 1, we consider the analysis of a completely random case-control study.

Group-matched studies can usually be analysed by pooling results from separate analyses within each group.

When spatial variation is of scientific interest, individual matching should be avoided if possible as it complicates the interpretation of estimated spatial variation.