clinical investigation often requires the classification of individuals, most notably the diagnosis of disease status (e.g. diseased/non-diseased)
classification procedures are typically based upon diagnostic tests
diagnostic tests take various forms such as expert opinion, questionnaire or laboratory testing techniques, for example mammography for breast cancer screening, serum myoglobin tests for heart disease and psychological evaluation for mental disorders
diagnostic accuracy is of key importance but there is also often a drive to develop testing procedures which are less-invasive/more acceptable to patients than existing techniques, particularly for disease screening
in some situations the accuracy of classification may not be in question (such as death); however, most often the state of affairs will be less certain
it is thus of interest to quantify the test's reliability in the context of utility
typically assessment of a test's utility is achieved by applying the test to a number of individuals whose true disease status is known, usually from a gold-standard
“the gold-standard is the most accurate existing test for the condition”
‘gold-standards’ include histology in cancer diagnosis and microbiology for infectious diseases
disease classifications may be based on dichotomous test results (positive/negative or presence/absence) or based upon continuous measurements
for continuous measurements a cutoff level has to be determined
examples of continuous measures include:
blood-sugar level for the diagnosis of diabetes
cardiac enzyme levels for myocardial injury following myocardial infarction
blood and brain changes in Alzheimer’s disease and dementia
two types of error can occur when testing: false positives and false negatives
sensitivity: the proportion of truly diseased persons in the tested population who are identified as diseased by the screening test (probability of diagnosing a true case as diseased):
$$\text{sensitivity} = P(T^{+} \mid D^{+})$$
specificity: the proportion of truly non-diseased persons who are so identified by the screening test (probability of diagnosing a truly non-diseased person as non-diseased):
$$\text{specificity} = P(T^{-} \mid D^{-})$$
where $D^{+}$ and $D^{-}$ denote diseased and non-diseased individuals, respectively, and $T^{+}$ and $T^{-}$, respectively, denote positive and negative test results
we may also want to consider the probability that someone with a positive test result is actually diseased and, conversely, the probability that someone with a negative test result is actually disease free
positive predictive value (PPV) is the proportion of persons who are in fact diseased among those who test positive:
$$\text{PPV} = P(D^{+} \mid T^{+})$$
negative predictive value (NPV) is the proportion of persons who are in fact non-diseased among those who test negative:
$$\text{NPV} = P(D^{-} \mid T^{-})$$
True Disease Status | |||
---|---|---|---|
Test Result | Diseased | Non-diseased | Total |
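Positive | TP | FP | TP + FP |
Negative | FN | TN | FN + TN |
Total | TP + FN | FP + TN | n |
here TP, FP, FN and TN are the numbers of true positives, false positives, false negatives and true negatives, and n is the total number tested; for example sensitivity is estimated by TP/(TP + FN) and PPV by TP/(TP + FP)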
Note PPV and NPV are affected (potentially strongly) by the true prevalence of the disease
the values (as above) are typically multiplied by 100 and expressed as a percentage
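The four measures can be computed directly from the cells of the 2 × 2 table. The following is a minimal sketch, assuming the cell counts are named `tp`, `fp`, `fn` and `tn` (true positives, false positives, false negatives, true negatives); the function name and the counts used are hypothetical and chosen only for illustration.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV as proportions.

    tp, fp, fn, tn are the four cell counts of the 2 x 2 table
    (names are assumptions of this sketch, not taken from the notes).
    """
    return {
        "sensitivity": tp / (tp + fn),  # diseased who test positive
        "specificity": tn / (tn + fp),  # non-diseased who test negative
        "ppv": tp / (tp + fp),          # test-positives who are diseased
        "npv": tn / (tn + fn),          # test-negatives who are non-diseased
    }


# Hypothetical counts, for illustration only
measures = diagnostic_measures(tp=90, fp=30, fn=10, tn=170)
for name, value in measures.items():
    # Multiply by 100 to express each measure as a percentage
    print(f"{name}: {100 * value:.1f}%")
```

Sensitivity and specificity use the column totals (true disease status), whereas PPV and NPV use the row totals (test result), which is why the latter two change with prevalence.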
when assessing tests based upon continuous measurements, the choice of the ’cut-off’ point is important, as this affects sensitivity and specificity
Example: Enzyme tests and myocardial infarction (MI): use of the creatine phosphokinase (CPK) assay in a coronary care unit. The data obtained were as follows:
CPK activity | MI | non-MI |
---|---|---|
0–49 | 2 | 32 |
50–99 | 4 | 10 |
100–149 | 6 | 5 |
150–399 | 14 | 2 |
400+ | 21 | 0 |
Total no. patients | 47 | 49 |
the four ‘sensible’ choices of cut-off point give the following sensitivities, specificities, PPVs and NPVs:
Cut-off (CPK ≥) | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) |
---|---|---|---|---|
50 | 96 | 65 | 73 | 94 |
100 | 87 | 86 | 85 | 88 |
150 | 74 | 96 | 95 | 80 |
400 | 45 | 100 | 100 | 65 |
it is clear from above that for this particular example:
a low cut-off results in a more sensitive test;
a high cut-off results in a more specific test;
intermediate cut-offs result in smaller overall errors
the choice of cut-off must strike a balance between sensitivity and specificity
each of the values in the previous table is calculated by forming a dichotomy based upon the test result (+ve; -ve) for each cut-off point and then constructing a 2 × 2 table of the form given previously. Below is the table for a cut-off point of CPK of 50 or greater.
Disease Status | |||
---|---|---|---|
CPK | MI | Non-MI | Total |
(+ve) | 45 | 17 | 62 |
(-ve) | 2 | 32 | 34 |
Total | 47 | 49 | 96 |
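The dichotomy for each cut-off can be formed mechanically from the banded CPK data. The sketch below is one way to do this, assuming a patient is called positive when the lower bound of their CPK band is at or above the cut-off; the names `bands`, `two_by_two` and `summary` are illustrative only.

```python
# (lower bound of CPK band, number with MI, number without MI), from the data table
bands = [(0, 2, 32), (50, 4, 10), (100, 6, 5), (150, 14, 2), (400, 21, 0)]


def two_by_two(cutoff):
    """Counts (tp, fp, fn, tn) when CPK >= cutoff is called positive."""
    tp = sum(mi for low, mi, non in bands if low >= cutoff)
    fp = sum(non for low, mi, non in bands if low >= cutoff)
    fn = sum(mi for low, mi, non in bands if low < cutoff)
    tn = sum(non for low, mi, non in bands if low < cutoff)
    return tp, fp, fn, tn


def summary(cutoff):
    """Sensitivity, specificity, PPV and NPV (as percentages) for one cut-off."""
    tp, fp, fn, tn = two_by_two(cutoff)
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
    }


rows = {cutoff: summary(cutoff) for cutoff in (50, 100, 150, 400)}
for cutoff, row in rows.items():
    values = "  ".join(f"{name}={value:.1f}" for name, value in row.items())
    print(f"CPK >= {cutoff}: {values}")
```

For a cut-off of 50 the counts are 45, 17, 2 and 32, matching the 2 × 2 table above, and the percentages agree with the cut-off table after rounding to whole numbers.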
sensitivity, specificity and their sum can be plotted to evaluate the various cut-points but typically a plot of sensitivity against (1 - specificity) is produced for the various cut-off values: the so-called receiver operating characteristic (ROC) curve
Figure: ROC curve (sensitivity plotted against 1 - specificity)
all things being equal one typically chooses the cut-off corresponding to the top-left-most point on this curve
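One way to make the "top-left-most point" concrete is to take the cut-off whose ROC point lies closest to the corner (0, 1). This is only one reading of the rule (an assumption of this sketch, not something the notes specify); the sensitivities and specificities are the rounded percentages from the CPK cut-off table.

```python
import math

# cut-off: (sensitivity, specificity) as proportions, from the rounded CPK table
cutoffs = {
    50: (0.96, 0.65),
    100: (0.87, 0.86),
    150: (0.74, 0.96),
    400: (0.45, 1.00),
}


def roc_point(sensitivity, specificity):
    """ROC coordinates: x = 1 - specificity, y = sensitivity."""
    return 1 - specificity, sensitivity


def distance_to_top_left(sensitivity, specificity):
    """Euclidean distance from the ROC point to the ideal corner (0, 1)."""
    x, y = roc_point(sensitivity, specificity)
    return math.hypot(x - 0.0, y - 1.0)


distances = {c: distance_to_top_left(se, sp) for c, (se, sp) in cutoffs.items()}
best_cutoff = min(distances, key=distances.get)

for c, d in distances.items():
    print(f"CPK >= {c}: distance to (0, 1) = {d:.3f}")
print("closest to top-left:", best_cutoff)
```

Under this distance criterion the cut-off of 100 is selected for the CPK data; other criteria may select a different point.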
recall that prevalence is the proportion of the population of interest who have the disease at a given time point; in this context it is essentially the prior probability of disease before observing the test result
assuming the data are a simple random sample from the general population of interest then an estimate is given by:
$$\widehat{\text{prevalence}} = \frac{\text{number of diseased individuals in the sample}}{\text{total number of individuals in the sample}}$$
both PPV and NPV depend upon disease prevalence
if the study sample is not random (and thus does not reflect the prevalence in the population of interest) then PPV and NPV do not yield a valid measure of the test's predictive value
sensitivity and specificity, however, are independent of the prevalence of disease
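the positive predictive value can be expressed using Bayes' theorem:
$$\text{PPV} = \frac{\text{sensitivity} \times \text{prevalence}}{\text{sensitivity} \times \text{prevalence} + (1 - \text{specificity}) \times (1 - \text{prevalence})}$$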
if the sample is non-random the population prevalence can be elicited from the population of interest and the predictive value of the test evaluated in context
the PPV equation above allows the prior probability of disease to be combined with the data regarding the test accuracy (i.e. the sensitivity and specificity) leading to a so-called posterior (post-test result) assessment of disease state given a positive test result
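Example: a spinal fluid test is known to be 95% effective for detecting a disease when it is present (sensitivity 0.95) and 99% effective when it is not (specificity 0.99). The prevalence of the disease is known to be 0.5% in the population of interest, thus we have
$$\text{PPV} = \frac{0.95 \times 0.005}{0.95 \times 0.005 + 0.01 \times 0.995} = \frac{0.00475}{0.0147} \approx 0.32$$
and hence the post-test probability of disease for those testing positive is approximately 32%
Note the NPV is given by:
$$\text{NPV} = \frac{\text{specificity} \times (1 - \text{prevalence})}{\text{specificity} \times (1 - \text{prevalence}) + (1 - \text{sensitivity}) \times \text{prevalence}} = \frac{0.99 \times 0.995}{0.99 \times 0.995 + 0.05 \times 0.005} \approx 0.9997$$
hence the post-test probability of being non-diseased for those testing negative is approximately 99.97%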
What if the disease prevalence was higher in the population?
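The effect of prevalence can be explored by applying Bayes' theorem over a range of prior probabilities. A minimal sketch, using the sensitivity (0.95) and specificity (0.99) of the spinal fluid example; the higher prevalences of 5% and 20% are hypothetical values chosen for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from Bayes' theorem, all arguments as proportions."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# 0.5% is the prevalence in the example; 5% and 20% are hypothetical
results = {
    prevalence: predictive_values(0.95, 0.99, prevalence)
    for prevalence in (0.005, 0.05, 0.20)
}
for prevalence, (ppv, npv) in results.items():
    print(f"prevalence {prevalence:.1%}: PPV = {ppv:.1%}, NPV = {npv:.2%}")
```

With the test characteristics held fixed, the PPV rises and the NPV falls as the prevalence increases.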
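therapeutic decisions with regard to tests are often considered using the likelihood ratio of a positive test result:
$$\text{LR}^{+} = \frac{P(\text{positive test} \mid \text{diseased})}{P(\text{positive test} \mid \text{non-diseased})} = \frac{\text{sensitivity}}{1 - \text{specificity}}$$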
for the test to be informative we want the value to be high
the positive likelihood ratio is the ratio of the posterior disease odds (i.e. after witnessing a positive test result) to the prior disease odds and hence tells us how much the odds of disease increase given a positive test result
Example: a test has a sensitivity of 90% and a specificity of 78%, then
$$\text{LR}^{+} = \frac{0.90}{1 - 0.78} = \frac{0.90}{0.22} \approx 4.1$$
hence a positive test result increases the odds of disease roughly fourfold
note that the LR may increase or decrease if the sensitivity and specificity of two competing diagnostic tests move in opposite directions
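The link between the positive likelihood ratio and the prior and posterior odds can be sketched as follows, using the sensitivity (90%) and specificity (78%) of the example. The pre-test probability of 10% is a hypothetical value, and the function names are illustrative.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity), arguments as proportions."""
    return sensitivity / (1 - specificity)


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    prior_odds = pre_test_probability / (1 - pre_test_probability)
    posterior_odds = prior_odds * likelihood_ratio  # odds scale by the LR
    return posterior_odds / (1 + posterior_odds)


lr_positive = positive_likelihood_ratio(0.90, 0.78)
# Hypothetical pre-test probability of 10%
post = post_test_probability(0.10, lr_positive)

print(f"LR+ = {lr_positive:.2f}")
print(f"post-test probability = {post:.4f}")
```

The likelihood ratio multiplies the odds, not the probability, so the change in probability depends on where the pre-test probability starts.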
in general we need to consider the relative importance or weight that we will give to sensitivity as against specificity
if the disease in question is life threatening and the treatment (subsequent to a positive diagnosis) has few side-effects then we would want to give high weight to the sensitivity
if the disease in question is not too serious but treatment has many side-effects we would want to give more weight to specificity
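thus if we give sensitivity weight $w$ and specificity weight $1 - w$ (with $0 \le w \le 1$) we would want to maximise
$$w \times \text{sensitivity} + (1 - w) \times \text{specificity}$$
all things being equal we aim to maximise the number of correct decisions
to maximise with respect to the study (‘sample criterion’) choose
$$w = \frac{\text{number of diseased individuals in the sample}}{\text{total number of individuals in the sample}}$$
to maximise with respect to the population criterion choose $w$ equal to the disease prevalence in the population of interest
based upon Youden’s criterion one chooses
$$w = \tfrac{1}{2}$$
the choice of $w$ may be crucial in comparing tests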
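How much the weight matters can be seen by applying a weighted sum of sensitivity and specificity to the CPK cut-offs. This is a sketch under stated assumptions: sensitivity is given a weight `w` and specificity `1 - w`, the three weights shown are the sample proportion diseased (47/96), equal weights, and a hypothetical population prevalence of 10%, and the names used are illustrative.

```python
N_MI, N_NON_MI = 47, 49  # column totals of the CPK data

# cut-off: (MI patients at or above the cut-off, non-MI patients below it)
counts = {50: (45, 32), 100: (41, 42), 150: (35, 47), 400: (21, 49)}


def weighted_score(cutoff, w):
    """w * sensitivity + (1 - w) * specificity for one cut-off."""
    true_pos, true_neg = counts[cutoff]
    sensitivity = true_pos / N_MI
    specificity = true_neg / N_NON_MI
    return w * sensitivity + (1 - w) * specificity


def best_cutoff(w):
    """Cut-off maximising the weighted score for a given weight w."""
    return max(counts, key=lambda cutoff: weighted_score(cutoff, w))


weights = {
    "sample proportion diseased": N_MI / (N_MI + N_NON_MI),
    "equal weights": 0.5,
    "prevalence of 10% (hypothetical)": 0.10,
}
for label, w in weights.items():
    print(f"{label}: w = {w:.3f}, chosen cut-off = {best_cutoff(w)}")
```

For these data the sample-based and equal weights both select a cut-off of 100, whereas the hypothetical 10% prevalence, which weights specificity heavily, selects 400.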
a screening test is a test for a particular disease given to populations at risk who are asymptomatic
the aim is to detect disease early with a view to better prognosis
screening tests are generally cheap (relatively) and typically followed up with more specific procedures to confirm diagnosis
screening tests are often based upon bio-markers from blood or urine samples
PSA (prostate specific antigen) levels for prostate cancer
carcinoembryonic antigen for colorectal cancer
clearly when screening for disease the sensitivity and specificity of the screening test are paramount
when screening, the proportion who are truly diseased is likely to be small, so many patients who are disease free will test positive; thus the acceptability of the test and of the follow-up needs to be carefully considered
diagnostic testing in standard clinical practice is based upon patients presenting with disease symptoms, thus the weightings on test acceptability and test accuracy may differ
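The point about disease-free individuals testing positive in a screened population can be illustrated with expected counts. All figures below (population size, prevalence, sensitivity and specificity) are hypothetical and chosen only to show the arithmetic.

```python
def expected_screening_counts(population, prevalence, sensitivity, specificity):
    """Expected numbers of true and false positives in a screened population."""
    diseased = population * prevalence
    disease_free = population - diseased
    true_positives = diseased * sensitivity
    false_positives = disease_free * (1 - specificity)
    return true_positives, false_positives


# Hypothetical screening programme: 100,000 people, 0.5% prevalence,
# 90% sensitivity and 95% specificity
true_pos, false_pos = expected_screening_counts(100_000, 0.005, 0.90, 0.95)

print(f"expected true positives:  {true_pos:.0f}")
print(f"expected false positives: {false_pos:.0f}")
print(f"share of positives who are diseased: {true_pos / (true_pos + false_pos):.1%}")
```

Even with a reasonably accurate test, the false positives outnumber the true positives here by roughly eleven to one, which is why follow-up procedures and test acceptability matter in screening.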
in order for a screening programme to be introduced several considerations need to be made (WHO)
the condition should be an important health problem
there should be a treatment for the condition
facilities for diagnosis and treatment should be available post screening
there should be a latent stage in the disease (early stage in the disease within which diagnosis is important)
there should exist a suitable screening test
the test should be acceptable to the population
the natural history of the disease should be well understood
there should be an agreed policy on who to treat
the total cost of finding a case should be economically balanced in relation to medical expenditure as a whole
case finding should be a continuous process not just a ’once and for all’ project
cancer screening:
cervical cancer (pap smear)
breast cancer (mammography)
colorectal cancer (colonoscopy)
melanoma (derma check)
bowel cancer (faecal occult blood test)
foetal abnormalities
alpha-fetoprotein
blood tests
ultrasound
In addition to test accuracy there are other issues to consider
lead-time bias: survival time since diagnosis appears longer with screening because diagnosis is made earlier, so mortality needs to be compared in screened and non-screened groups
length-time bias: less severe cancers may be screen detected which might not otherwise have come to light prior to death, so the benefits of screening seem greater than they are
selection bias: some sub-groups, such as those with a family history, may be more likely to attend for screening
over-diagnosis: abnormalities are identified that would never cause a problem in the person's lifetime (e.g. prostate screening)
a randomised controlled trial (RCT) is recommended to demonstrate utility
Notwithstanding the above-mentioned issues, screening has resulted in better prognosis in many cases of disease. Consider, for example, the pap smear for cervical cancer.