2 Second Chapter

2.2 Diagnostic testing and screening

  • clinical investigation often requires the classification of individuals; most notably a diagnosis of disease status (e.g. diseased/non-diseased)

  • classification procedures are typically based upon diagnostic tests

  • diagnostic tests take various forms, such as expert opinion, questionnaires or laboratory testing techniques. Examples include mammography for breast cancer screening, serum myoglobin tests for heart disease and psychological evaluation for mental disorders

  • diagnostic accuracy is of key importance but there is also often a drive to develop testing procedures which are less-invasive/more acceptable to patients than existing techniques, particularly for disease screening

Diagnostic accuracy

  • in some situations the accuracy of classification may not be in question (such as death); most often, however, the state of affairs will be less certain

  • it is thus of interest to quantify the test's reliability in the context of its utility

  • typically, assessment of a test's utility is achieved by applying the test to a number of individuals whose true disease status is known, usually based upon a gold-standard

  • “the gold-standard is the most accurate existing test for the condition”

  • ‘gold-standards’ include histology in cancer diagnosis and microbiology for infectious diseases

Binary and continuous testing procedures

  • disease classifications may be based on dichotomous test results (positive/negative or presence/absence) or based upon continuous measurements

  • for continuous measurements a cutoff level has to be determined

  • examples of continuous measures include:

    • blood-sugar level for the diagnosis of diabetes

    • cardiac enzyme levels for myocardial injury following myocardial infarction

    • blood and brain changes in Alzheimer’s disease and dementia

Errors in testing: sensitivity and specificity

  • two types of error can occur when testing: false positives and false negatives

  • sensitivity: the proportion of truly diseased persons in the tested population who are identified as diseased by the screening test (probability of diagnosing a true case as diseased):

    sensitivity = $P(T \mid D)$
  • specificity: the proportion of truly non-diseased persons who are so identified by the screening test (probability of diagnosing a truly non-diseased person as non-diseased):

    specificity = $P(\bar{T} \mid \bar{D})$
  • where $D$ and $\bar{D}$ denote diseased and non-diseased individuals, respectively, and $T$ and $\bar{T}$ denote positive and negative test results, respectively

Positive predictive value and negative predictive value

  • also we may want to consider the probability that someone with a positive test result is actually diseased, and conversely, the probability that someone with a negative test result is actually disease free

  • positive predictive value (PPV) is the proportion of persons who are in fact diseased among those who test positive:

    PPV = $P(D \mid T)$
  • negative predictive value (NPV) is the proportion of persons who are in fact non-diseased among those who test negative:

    NPV = $P(\bar{D} \mid \bar{T})$

Tabular representation of diagnostic test results

                    True Disease Status
Test Result    Diseased    Non-diseased    Total
Positive       a           b               a+b
Negative       c           d               c+d
Total          a+c         b+d             N

  • sensitivity = $\frac{a}{a+c} \times 100$

  • specificity = $\frac{d}{b+d} \times 100$

  • positive predictive value = $\frac{a}{a+b} \times 100$

  • negative predictive value = $\frac{d}{c+d} \times 100$

  • Note PPV and NPV are affected (potentially strongly) by the true prevalence of the disease

  • the values (as above) are typically multiplied by 100 and expressed as a percentage
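
The four formulas above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `diagnostic_metrics` and the counts passed to it are hypothetical and do not come from the notes.

```python
def diagnostic_metrics(a, b, c, d):
    """Return sensitivity, specificity, PPV and NPV as percentages.

    The arguments follow the 2x2 table layout used in the text:
        a = test positive, diseased      b = test positive, non-diseased
        c = test negative, diseased      d = test negative, non-diseased
    """
    return {
        "sensitivity": 100 * a / (a + c),
        "specificity": 100 * d / (b + d),
        "ppv": 100 * a / (a + b),
        "npv": 100 * d / (c + d),
    }


# Hypothetical counts, chosen only to illustrate the calculation.
metrics = diagnostic_metrics(a=90, b=30, c=10, d=870)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

Sensitivity and specificity are computed down the columns of the table (conditioning on true disease status), whereas PPV and NPV are computed across the rows (conditioning on the test result).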

Continuous measurements: sensitivity and specificity

  • when assessing tests based upon continuous measurements, the choice of the ’cut-off’ point is important, as this affects sensitivity and specificity

  • Example: Enzyme tests and myocardial infarction (MI): use of the creatine phosphokinase (CPK) assay in a coronary care unit. The data obtained were as follows:

    CPK activity MI non-MI
    0–49 2 32
    50–99 4 10
    100–149 6 5
    150–399 14 2
    400+ 21 0
    Total no. patients 47 49

Continuous measurements: sensitivity and specificity II

  • the four ‘sensible’ choices of cut-off point give the following sensitivities, specificities, PPVs and NPVs (all in %):

    Cutoff Sensitivity Specificity PPV NPV
    50 96 65 73 94
    100 87 86 85 88
    150 74 96 95 80
    400 45 100 100 65
  • it is clear from above that for this particular example:

    • a low cut-off results in a more sensitive test;

    • a high cut-off results in a more specific test;

    • intermediate cut-offs result in smaller overall errors

  • the choice of cut-off must strike a balance between sensitivity and specificity: raising one generally lowers the other
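
As a check on the table of cut-offs, the following sketch recomputes the four measures directly from the CPK data given earlier. It assumes that a patient tests positive when CPK activity is at or above the cut-off; the names `cpk_bands` and `metrics_at_cutoff` are illustrative only.

```python
# CPK bands from the table in the text: (lower bound, MI count, non-MI count)
cpk_bands = [(0, 2, 32), (50, 4, 10), (100, 6, 5), (150, 14, 2), (400, 21, 0)]


def metrics_at_cutoff(bands, cutoff):
    """Treat CPK >= cutoff as test positive; return percentages
    (sensitivity, specificity, PPV, NPV)."""
    tp = sum(mi for low, mi, non in bands if low >= cutoff)
    fn = sum(mi for low, mi, non in bands if low < cutoff)
    fp = sum(non for low, mi, non in bands if low >= cutoff)
    tn = sum(non for low, mi, non in bands if low < cutoff)
    return (
        100 * tp / (tp + fn),  # sensitivity
        100 * tn / (tn + fp),  # specificity
        100 * tp / (tp + fp),  # PPV
        100 * tn / (tn + fn),  # NPV
    )


cutoff_table = {c: metrics_at_cutoff(cpk_bands, c) for c in (50, 100, 150, 400)}

print("Cutoff  Sens  Spec   PPV   NPV")
for cutoff, values in cutoff_table.items():
    print(f"{cutoff:>6}" + "".join(f"{v:6.0f}" for v in values))
```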

Continuous Measurements: dichotomising test results to evaluate sensitivity and specificity

  • each of the values in the previous table is calculated by forming a dichotomy based upon the test result: (+ve; -ve) for each cut-off point and then constructing a 2×2 table in the form as previously. Below is the table for a cut-off point of CPK of 50 or greater.

                 Disease Status
CPK           MI    Non-MI    Total
≥50 (+ve)     45    17        62
<50 (-ve)      2    32        34
Total         47    49        96

ROC curve

  • sensitivity, specificity and their sum can be plotted to evaluate the various cut-points but typically a plot of sensitivity against (1 - specificity) is produced for the various cut-off values: the so-called receiver operating characteristic (ROC) curve

    [Figure: ROC curve, sensitivity plotted against (1 - specificity)]

  • all things being equal one typically chooses the cut-off corresponding to the top-left-most point on this curve
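
The ROC points for the CPK example can be computed without any plotting library. In the sketch below, "top-left-most" is interpreted as the smallest Euclidean distance to the corner (0, 1); that interpretation is an assumption made for illustration, since the notes do not define the phrase formally, and the function names are hypothetical.

```python
import math

# CPK bands from the text: (lower bound, MI count, non-MI count)
cpk_bands = [(0, 2, 32), (50, 4, 10), (100, 6, 5), (150, 14, 2), (400, 21, 0)]
n_mi = sum(mi for _, mi, _ in cpk_bands)        # 47 MI patients
n_non_mi = sum(non for _, _, non in cpk_bands)  # 49 non-MI patients


def roc_point(cutoff):
    """Return (1 - specificity, sensitivity) when CPK >= cutoff is test positive."""
    tp = sum(mi for low, mi, _ in cpk_bands if low >= cutoff)
    fp = sum(non for low, _, non in cpk_bands if low >= cutoff)
    return fp / n_non_mi, tp / n_mi


def distance_to_top_left(point):
    """Euclidean distance from an ROC point to the ideal corner (0, 1)."""
    false_positive_rate, sensitivity = point
    return math.hypot(false_positive_rate, 1 - sensitivity)


roc = {cutoff: roc_point(cutoff) for cutoff in (50, 100, 150, 400)}
best_cutoff = min(roc, key=lambda cutoff: distance_to_top_left(roc[cutoff]))

for cutoff, (fpr, sens) in roc.items():
    print(f"cut-off {cutoff}: 1 - specificity = {fpr:.3f}, sensitivity = {sens:.3f}")
print("closest to the top-left corner:", best_cutoff)
```

Other selection rules, such as maximising the sum of sensitivity and specificity, can pick a different point on some data sets, so the distance rule here is one choice among several.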

PPV and NPV and Prevalence

  • recall that prevalence is the proportion of the population of interest who have the disease at a given point in time; in this context it is essentially the prior probability of disease before the test result is observed

  • assuming the data are a simple random sample from the general population of interest then an estimate is given by:

    prevalence = $\frac{a+c}{N}$
  • both PPV and NPV depend upon disease prevalence

  • if the study sample is not random (and thus does not reflect the prevalence in the population of interest) then PPV and NPV computed from the sample do not yield a valid measure of the test's predictive value

  • sensitivity and specificity, however, are independent of the prevalence of disease

PPV: posterior probability of disease given a positive test result

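  • the positive predictive value $P(D \mid T)$ can be expressed using Bayes' theorem:

    $$P(D \mid T) = \frac{P(T \mid D)P(D)}{P(T)} = \frac{P(T \mid D)P(D)}{P(T \mid D)P(D) + P(T \mid \bar{D})P(\bar{D})}$$
    $$P(D \mid T) = \frac{\text{sensitivity} \times \text{prevalence}}{[\text{sensitivity} \times \text{prevalence}] + [(1-\text{specificity}) \times (1-\text{prevalence})]}$$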
  • if the sample is non-random the population prevalence P(D) can be elicited from the population of interest and the predictive value of the test evaluated in context

  • the PPV equation above allows the prior probability of disease to be combined with the data regarding the test accuracy (i.e. the sensitivity and specificity) leading to a so-called posterior (post-test result) assessment of disease state given a positive test result
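
The same argument applied to a negative test result gives the corresponding expression for the NPV. The derivation below is a sketch that follows the notation already in use ($D$, $\bar{D}$, $T$, $\bar{T}$) and relies only on $P(\bar{T} \mid D) = 1 - \text{sensitivity}$ and $P(\bar{D}) = 1 - \text{prevalence}$:

$$P(\bar{D} \mid \bar{T}) = \frac{P(\bar{T} \mid \bar{D})P(\bar{D})}{P(\bar{T})} = \frac{P(\bar{T} \mid \bar{D})P(\bar{D})}{P(\bar{T} \mid \bar{D})P(\bar{D}) + P(\bar{T} \mid D)P(D)}$$
$$P(\bar{D} \mid \bar{T}) = \frac{\text{specificity} \times (1-\text{prevalence})}{[\text{specificity} \times (1-\text{prevalence})] + [(1-\text{sensitivity}) \times \text{prevalence}]}$$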

Eliciting the posterior probability of disease given a positive test result

  • Example: a spinal fluid test is known to be 95% effective for detecting a disease when it is present (sensitivity 0.95) and 99% effective at ruling it out when it is absent (specificity 0.99). The prevalence of the disease is known to be 0.5% in the population of interest, thus we have

    PPV = $\frac{(0.95)(0.005)}{(0.95)(0.005) + (1-0.99)(1-0.005)}$
    PPV = 0.32
  • and hence the post-test probability of disease for those testing positive is about 32%

  • Note the NPV is given by:

    NPV = $\frac{(0.99)(1-0.005)}{(0.99)(1-0.005) + (1-0.95)(0.005)}$
    NPV = 0.9997
  • hence the post-test probability of being non-diseased for those testing negative is about 99.97%

  • What if the disease prevalence was higher in the population?
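
One way to explore the question above is to hold the test characteristics of the spinal fluid example fixed (sensitivity 0.95, specificity 0.99) and vary the prevalence. In this sketch only the 0.5% prevalence comes from the example; the other prevalences are hypothetical, and the function name `predictive_values` is illustrative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) computed from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# 0.005 is the prevalence from the example; the rest are hypothetical.
results = {}
for prevalence in (0.005, 0.05, 0.2, 0.5):
    results[prevalence] = predictive_values(0.95, 0.99, prevalence)
    ppv, npv = results[prevalence]
    print(f"prevalence {prevalence:>5.1%}: PPV = {ppv:.3f}, NPV = {npv:.4f}")
```

With the test held fixed, the PPV rises and the NPV falls as the prevalence increases, which is why the same test can be far more convincing in a symptomatic clinic population than in general population screening.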

Positive test likelihood ratios

  • therapeutic decisions with regard to tests are often considered using the likelihood ratio of a positive test result:

    LR = $\frac{P(T \mid D)}{P(T \mid \bar{D})} = \frac{\text{sensitivity}}{1-\text{specificity}}$
  • for the test to be informative we want the value to be high

  • the positive likelihood ratio is the ratio of the posterior disease odds (i.e. after witnessing a positive test result) to the prior disease odds and hence tells us how much the odds of disease increase given a positive test result

  • Example: A test has sensitivity of 90% and a specificity of 78% then

    LR = $\frac{0.9}{0.22} = 4.09$
  • hence a positive test result multiplies the odds of having the disease by about 4

  • note that the LR may increase or decrease if the sensitivity and specificity of two competing diagnostic tests move in opposite directions
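
The odds interpretation can be made concrete with a short sketch that converts a pre-test probability into a post-test probability via the likelihood ratio. The sensitivity (0.90) and specificity (0.78) are taken from the example above; the 10% pre-test probability and the function names are assumptions made purely for illustration.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1 - specificity)


def post_test_probability(prior_probability, likelihood_ratio):
    """Convert probability to odds, multiply by the LR, convert back."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


lr = positive_likelihood_ratio(0.90, 0.78)
posterior = post_test_probability(0.10, lr)  # 10% prior is hypothetical

print(f"LR+ = {lr:.2f}")
print(f"post-test probability = {posterior:.4f}")
```

The post-test probability obtained this way equals the PPV from Bayes' theorem at the same prevalence, so the two routes are equivalent ways of organising the same calculation.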

Weighted decision making

  • in general we need to consider the relative importance, or weight, that we will give to sensitivity as opposed to specificity

  • if the disease in question is life threatening and the treatment (subsequent to a positive diagnosis) has few side-effects then we would want to give high weight to the sensitivity

  • if the disease in question is not too serious but treatment has many side-effects we would want to give more weight to specificity

  • thus if we give sensitivity weight w and specificity 1-w we would want to maximise

    M=w×sensitivity+(1-w)×specificity
  • all things being equal we aim to maximise the number of correct decisions

  • to maximise with respect to the study (’sample criteria’) choose

    $w = \frac{\text{number of positive (diseased) samples}}{\text{total number of samples}}$
  • to maximise with respect to the population criterion choose

    $w = \frac{\text{number in population with the disease}}{\text{total population size}}$

    which is the disease prevalence

  • based upon Youden’s criterion one chooses w=0.5

  • the choice of w may be crucial in comparing tests
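
To illustrate how the choice of w can change the preferred cut-off, the sketch below evaluates M for the CPK data from the earlier example under several weights. The weights 0.9 and 0.1 are hypothetical, the sample-criterion weight is the proportion of MI patients in the study (47/96), and the function names are illustrative.

```python
# CPK bands from the text: (lower bound, MI count, non-MI count)
cpk_bands = [(0, 2, 32), (50, 4, 10), (100, 6, 5), (150, 14, 2), (400, 21, 0)]
cutoffs = (50, 100, 150, 400)
n_mi = sum(mi for _, mi, _ in cpk_bands)
n_non_mi = sum(non for _, _, non in cpk_bands)


def sens_spec(cutoff):
    """Sensitivity and specificity when CPK >= cutoff is test positive."""
    tp = sum(mi for low, mi, _ in cpk_bands if low >= cutoff)
    tn = sum(non for low, _, non in cpk_bands if low < cutoff)
    return tp / n_mi, tn / n_non_mi


def weighted_score(cutoff, w):
    """M = w * sensitivity + (1 - w) * specificity."""
    sensitivity, specificity = sens_spec(cutoff)
    return w * sensitivity + (1 - w) * specificity


def best_cutoff(w):
    return max(cutoffs, key=lambda cutoff: weighted_score(cutoff, w))


weights = {
    "Youden (w = 0.5)": 0.5,
    "sample criterion (w = 47/96)": n_mi / (n_mi + n_non_mi),
    "sensitivity favoured (w = 0.9)": 0.9,  # hypothetical weight
    "specificity favoured (w = 0.1)": 0.1,  # hypothetical weight
}
chosen = {label: best_cutoff(w) for label, w in weights.items()}
for label, cutoff in chosen.items():
    print(f"{label}: best cut-off = {cutoff}")
```

Under the sample criterion, M reduces to the proportion of correct classifications in the study, (true positives + true negatives) / N, which is why that weight maximises the number of correct decisions.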

Population screening for disease

  • a screening test is a test for a particular disease given to populations at risk who are asymptomatic

  • the aim is to detect disease early with a view to better prognosis

  • screening tests are generally cheap (relatively) and typically followed up with more specific procedures to confirm diagnosis

  • screening tests are often based upon bio-markers from blood or urine samples

    • PSA (prostate specific antigen) levels for prostate cancer

    • carcinoembryonic antigen for colorectal cancer

Screening for disease

  • clearly when screening for disease the sensitivity and specificity of the screening test are paramount

  • when screening, the proportion who are truly diseased is likely to be small, so many patients who are disease free will test positive; the acceptability of the test and of the follow-up therefore needs to be carefully considered

  • diagnostic testing in standard clinical practice is based upon patients presenting with disease symptoms, thus the weightings on test acceptability and test accuracy may differ

  • in order for a screening programme to be introduced several considerations need to be made (WHO)
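
The point about false positives in low-prevalence screening can be illustrated by counting expected outcomes. The sketch below reuses the test characteristics and prevalence from the spinal fluid example earlier in the section (sensitivity 0.95, specificity 0.99, prevalence 0.5%); the screened population of 100,000 and the function name are hypothetical.

```python
def expected_screening_counts(n_screened, prevalence, sensitivity, specificity):
    """Expected numbers in each cell of the 2x2 table for a screened population."""
    diseased = n_screened * prevalence
    disease_free = n_screened - diseased
    return {
        "true_positives": sensitivity * diseased,
        "false_negatives": (1 - sensitivity) * diseased,
        "false_positives": (1 - specificity) * disease_free,
        "true_negatives": specificity * disease_free,
    }


# Population size of 100,000 is hypothetical.
counts = expected_screening_counts(100_000, 0.005, 0.95, 0.99)
for cell, value in counts.items():
    print(f"{cell}: {value:,.0f}")

positives = counts["true_positives"] + counts["false_positives"]
share_false = counts["false_positives"] / positives
print(f"share of positive results that are false: {share_false:.1%}")
```

Even with 99% specificity, false positives outnumber true positives here, because the disease-free group is so much larger than the diseased group.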

WHO criteria for disease screening

  • the condition should be an important health problem

  • there should be a treatment for the condition

  • facilities for diagnosis and treatment should be available post screening

  • there should be a latent stage in the disease (early stage in the disease within which diagnosis is important)

  • there should exist a suitable screening test

  • the test should be acceptable to the population

  • the natural history of the disease should be well understood

  • there should be an agreed policy on who to treat

  • the total cost of finding a case should be economically balanced in relation to medical expenditure as a whole

  • case finding should be a continuous process not just a ’once and for all’ project

Common screening programmes

  • cancer screening:

    • cervical cancer (pap smear)

    • breast cancer (mammography)

    • colorectal cancer (colonoscopy)

    • melanoma (derma check)

    • bowel cancer (faecal occult blood test)

  • foetal abnormalities

    • alpha-fetoprotein

    • blood tests

    • ultrasound

Further considerations in screening

In addition to test accuracy there are other issues to consider

  • lead-time bias: survival time measured from diagnosis appears longer with screening simply because diagnosis is made earlier; mortality therefore needs to be compared between screened and non-screened groups

  • length-time bias: less severe cancers, which might otherwise never have come to light before death, are more likely to be screen-detected, so the benefits of screening seem greater than they are

  • selection bias: sub-groups may be more likely to attend for screening such as those with family history

  • over-diagnosis: abnormalities are identified that would never cause a problem in the person's lifetime (e.g. prostate screening)

  • a randomised controlled trial (RCT) is recommended to demonstrate utility

  • Notwithstanding the above mentioned issues screening has resulted in better prognosis in many cases of disease. Consider, for example, the pap smear for cervical cancer.