8 Censoring

8.1 Introduction

Definition 8.1.1.

Censoring is a term used to indicate that a measurement is incomplete or only partially observed. Even if an event cannot be observed directly, the fact that it is censored still provides useful information. For example, terminally ill patients in a survival trial receive a treatment to slow the progression of their illness. The survival times of patients who are still alive at the end of the study are censored.

8.1.1 Reasons for censoring

There are three main reasons why censoring may occur:

  1. Events entirely independent of the study occur that make it impossible to observe the event of interest. For example, the fire alarm goes off and so the experiment is cut short.

  2. Limitations of the experimental design. For example, there is a time constraint on the experiment, after which it must end.

  3. Censoring occurs as a consequence of the experiment. For example, a patient in a clinical trial has an adverse reaction to the treatment and so must be removed from the study for their safety.

The first two reasons for censoring are independent of the event of interest. In contrast, in the third case the study and the reason for censoring are dependent; this is known as informative censoring. This type of censoring requires further consideration and so is not discussed in this chapter.

Remark. Censoring is not the same as truncation. When a measurement is censored, the event of interest could have occurred, but for some reason it was not possible to observe it. In contrast, if some constraint prevents a particular event from happening, whether observed or not, then this is truncation. For example, the number of students who can attend a lecture is limited by the number of seats available in the room; say 100 seats. If we are interested in checking that attendance is at least 80%, then the lecturer need only count 80 students. So, the maximum number of students the lecturer could count is 100 (truncated), but she need not count beyond 80 students (censoring point).

 
Exercise 8.60
For the following scenarios, state whether the event has been censored.

  1. A river depth measurement cannot be recorded as the depth is greater than the length of the meter ruler.

  2. A participant in a clinical trial cannot attend regular meetings with the doctor due to moving overseas.

  3. Parts of a multiple choice questionnaire on household income have not been filled in.

  4. 120 marks are available on an exam paper, but the number of marks awarded is capped at 100.

 

8.1.2 Direction of censoring

When discussing censoring, it is important to clarify the direction of censoring:

  • Right censoring – The value of the event is greater than a specified value but it is unknown by how much. For example, the survival times of patients who are alive at the end of the study are right censored.

  • Left censoring – The value of the event is lower than a specified value but it is unknown by how much. For example, in child development studies, the time for a child to learn a particular task is left censored if they can already perform the task when they enter the study.

  • Interval censoring – The observed events occur within a specified interval; values smaller than the lower bound are left censored and values greater than the upper bound are right censored. For example, Figure 8.1 depicts a line-up of three prisoners where the height of the first prisoner is left censored at 4 feet, the height of the second is observed at 5.75 feet and the height of the third is right censored at 7 feet.

Figure 8.1: Left: Police line-up – first is too short to be measured (left censored), second can be measured (observed at 5.75ft) and the third is too tall to be measured (right censored). Right: Histogram of true heights – values in the left region are left censored at 5ft whilst values in the right region are right censored at 6ft.
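The directions of censoring described above can be illustrated with a minimal Python sketch. It assumes a measuring device that can only read heights between a lower and an upper limit, taken here as the 5 ft and 6 ft limits of the histogram in Figure 8.1. The function name `censor` and the list of true heights are hypothetical and chosen purely for illustration; they are not data from the study.

```python
def censor(value, lower, upper):
    """Return (recorded_value, status) for a measurement that can only be
    read between `lower` and `upper`."""
    if value < lower:
        # Too small to measure: we only know the value is below `lower`.
        return lower, "left censored"
    if value > upper:
        # Too large to measure: we only know the value is above `upper`.
        return upper, "right censored"
    return value, "observed"


# Hypothetical true heights in feet (illustrative only).
true_heights = [4.6, 5.75, 6.4]

records = [censor(h, lower=5.0, upper=6.0) for h in true_heights]
for true, (recorded, status) in zip(true_heights, records):
    print(f"true height {true} ft -> recorded {recorded} ft ({status})")
```

Note that a censored record keeps the censoring limit together with its status, since the limit alone would be indistinguishable from a genuine observation at that value.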

8.1.3 Example: AA battery

Suppose you are to undertake a product quality study to investigate the lifetime of AA batteries. The experiment involves purchasing $n$ off-the-shelf batteries made by the same manufacturer from a variety of stores and putting them in a device that drains each battery under a constant 50 mA current. The time taken to deplete each battery is recorded. However, the device you are using is in high demand and so you must complete your experiment within 24 hours.

Let $X_i$, for $i = 1, \ldots, n$, denote the lifetime of the $i$th battery, and assume that the lifetimes are independent and identically distributed with probability density function $f_X(x \mid \theta)$. If the lifetime is observed, then its contribution to the likelihood function is $L_i(\theta) = f_X(x_i \mid \theta)$. In contrast, if the lifetime is censored at $C$ (here the known fixed time of 24 hours), then all we know about the true lifetime is that it is greater than $C$; i.e. it is right censored at $C$.

 
Exercise 8.61
Which is the correct contribution to the likelihood from the right-censored event $X_i$?

(1) $x_i$ makes no contribution
(2) $L_i(\theta) = f_X(X_i = C \mid \theta)$
(3) $L_i(\theta) = f_X(X_i < C \mid \theta)$
(4) $L_i(\theta) = f_X(X_i > C \mid \theta)$

 

Let $\delta_i$ be an indicator of whether the $i$th battery lifetime was observed or censored:

$$\delta_i = \begin{cases} 1 & \text{if lifetime observed,} \\ 0 & \text{otherwise.} \end{cases}$$

The likelihood for the sample data is given by:

$$L(\theta) = \prod_{i=1}^{n} f_X(x_i \mid \theta)^{\delta_i} \, f_X(X_i > C \mid \theta)^{1-\delta_i}.$$
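As a rough sketch of how the likelihood above might be evaluated in practice, the following Python function works on the log scale, where the product becomes a sum. It reads the censored term as the probability that the lifetime exceeds $C$. The function name `censored_log_likelihood` and its arguments `log_pdf` and `log_survival` are hypothetical: the caller is assumed to supply, for the chosen model, the log density and the log of that exceedance probability.

```python
def censored_log_likelihood(times, deltas, censor_time, log_pdf, log_survival, theta):
    """Log-likelihood of theta for right-censored data.

    times        -- recorded values x_i (ignored where the lifetime is censored)
    deltas       -- indicators: 1 if the lifetime was observed, 0 if censored
    censor_time  -- the fixed censoring time C
    log_pdf      -- callable (x, theta) giving the log density at x
    log_survival -- callable (c, theta) giving log P(X > c | theta)
    """
    total = 0.0
    for x, delta in zip(times, deltas):
        if delta == 1:
            # Observed lifetime: contributes the log density at x_i.
            total += log_pdf(x, theta)
        else:
            # Censored lifetime: contributes the log probability of exceeding C.
            total += log_survival(censor_time, theta)
    return total
```

Working with the log-likelihood avoids numerical underflow when many small density values are multiplied together, and it has the same maximiser as the likelihood itself.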

Suppose the battery lifetimes $X_1, \ldots, X_n$ are independent and identically exponentially distributed with pdf:

$$f_X(x \mid \theta) = \theta \exp\{-\theta x\} \quad \text{for } x > 0.$$

Of the $n$ batteries, lifetimes were observed for $m$ batteries, $x_1, \ldots, x_m$, with the remaining $n - m$ lifetimes censored at $C$, a known constant.

 
Exercise 8.62
Write down the likelihood for θ.

 

 
Exercise 8.63
Calculate the maximum likelihood estimate of θ.

 

24*  16   22   3    24*
24*  17   3    24*  3
23   20   13   23   3
21   13   10   24*  6
6    24*  24*  12   9
1    23   8    6    5
Table 8.1: AA battery lifetimes in hours; * indicates a censored measurement.
Figure 8.2: Duration chart of lifetimes. Circle – observed, Cross – censored at 24 hours.

The data from the experiment are presented in Table 8.1, with Figure 8.2 depicting the lifetimes of the $n = 30$ batteries. Of these, $m = 23$ lifetimes were observed, with a total time of $\sum_{i=1}^{m} x_i = 266$ hours. The lifetimes of the remaining $n - m = 7$ batteries are censored at $C = 24$ hours. The maximum likelihood estimate of the rate parameter θ is:

$$\hat{\theta} = \frac{23}{266 + 7 \times 24} \approx 0.053 \ \text{hr}^{-1}$$
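The arithmetic above can be checked directly from Table 8.1. The Python sketch below recomputes the counts, the total observed time and the estimate, and then confirms numerically that the estimate gives a higher log-likelihood than nearby values. It assumes the exponential model stated earlier, for which the probability of a lifetime exceeding $C$ is $\exp\{-\theta C\}$; the variable names and the use of `None` to mark a censored entry are illustrative choices.

```python
import math

C = 24.0  # censoring time in hours

# Table 8.1, row by row; None marks a lifetime censored at C (24* in the table).
table = [
    [None, 16, 22, 3, None],
    [None, 17, 3, None, 3],
    [23, 20, 13, 23, 3],
    [21, 13, 10, None, 6],
    [6, None, None, 12, 9],
    [1, 23, 8, 6, 5],
]

observed = [x for row in table for x in row if x is not None]
n = sum(len(row) for row in table)  # number of batteries
m = len(observed)                   # number of observed lifetimes
total_observed = sum(observed)      # sum of the observed lifetimes in hours

# Total time on test: observed lifetimes plus C for each censored battery.
total_time = total_observed + (n - m) * C
theta_hat = m / total_time


def log_likelihood(theta):
    """Exponential log-likelihood with right censoring at C: each observed
    lifetime contributes log(theta) - theta * x_i and each censored lifetime
    contributes -theta * C."""
    return m * math.log(theta) - theta * total_time


print(n, m, total_observed)   # 30 23 266
print(round(theta_hat, 3))    # 0.053

# Sanity check: the estimate beats nearby values of theta.
assert log_likelihood(theta_hat) > log_likelihood(0.9 * theta_hat)
assert log_likelihood(theta_hat) > log_likelihood(1.1 * theta_hat)
```

The denominator is the total time that all 30 batteries spent on test, so the censored batteries still contribute information even though their lifetimes were never observed.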

 
Exercise 8.64
Calculate an approximate 95% confidence interval for θ.