13 Information and Sufficiency

13.3 Sufficiency

Recall the driving test data from Example 13.1.

Number of failed attempts:   0     1     2    3+
Observed frequency:        147    47    20     5

Table 13.2: Number of times taken for drivers to pass the driving test.

We chose to model these data as being geometrically distributed. Assuming that the people in the ‘3 or more’ column failed exactly three times, the log-likelihood for general data $x_1, \dots, x_n$ is

\begin{align*}
\ell(\theta) &= \sum_{i=1}^n \log\{\theta(1-\theta)^{x_i}\} \\
&= \sum_{i=1}^n \{\log(\theta) + x_i \log(1-\theta)\} \\
&= n\log(\theta) + \log(1-\theta)\sum_{i=1}^n x_i.
\end{align*}
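As a quick numerical sanity check (a sketch, not part of the original notes), the log-likelihood can be evaluated directly from the frequencies in Table 13.2, treating the final column as exactly three failures:

```python
import math

# Observed frequencies from Table 13.2 (final column treated as exactly 3)
failures = [0, 1, 2, 3]
freqs = [147, 47, 20, 5]

n = sum(freqs)                                        # 219 participants
sum_x = sum(x * f for x, f in zip(failures, freqs))   # 102 failed attempts in total

def loglik(theta):
    """Geometric log-likelihood: sum over people of log{theta * (1 - theta)^x_i}."""
    return sum(f * (math.log(theta) + x * math.log(1 - theta))
               for x, f in zip(failures, freqs))

# The compact form n*log(theta) + log(1-theta)*sum_x agrees term by term
for theta in (0.3, 0.5, 0.7):
    compact = n * math.log(theta) + math.log(1 - theta) * sum_x
    assert abs(loglik(theta) - compact) < 1e-9
```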

Now, suppose that, rather than being presented with the table of passing attempts, you were simply told that, of the 219 people who filled in the survey, $\sum_{i=1}^{219} x_i = 102$.

Would it still be possible to proceed with fitting the model?

The answer is yes; moreover, we can proceed in exactly the same way and achieve the same results! This is because, if you look at the log-likelihood, the only way in which the data enter is through $\sum_{i=1}^n x_i$, meaning that in some sense, this is all we need to know.

This is clearly a big advantage: we need only remember one number rather than an entire table.

We call $\sum_{i=1}^n x_i$ a sufficient statistic for $\theta$.
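To illustrate, here is a short sketch (not from the notes; the closed form $\hat\theta = n/(n + \sum_i x_i)$ follows from setting the derivative of the log-likelihood above to zero):

```python
# Sketch: the geometric MLE depends on the data only through n and sum(x).
# Differentiating n*log(theta) + log(1-theta)*sum_x and solving for zero
# gives theta_hat = n / (n + sum_x) -- a function of the sufficient statistic.
n, sum_x = 219, 102          # all we were told about the survey
theta_hat = n / (n + sum_x)
print(round(theta_hat, 4))   # -> 0.6822; any 219 observations summing to 102 agree
```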

Definition.

Let $\mathbf{x} = x_1, \dots, x_n$ be a sample from $f(\cdot \mid \theta)$. Then a function of the data $T(\mathbf{x})$ is said to be a sufficient statistic for $\theta$ (or sufficient for $\theta$) if $\mathbf{x}$ is independent of $\theta$ given $T(\mathbf{x})$, i.e.

\[
\Pr[\mathbf{X} = \mathbf{x} \mid T(\mathbf{x}), \theta] = \Pr[\mathbf{X} = \mathbf{x} \mid T(\mathbf{x})].
\]

Some consequences of this definition:

  1. For the objective of learning about $\theta$, if I am told $T(\mathbf{x})$, there is no value in being told anything else about $\mathbf{x}$.

  2. If I have two datasets $\mathbf{x}_1$ and $\mathbf{x}_2$, and $T(\mathbf{x}_1) = T(\mathbf{x}_2)$, then I should draw the same conclusions about $\theta$ from both, even if $\mathbf{x}_1 \neq \mathbf{x}_2$.

  3. Sufficient statistics always exist, since trivially $T(\mathbf{x}) = \mathbf{x}$ always satisfies the above definition.
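Consequence 2 can be checked numerically; the mini-datasets below are hypothetical, chosen only to share the same value of $T(\mathbf{x}) = \sum_i x_i$:

```python
import math

def geom_loglik(data, theta):
    """Geometric log-likelihood for a list of failure counts."""
    return sum(math.log(theta) + x * math.log(1 - theta) for x in data)

x1 = [0, 0, 1, 3]   # hypothetical data, sum = 4
x2 = [1, 1, 1, 1]   # different data, same sum = 4

# Same sufficient statistic => identical log-likelihood curves,
# hence identical conclusions about theta.
for theta in (0.2, 0.5, 0.8):
    assert abs(geom_loglik(x1, theta) - geom_loglik(x2, theta)) < 1e-12
```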

Definition.

Let $\mathbf{x} = x_1, \dots, x_n$ be a sample from $f(\cdot \mid \theta)$. Let $T(\mathbf{x})$ be sufficient for $\theta$. Then $T(\mathbf{x})$ is said to be minimally sufficient for $\theta$ if there is no sufficient statistic with a lower dimension than $T$.

Theorem (Neyman factorisation theorem).

Let $\mathbf{x} = x_1, \dots, x_n$ be a sample from $f(\cdot \mid \theta)$. Then a function $T(\mathbf{x})$ is sufficient for $\theta$ if and only if the likelihood function can be factorised in the form

\[
L(\theta) = g(\mathbf{x}) \times h(T(\mathbf{x}), \theta),
\]

where $g$ is a function of the data only, and $h$ depends on the data only through $T(\mathbf{x})$.

For a proof see page 276 of Casella and Berger.

We can also express the factorisation result in terms of the log-likelihood, which is often easier, just by taking logs of the above result:

\begin{align*}
\ell(\theta) &= \log\{g(\mathbf{x}) \times h(T(\mathbf{x}), \theta)\} \\
&= \log\{g(\mathbf{x})\} + \log\{h(T(\mathbf{x}), \theta)\} \\
&= \tilde{g}(\mathbf{x}) + \tilde{h}(T(\mathbf{x}), \theta),
\end{align*}

where $\tilde{g} = \log(g)$ and $\tilde{h} = \log(h)$.

We can show that $\sum_{i=1}^n x_i$ is sufficient for $\theta$ in the driving test example by inspection of the log-likelihood:

\[
\ell(\theta) = n\log(\theta) + \log(1-\theta)\sum_{i=1}^n x_i.
\]

Letting $T(\mathbf{x}) = \sum_{i=1}^n x_i$, $\tilde{h}(T(\mathbf{x}), \theta) = n\log(\theta) + \log(1-\theta)T(\mathbf{x})$ and $\tilde{g}(\mathbf{x}) = 0$, we satisfy the factorisation criterion, and hence $T(\mathbf{x}) = \sum_{i=1}^n x_i$ is sufficient for $\theta$.

Suppose that I carry out another survey on attempts to pass a driving test, again with $n = 219$ participants, and get data $\mathbf{y} = y_1, \dots, y_n$, with $\mathbf{x} \neq \mathbf{y}$ but $\sum_{i=1}^n x_i = \sum_{i=1}^n y_i$. Are the following statements true or false?

  1. $\hat\theta(\mathbf{x})$, the MLE based on data $\mathbf{x}$, is the same as $\hat\theta(\mathbf{y})$, the MLE based on data $\mathbf{y}$.

  2. The confidence intervals based on both datasets will be identical.

  3. The geometric distribution is appropriate for both datasets.

An important shortcoming of considering only the sufficient statistic is that it does not allow us to check how well the chosen model fits.

Example 13.3.1 Poisson parameter (cont.)

Recall from the beginning of this section the London homicides data, which we modelled as a random sample from the Poisson distribution. We found

\begin{align*}
L(\lambda \mid x_1, \dots, x_n) &= \prod_{i=1}^n \frac{\lambda^{x_i} \exp(-\lambda)}{x_i!} \\
&= \lambda^{\sum_i x_i} \exp(-n\lambda) \prod_{i=1}^n \frac{1}{x_i!} \\
&\propto \lambda^{\sum_i x_i} \exp(-n\lambda),
\end{align*}

and that the log-likelihood function for the Poisson data is consequently

\[
\ell(\lambda) = \log(\lambda)\sum_{i=1}^n x_i - n\lambda + c,
\]

with the MLE being

\[
\hat\lambda = \frac{\sum_{i=1}^n x_i}{n} = \bar{x}.
\]

By differentiating again, we can find the information function

\[
\ell''(\lambda \mid \mathbf{x}) = -\lambda^{-2}\sum_{i=1}^n x_i,
\]

and so

\[
I_O(\lambda \mid \mathbf{x}) = \lambda^{-2}\sum_{i=1}^n x_i.
\]
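These two results can be verified numerically (a sketch with hypothetical counts, not the London data):

```python
# Sketch: check the Poisson score is zero at lambda_hat = x_bar, and that
# the observed information is positive there. Hypothetical counts only.
x = [2, 0, 3, 1, 1, 2]
n = len(x)
lam_hat = sum(x) / n                      # MLE: the sample mean

score_at_mle = sum(x) / lam_hat - n       # d/dlam of log(lam)*sum_x - n*lam
info = sum(x) / lam_hat ** 2              # I_O(lam | x) = lam^-2 * sum(x)

assert abs(score_at_mle) < 1e-12          # stationary point at the MLE
assert info > 0                           # information positive => a maximum
```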

What is a sufficient statistic for the Poisson parameter?

For this case, letting $T(\mathbf{x}) = \sum_{i=1}^n x_i$, $\tilde{h}(T(\mathbf{x}), \lambda) = \log(\lambda)T(\mathbf{x}) - n\lambda$ and $\tilde{g}(\mathbf{x}) = c = -\sum_{i=1}^n \log(x_i!)$, we satisfy the factorisation criterion, and hence $T(\mathbf{x}) = \sum_{i=1}^n x_i$ is sufficient for $\lambda$.
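The factorisation itself can also be checked numerically; another sketch with hypothetical counts:

```python
import math

# Sketch: the Poisson log-likelihood splits as h_tilde(T(x), lam) + g_tilde(x).
x = [2, 0, 3, 1]              # hypothetical counts
n, T = len(x), sum(x)
g_tilde = -sum(math.log(math.factorial(xi)) for xi in x)   # free of lambda

def loglik(lam):
    """Full Poisson log-likelihood, including the lambda-free constant."""
    return sum(xi * math.log(lam) - lam - math.log(math.factorial(xi)) for xi in x)

def h_tilde(lam):
    """The part depending on the data only through T(x) = sum(x)."""
    return math.log(lam) * T - n * lam

for lam in (0.5, 1.0, 2.5):
    assert abs(loglik(lam) - (h_tilde(lam) + g_tilde)) < 1e-9
```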

Example 13.3.2 Normal variance

Suppose the sample $x_1, \dots, x_n$ comes from $X \sim N(0, \theta)$. Find a sufficient statistic for $\theta$. Is the MLE a function of this statistic or of the sample mean? Give a formula for the 95% confidence interval for $\theta$.

First, the Normal(0,θ) density is given by

\[
f(x_i \mid \theta) = \frac{1}{\sqrt{2\pi\theta}} \exp\left\{-\frac{x_i^2}{2\theta}\right\},
\]

leading to the likelihood

\begin{align*}
L(\theta) &= \prod_{i=1}^n \frac{1}{\sqrt{2\pi\theta}} \exp\left\{-\frac{x_i^2}{2\theta}\right\} \\
&\propto \frac{1}{\theta^{n/2}} \exp\left\{-\frac{\sum_i x_i^2}{2\theta}\right\}.
\end{align*}

Hence, $T(\mathbf{x}) = \sum_i x_i^2$ is a sufficient statistic for $\theta$. The log-likelihood and score functions are

\begin{align*}
\ell(\theta) &= -\frac{n}{2}\log\theta - \frac{\sum_i x_i^2}{2\theta}, \\
S(\theta) = \ell'(\theta) &= -\frac{n}{2\theta} + \frac{\sum_i x_i^2}{2\theta^2}.
\end{align*}

Solving S(θ)=0 gives a candidate MLE

\[
\hat\theta = \frac{\sum_i x_i^2}{n},
\]

which is a function of the sufficient statistic. To check this is an MLE we calculate

\[
\ell''(\theta) = \frac{n}{2\theta^2} - \frac{\sum_i x_i^2}{\theta^3}.
\]

In this case it isn’t immediately obvious that $\ell''(\hat\theta) < 0$, but substituting in gives

\begin{align*}
\ell''(\hat\theta) &= \frac{n}{2\left(\sum_i x_i^2 / n\right)^2} - \frac{\sum_i x_i^2}{\left(\sum_i x_i^2 / n\right)^3} \\
&= \frac{n^3}{2\left(\sum_i x_i^2\right)^2} - \frac{n^3}{\left(\sum_i x_i^2\right)^2} \\
&= -\frac{n^3}{2\left(\sum_i x_i^2\right)^2} < 0,
\end{align*}

confirming that this is an MLE.

The observed information is $I_O(\hat\theta) = -\ell''(\hat\theta)$, so

\[
I_O(\hat\theta) = \frac{n^3}{2\left(\sum_i x_i^2\right)^2}.
\]

Therefore a 95% confidence interval is given by

\begin{align*}
(l, u) &= \left(\hat\theta - \frac{1.96}{\sqrt{I_O(\hat\theta)}},\; \hat\theta + \frac{1.96}{\sqrt{I_O(\hat\theta)}}\right) \\
&= \left(\frac{\sum_i x_i^2}{n} - 1.96\sqrt{2}\, n^{-3/2}\sum_i x_i^2,\; \frac{\sum_i x_i^2}{n} + 1.96\sqrt{2}\, n^{-3/2}\sum_i x_i^2\right).
\end{align*}
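The full Normal-variance calculation can be replicated numerically; the data below are hypothetical:

```python
import math

# Sketch: MLE and 95% CI for theta in N(0, theta), with hypothetical data.
x = [0.3, -1.2, 0.8, -0.5, 1.9, -0.1, 0.6, -1.4]
n = len(x)
sum_x2 = sum(xi * xi for xi in x)

theta_hat = sum_x2 / n                 # MLE, a function of T(x) = sum of x_i^2
info = n**3 / (2 * sum_x2 ** 2)        # observed information I_O(theta_hat)
half = 1.96 / math.sqrt(info)          # half-width of the 95% interval
lo, hi = theta_hat - half, theta_hat + half

# The two forms of the half-width in the notes agree:
assert abs(half - 1.96 * math.sqrt(2) * sum_x2 * n ** -1.5) < 1e-12
assert lo < theta_hat < hi
```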