

2.1.2 Estimation uncertainty

A key part of statistical estimation is quantifying the uncertainty in the parameter estimate θ^. As you saw in Math104, parameter estimates are highly unlikely to equal the true population values exactly, because of sampling variability. To understand why this uncertainty exists, consider the following example.

Example 2.1.3 Magnets continued

From above, we found the estimated mean and variance for the participants to be μ^=4.38 and σ^2=9.89. Let us consider the first ten active treatment participants only. In R, the sample mean and variance of their pain scores are obtained as follows:

> mean(score_post_active[1:10])
[1] 3.2
> var(score_post_active[1:10])
[1] 5.066667

Now consider the next ten active treatment patients. Their sample mean and variance are given by

> mean(score_post_active[11:20])
[1] 2.6
> var(score_post_active[11:20])
[1] 3.377778

In fact, we could take any random sample of 10 patients, and get slightly different estimates for the mean and variance. The same thing would happen if we took multiple samples of any given size.

  • Suppose that we took 200 samples of size 10. What would the histograms of the 200 estimates μ^ and σ^2 look like?

  • In other words, what are the repeated sampling distributions of the estimators μ^=X¯ and σ^2=S^2?

  • What would happen to these distributions if the sample size was increased/decreased?
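The questions above can be explored by simulation. The following is a minimal sketch only: the magnets data are not reproduced here, so `population` is a hypothetical, simulated stand-in for a population of pain scores (its mean and spread are arbitrary assumptions), and the names `samples`, `mu_hat` and `sigma2_hat` are illustrative.

```r
# Hypothetical stand-in population of pain scores on a 0-10 scale.
# These values are simulated; they are NOT the magnets data.
set.seed(1)
population <- round(pmin(pmax(rnorm(5000, mean = 4.4, sd = 3.1), 0), 10))

n_samples <- 200  # number of repeated samples
n <- 10           # size of each sample

# Each column of `samples` is one random sample of size n (without replacement).
samples <- replicate(n_samples, sample(population, n))

# One estimate of the mean and one of the variance per sample.
mu_hat <- colMeans(samples)
sigma2_hat <- apply(samples, 2, var)

# Histograms approximating the two repeated sampling distributions.
par(mfrow = c(1, 2))
hist(mu_hat, main = "Sample means", xlab = "Estimate of the mean")
hist(sigma2_hat, main = "Sample variances", xlab = "Estimate of the variance")
```

Changing `n` and re-running the code gives a rough feel for how the two histograms respond to the sample size.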

In general:

  • Uncertainty arises from sampling variability: a different sample would give a different estimate.

    • We are using a sample of size n to tell us about a much larger population.

    • Assuming that the sample was taken at random, there are many different samples of size n that we could have taken.

    • How would our estimate θ^ change if we had a different sample?

  • As you saw in Math104, uncertainty can be captured using confidence intervals.

  • More information is obtained by considering the full sampling distribution of the estimator.
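As a rough illustration of these points, the sketch below compares the spread of the sample mean across several sample sizes and then computes a 95% confidence interval from a single sample. It again uses a hypothetical simulated population rather than the magnets data, and `sd_of_means`, `sizes` and `spread` are illustrative names, not part of the course material.

```r
# Hypothetical population (simulated; NOT the magnets data).
set.seed(2)
population <- rnorm(5000, mean = 4.4, sd = 3.1)

# Standard deviation of the sample mean over repeated samples of size n.
sd_of_means <- function(n, n_samples = 200) {
  sd(replicate(n_samples, mean(sample(population, n))))
}

# Spread of the estimator for several sample sizes.
sizes <- c(5, 10, 40, 160)
spread <- sapply(sizes, sd_of_means)
names(spread) <- sizes
print(round(spread, 2))

# A 95% confidence interval for the mean, based on one sample of size 10.
one_sample <- sample(population, 10)
ci <- t.test(one_sample)$conf.int
print(ci)
```

The spread of the sample mean should shrink as n grows, roughly in proportion to 1/sqrt(n). The confidence interval summarises the uncertainty from one sample, whereas the repeated sampling distribution describes how the estimator behaves across all possible samples.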