A key part of statistical estimation is quantifying the uncertainty in a parameter estimate. As you saw in Math104, parameter estimates are highly unlikely to be exactly equal to the true population values, because of sampling uncertainty. To understand why this uncertainty exists, consider the following example.
From above, we found the mean and variance of the pain scores for the active treatment participants. Now let us consider the first ten active treatment participants only. To obtain the mean and variance of their pain scores in R:
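A minimal sketch of this calculation is given below; the object name `pain` for the vector of pain scores, and the assumption that the active treatment participants occupy the first positions of that vector, are assumptions rather than details taken from the original data.

```r
# Assumed setup: `pain` is a numeric vector of pain scores, with the
# active treatment participants stored first (both are assumptions).
first_ten <- pain[1:10]   # first ten active treatment participants
mean(first_ten)           # sample mean of their pain scores
var(first_ten)            # sample variance of their pain scores
```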
Now consider the next ten active treatment patients; their sample mean and variance are obtained in the same way:
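Under the same assumptions, only the index range changes:

```r
next_ten <- pain[11:20]   # the next ten active treatment participants
mean(next_ten)            # their sample mean
var(next_ten)             # their sample variance
```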
In fact, we could take any random sample of 10 patients and get slightly different estimates of the mean and variance. The same thing would happen if we took multiple samples of any given size.
Suppose that we took 200 samples of size 10: what would the histograms of the 200 estimates of the mean and variance look like?
In other words, what are the repeated sampling distributions of the sample mean and the sample variance?
What would happen to these distributions if the sample size were increased or decreased?
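One way to explore these questions is by simulation. The sketch below repeatedly draws samples of size 10 and plots histograms of the resulting estimates; it again assumes the pain scores are stored in a vector called `pain`.

```r
# Repeated-sampling simulation (illustrative; `pain` is an assumed vector
# of pain scores, not part of the original notes).
set.seed(1)                      # for reproducibility
n_samples <- 200                 # number of repeated samples
n <- 10                          # size of each sample
means <- numeric(n_samples)
vars  <- numeric(n_samples)
for (i in 1:n_samples) {
  s <- sample(pain, size = n)    # one random sample of size 10
  means[i] <- mean(s)
  vars[i]  <- var(s)
}
hist(means, main = "200 sample means (n = 10)")
hist(vars,  main = "200 sample variances (n = 10)")
```

Re-running the simulation with a larger or smaller value of n shows how the spread of both histograms changes with the sample size.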
In general:
- Uncertainty arises from sampling.
- We are using a sample of size n to tell us about a much larger population.
- Assuming that the sample was taken at random, there are many different samples of size n that we could have taken.
- How would our estimate have changed if we had taken a different sample?
As you saw in Math104, uncertainty can be captured using confidence intervals.
More information is obtained by considering the full sampling distribution of the estimator.
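As an illustration, a 95% confidence interval for the mean pain score can be obtained in R as follows (again assuming the scores are stored in a vector `pain`):

```r
# 95% confidence interval for the population mean pain score,
# based on the t distribution (`pain` is an assumed vector name).
t.test(pain, conf.level = 0.95)$conf.int
```

The histograms of sample means from the simulation above give a direct picture of the sampling distribution that such an interval summarises.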