If there is not enough time to discuss all of the problems below in the workshop, please attempt the remaining ones on your own.
Consider the sequence $X_1, \ldots, X_n$ of IID random variables with Bernoulli($p$) distribution, where $0 < p < 1$.
What is the interpretation of $p$?
Name two real-life scenarios which could be modelled by the Bernoulli distribution.
For a Bernoulli($p$) random variable $X$, what is
$E[X]$;
$\mathrm{Var}(X)$?
Using your answers to part (c) or otherwise, show that the sample proportion $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is an unbiased estimator of $p$.
What is the variance of $\hat{p}$? Comment on the properties of this variance as the sample size tends to infinity.
Solution.
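For a Bernoulli random variable $X$, if $X = 1$ is interpreted as a success and $X = 0$ as a fail, then $p$ is the probability of a success.
Any experiment which has two possible discrete outcomes, e.g. the outcome of a coin toss, gender, whether or not a seed germinates, whether or not a train arrives late, …
$E[X] = p$; $\mathrm{Var}(X) = p(1-p)$.
$\hat{p}$ is unbiased if $E[\hat{p}] = p$. Since
$$E[\hat{p}] = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n} \cdot np = p,$$
$\hat{p}$ is an unbiased estimator of $p$.
Since the $X_i$ are independent,
$$\mathrm{Var}(\hat{p}) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{p(1-p)}{n}.$$
Therefore $\mathrm{Var}(\hat{p}) \to 0$ as $n \to \infty$. In other words, the estimator tends to the true parameter as the sample size increases.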
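The unbiasedness and variance results for the sample proportion can also be checked numerically. The following is a minimal simulation sketch in R; the values of p, n and the number of replicates are arbitrary illustrative choices and are not taken from the workshop.

```r
# Simulation check of E[p_hat] = p and Var(p_hat) = p(1 - p)/n.
# p, n and reps are illustrative assumptions.
set.seed(1)
p <- 0.3
n <- 50
reps <- 20000

# Each replicate draws n Bernoulli(p) values and records the sample proportion
p_hat <- replicate(reps, mean(rbinom(n, size = 1, prob = p)))

mean_p_hat <- mean(p_hat)        # should be close to p
var_p_hat <- var(p_hat)          # should be close to theory_var
theory_var <- p * (1 - p) / n    # theoretical variance of the sample proportion
```

Rerunning with a larger n should shrink both var_p_hat and theory_var, in line with the variance tending to zero.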
Let $X_1, \ldots, X_n$ be IID random variables with a negative binomial distribution. The negative binomial distribution is often used to model over-dispersed count variables; for example, it might be used to model the number of times a river floods per year. It has two parameters. Suppose that one of them is known, so that only the other needs to be estimated.
State the expectation and variance of the random variable $X$ where $X \sim \text{Poisson}(\lambda)$, and hence calculate the index of dispersion $\mathrm{Var}(X)/E[X]$.
For a negative binomial random variable , the expectation and variance are
What is the index of dispersion for the negative binomial distribution? What range of values can it take?
An over-dispersed random variable has an index of dispersion greater than 1. What does this imply about the relationship between the expectation and the variance for that distribution? Using your answers to parts (a) and (b), explain why the Poisson distribution would not be a good model for over-dispersed count data. Why is the negative binomial distribution preferable in this case?
Assuming that is known, show that the following is an unbiased estimator of ;
What is the variance of this estimator? Comment on the properties of this variance as $n \to \infty$.
Solution.
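For a Poisson random variable $X \sim \text{Poisson}(\lambda)$ the expectation and variance are $E[X] = \lambda$ and $\mathrm{Var}(X) = \lambda$. Therefore the index of dispersion is $\mathrm{Var}(X)/E[X] = \lambda/\lambda = 1$.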
For a negative binomial random variable, the index of dispersion is
Since , then also .
An over-dispersed random variable has a variance greater than its expectation. The Poisson distribution has index of dispersion exactly equal to 1, regardless of the value of $\lambda$. Therefore it cannot be used as a model for over-dispersed data. Since the negative binomial distribution has index of dispersion greater than 1, it is a preferable model for over-dispersed data.
To show unbiasedness, we need to show that ;
as required.
The variance of the estimator is
So as $n \to \infty$ the variance of the estimator tends to zero.
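The negative binomial calculations can be sketched under one common parametrisation. This is an assumption and may differ from the notation intended in this question. Here $X$ counts the failures before the $k$-th success, each trial succeeds with probability $p$, and $k$ is known. The estimator $\bar{X}/k$ and its target $(1-p)/p$ are likewise illustrative choices.

```latex
% Assumed parametrisation: X = number of failures before the k-th success,
% success probability 0 < p < 1, k known. Estimator \bar{X}/k is illustrative.
\begin{align*}
E[X] &= \frac{k(1-p)}{p}, \qquad \mathrm{Var}(X) = \frac{k(1-p)}{p^2}, \\
\frac{\mathrm{Var}(X)}{E[X]} &= \frac{1}{p} > 1 \quad \text{since } 0 < p < 1, \\
E\!\left[\frac{\bar{X}}{k}\right] &= \frac{1}{nk}\sum_{i=1}^{n} E[X_i] = \frac{1-p}{p}, \\
\mathrm{Var}\!\left(\frac{\bar{X}}{k}\right) &= \frac{1}{n^2 k^2}\sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{1-p}{nkp^2} \to 0 \quad \text{as } n \to \infty.
\end{align*}
```

Under this parametrisation the index of dispersion exceeds 1 for every admissible $p$, which is the property that makes the distribution suitable for over-dispersed counts.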
Table 0.3 gives forecasts of annual global prices for tea, coffee and sugar from 1995 to 2015. The data were provided by the Economist Intelligence Unit and were obtained from http://datamarket.com/.
Year | Coffee | Sugar | Tea |
---|---|---|---|
1995 | 151.2 | 13.3 | 1.4 |
1996 | 122.1 | 12.0 | 1.7 |
1997 | 189.1 | 11.4 | 2.0 |
1998 | 135.2 | 8.9 | 2.0 |
1999 | 103.9 | 6.3 | 1.8 |
2000 | 87.1 | 8.2 | 1.9 |
2001 | 62.3 | 8.6 | 1.6 |
2002 | 61.5 | 6.9 | 1.5 |
2003 | 64.2 | 7.1 | 1.5 |
2004 | 80.5 | 7.2 | 1.7 |
2005 | 114.9 | 9.9 | 1.6 |
2006 | 114.4 | 14.8 | 1.9 |
2007 | 123.5 | 10.1 | 2.1 |
2008 | 139.8 | 13.1 | 2.3 |
2009 | 143.9 | 16.9 | 2.7 |
2010 | 196.0 | 21.1 | 2.9 |
2011 | 275.3 | 25.9 | 2.9 |
2012 | 230.0 | 21.1 | 2.6 |
2013 | 182.3 | 16.6 | 2.5 |
2014 | 180.0 | 16.0 | 2.4 |
2015 | 175.0 | 15.5 | 2.2 |
We wish to test the hypothesis that the mean price of coffee is less than 145 cents per lb. Assume that the data are an IID sample from a Normal distribution.
Write down appropriate null and alternative hypotheses for this test.
Given that $\bar{x} = 139.63$ and assuming that the variance $\sigma^2$ is known, carry out this test at the 5% level by finding an appropriate critical value. You should state the value of your test statistic, the critical value used and any conclusions drawn.
How would you find the $p$-value for the test statistic calculated in part (b) using R?
Solution.
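$H_0: \mu = 145$ against $H_1: \mu < 145$.
As $\sigma^2$ is known, calculate
$$z = \frac{\bar{x} - 145}{\sigma/\sqrt{n}} = -0.437.$$
Since we have a one-tailed test at the 5% level, the critical value used is $-1.645$. Since $-0.437 > -1.645$, do not reject $H_0$ and conclude that there is no evidence that the mean price of coffee is less than 145 cents per lb.
The $p$-value is pnorm(-0.437), which is 0.331.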
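The steps of this test can be sketched in R as follows. The coffee prices are copied from Table 0.3 and the test statistic is the value quoted in the solution. The helper z_stat is a hypothetical name introduced here for illustration, and no value of the known standard deviation is assumed.

```r
# Coffee prices (cents per lb) for 1995-2015, copied from Table 0.3
coffee <- c(151.2, 122.1, 189.1, 135.2, 103.9, 87.1, 62.3, 61.5, 64.2, 80.5,
            114.9, 114.4, 123.5, 139.8, 143.9, 196.0, 275.3, 230.0, 182.3,
            180.0, 175.0)
n <- length(coffee)
xbar <- mean(coffee)

# Hypothetical helper: z statistic when the standard deviation sigma is known
z_stat <- function(xbar, mu0, sigma, n) (xbar - mu0) / (sigma / sqrt(n))

z <- -0.437            # test statistic quoted in the solution
crit <- qnorm(0.05)    # lower-tail 5% critical value of N(0, 1)
p_value <- pnorm(z)    # P(Z <= z) for the lower-tailed test
reject <- z < crit     # TRUE would mean rejecting the null hypothesis
```

Here reject is FALSE, which matches the conclusion above. The same decision follows from comparing p_value with 0.05.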