MATH235 Week 1 - Workshop problems

If not all of the problems below are discussed in the workshop for lack of time, then please have a go at the problems on your own.

WS1.1

Consider the sequence of IID random variables $X_{1},\ldots,X_{n}$ with Bernoulli $(p)$ distribution, where $0\leq p\leq 1$ .

(a)

What is the interpretation of $p$ ?
(b)

Name two real life scenarios which could be modelled by the Bernoulli distribution.
(c)
For a Bernoulli( $p$ ) random variable $X$ , what is
- (i)
  
  $\mathbb{E}[X]$ ;
- (ii)
  
  $\mbox{Var}(X)$ ?
(d)

Using your answers to part (c) or otherwise, show that the sample proportion $P_{n}=\frac{X_{1}+\ldots+X_{n}}{n}$ is an unbiased estimator of $p$ .
(e)

What is the variance of $P_{n}$ ? Comment on the properties of this variance as the sample size tends to infinity.

Solution.

(a)

For a Bernoulli $(p)$ random variable $X$ , if $X=1$ is interpreted as a success and $X=0$ as a failure, then $p$ is the probability of a success.
(b)

Any experiment which has two possible discrete outcomes, e.g. outcome of a coin toss, gender, whether or not a seed germinates, whether or not a train arrives late, …
(c)
- (i)
  
  $\mathbb{E}[X]=p$ ;
- (ii)
  
  $\mbox{Var}(X)=p(1-p)$
(d)

$P_{n}$ is unbiased if $\mathbb{E}[P_{n}]=p$ . By linearity of expectation,

$\displaystyle\mathbb{E}[P_{n}]=\frac{1}{n}\mathbb{E}[X_{1}+\ldots+X_{n}]=\frac{1}{n}(\mathbb{E}[X_{1}]+\ldots+\mathbb{E}[X_{n}])=\frac{1}{n}(np)=p,$

so $P_{n}$ is an unbiased estimator of $p$ .
(e)

$\begin{aligned}\mbox{Var}(P_{n})&=\frac{1}{n^{2}}\mbox{Var}(X_{1}+\ldots+X_{n})\\ &=\frac{1}{n^{2}}(\mbox{Var}(X_{1})+\ldots+\mbox{Var}(X_{n}))\quad\mbox{by independence of $X_{1},\ldots,X_{n}$}\\ &=\frac{1}{n^{2}}np(1-p)\\ &=\frac{p(1-p)}{n}\end{aligned}$

Therefore, $\mbox{Var}(P_{n})\rightarrow 0$ as $n\rightarrow\infty$ . In other words, since $P_{n}$ is unbiased, it concentrates ever more tightly around the true value $p$ as the sample size increases.
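The results of parts (d) and (e) can be checked numerically. The following is a minimal simulation sketch in Python, not part of the original worksheet; the values of `p`, `n`, `reps` and `seed` are illustrative assumptions, and the helper names are hypothetical.

```python
import random
from statistics import fmean, pvariance


def sample_proportion(n, p, rng):
    """Return P_n = (X_1 + ... + X_n) / n for n IID Bernoulli(p) draws."""
    return sum(1 if rng.random() < p else 0 for _ in range(n)) / n


def simulate(n, p, reps=5000, seed=1):
    """Empirical mean and variance of P_n over `reps` simulated samples."""
    rng = random.Random(seed)
    props = [sample_proportion(n, p, rng) for _ in range(reps)]
    return fmean(props), pvariance(props)


p, n = 0.3, 40  # illustrative values, not taken from the worksheet
mean_hat, var_hat = simulate(n, p)
print(f"mean of P_n: {mean_hat:.4f}  (theory: p = {p})")
print(f"var of P_n:  {var_hat:.5f}  (theory: p(1-p)/n = {p * (1 - p) / n:.5f})")
```

Increasing `n` in the call above should shrink the empirical variance roughly in proportion to $1/n$ , in line with part (e).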

WS1.2

Let $X_{1},\ldots,X_{n}$ be IID random variables with a negative binomial $(r,p)$ distribution. The negative binomial distribution is often used to model over-dispersed count variables; for example, it might be used to model the number of times a river floods per year. It has two parameters $r>0$ and $p\in[0,1]$ . Suppose that $p$ is known, so that we want to estimate $r$ only.

(a)

State the expectation and variance of the random variable $Y$ where $Y\sim\mathrm{Poisson}(\lambda)$ , and hence calculate the index of dispersion

$D=\frac{\mathrm{Var}(Y)}{\mathbb{E}[Y]}.$
(b)

For a negative binomial random variable $X$ , the expectation and variance are

$\mathbb{E}[X]=\frac{pr}{(1-p)}\quad\mathrm{and}\quad\mathrm{Var}(X)=\frac{pr}{(1-p)^{2}}.$

What is the index of dispersion for the negative binomial distribution? What range of values can it take?
(c)

An over-dispersed random variable has an index of dispersion greater than 1. What does this imply about the relationship between the expectation and the variance for that distribution? Using your answers to parts (a) and (b), explain why the Poisson distribution would not be a good model for over-dispersed count data. Why is the negative binomial distribution preferable in this case?
(d)

Assuming that $p$ is known, show that the following is an unbiased estimator of $r$ :

$\hat{r}=\frac{(1-p)\bar{X}}{p}.$
(e)

What is the variance of $\hat{r}$ ? Comment on the properties of this variance as $n\rightarrow\infty$ .

Solution.

(a)

For a Poisson $(\lambda)$ random variable the expectation and variance are $\mathbb{E}[Y]=\lambda$ and $\mathrm{Var}(Y)=\lambda$ . Therefore the index of dispersion is $\lambda/\lambda=1$ .
(b)

For a negative binomial random variable, the index of dispersion is

$D=\frac{\mathrm{Var}(X)}{\mathbb{E}[X]}=\frac{pr/(1-p)^{2}}{pr/(1-p)}=(1-p)^{-1}.$

Since $0\leq p<1$ (the expectation and variance are undefined at $p=1$ ), we have $D\geq 1$ , with $D>1$ whenever $p>0$ .
(c)

An over-dispersed random variable has a variance greater than its expectation. The Poisson distribution has index of dispersion exactly equal to 1, regardless of the value of $\lambda$ , so its variance always equals its expectation and it cannot capture over-dispersion. The negative binomial distribution has index of dispersion $(1-p)^{-1}$ , which is greater than 1 for any $0<p<1$ , so it is a preferable model for over-dispersed data.
(d)

To show unbiasedness, we need to show that $\mathbb{E}[\hat{r}]=r$ :

$\begin{aligned}\mathbb{E}[\hat{r}]&=\mathbb{E}\left[\frac{(1-p)\bar{X}}{p}\right]\\ &=\frac{1-p}{p}\mathbb{E}[\bar{X}]\\ &=\frac{1-p}{p}\times\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[X_{i}]\\ &=\frac{1-p}{np}\frac{npr}{1-p}\\ &=r\end{aligned}$

as required.
(e)

The variance of the estimator $\hat{r}$ is

$\begin{aligned}\mathrm{Var}(\hat{r})&=\mathrm{Var}\left(\frac{(1-p)\bar{X}}{p}\right)\\ &=\frac{(1-p)^{2}}{p^{2}}\mathrm{Var}(\bar{X})\\ &=\frac{(1-p)^{2}}{n^{2}p^{2}}\sum_{i=1}^{n}\mathrm{Var}(X_{i})\quad\mbox{by independence of $X_{1},\ldots,X_{n}$}\\ &=\frac{(1-p)^{2}}{n^{2}p^{2}}\frac{npr}{(1-p)^{2}}\\ &=\frac{r}{np}.\end{aligned}$

So as $n\rightarrow\infty$ the variance of $\hat{r}$ tends to zero.
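The unbiasedness of $\hat{r}$ and the variance $r/(np)$ can also be checked by simulation. The sketch below is not part of the original worksheet. It assumes that $r$ is a positive integer, so that a draw with expectation $pr/(1-p)$ can be generated as a sum of $r$ geometric counts (successes, each with probability $p$ , before a failure); the function names and the values of `r`, `p`, `n`, `reps` and `seed` are illustrative assumptions.

```python
import math
import random
from statistics import fmean, pvariance


def rnegbin(r, p, rng):
    """One draw with mean pr/(1-p) and variance pr/(1-p)^2.

    Assumes r is a positive integer and 0 < p < 1: the draw is the number of
    successes (probability p each) observed before the r-th failure.
    """
    total = 0
    for _ in range(r):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        # Inversion: P(K >= k) = p^k for the count K of successes before a failure.
        total += math.floor(math.log(u) / math.log(p))
    return total


def r_hat(xs, p):
    """The estimator (1 - p) * xbar / p from part (d)."""
    return (1 - p) * fmean(xs) / p


def simulate(r, p, n, reps=2000, seed=1):
    """Empirical mean and variance of r_hat over `reps` samples of size n."""
    rng = random.Random(seed)
    ests = [r_hat([rnegbin(r, p, rng) for _ in range(n)], p) for _ in range(reps)]
    return fmean(ests), pvariance(ests)


r, p, n = 3, 0.4, 50  # illustrative values, not taken from the worksheet
mean_rhat, var_rhat = simulate(r, p, n)
print(f"mean of r_hat: {mean_rhat:.3f}  (theory: r = {r})")
print(f"var of r_hat:  {var_rhat:.4f}  (theory: r/(np) = {r / (n * p):.4f})")
```

The integer restriction on `r` is only a convenience for sampling with the standard library; the algebra in parts (d) and (e) holds for any $r>0$ .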

WS1.3

Table 0.1 gives forecasts of annual global prices for tea, coffee and sugar from 1995 to 2015. The data were provided by the Economist Intelligence Unit and were obtained from http://datamarket.com/.

Year	Coffee	Sugar	Tea
1995	151.2	13.3	1.4
1996	122.1	12.0	1.7
1997	189.1	11.4	2.0
1998	135.2	8.9	2.0
1999	103.9	6.3	1.8
2000	87.1	8.2	1.9
2001	62.3	8.6	1.6
2002	61.5	6.9	1.5
2003	64.2	7.1	1.5
2004	80.5	7.2	1.7
2005	114.9	9.9	1.6
2006	114.4	14.8	1.9
2007	123.5	10.1	2.1
2008	139.8	13.1	2.3
2009	143.9	16.9	2.7
2010	196.0	21.1	2.9
2011	275.3	25.9	2.9
2012	230.0	21.1	2.6
2013	182.3	16.6	2.5
2014	180.0	16.0	2.4
2015	175.0	15.5	2.2

Table 0.1: Forecasts of annual prices for coffee (cents per lb), sugar (cents per lb) and tea (dollars per kg) from 1995 to 2015. Data provided by the Economist Intelligence Unit.

We wish to test the hypothesis that the mean price of coffee is less than 145 cents per lb. Assume that the data are an IID sample from a Normal $(\mu,\sigma^{2})$ distribution.

(a)

Write down appropriate null and alternative hypotheses for this test.
(b)

Given that $\bar{x}=139.6$ and assuming that $\sigma^{2}=3210$ is known, and by finding an appropriate critical value, carry out this test at the 5% level. You should state the value of your test statistic, the critical value used and any conclusions drawn.
(c)

How would you find the $p$ -value for the test statistic calculated in part (b) using R?

Solution.

(a)

$H_{0}:\mu=145$ against $H_{1}:\mu<145$
(b)

As $\sigma^{2}$ is known, calculate the test statistic with $n=21$ :

$\displaystyle z=\frac{\bar{x}-145}{\sigma/\sqrt{n}}=\frac{139.6-145}{\sqrt{3210/21}}=-0.437$

Since this is a lower-tailed test at the 5% level, we reject $H_{0}$ if $z<-z(0.95)=-1.645$ . Since $z=-0.437>-1.645$ , we do not reject $H_{0}$ and conclude that there is insufficient evidence at the 5% level that the mean price of coffee is less than 145 cents per lb.
(c)

The $p$ -value is $\mathbb{P}(Z\leq -0.437)$ for a standard Normal random variable $Z$ , which in R is given by pnorm(-0.437); this is 0.331.
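As a cross-check of the arithmetic in parts (b) and (c), the test can be reproduced outside R. The sketch below is not part of the original worksheet; it uses Python's standard-library `statistics.NormalDist`, whose `cdf` method plays the role of R's `pnorm` for a standard Normal. The variable names are illustrative, and the sample mean is rounded to one decimal place to match the value quoted in the question.

```python
from math import sqrt
from statistics import NormalDist, fmean

# Coffee prices (cents per lb), 1995 to 2015, copied from the table above.
coffee = [151.2, 122.1, 189.1, 135.2, 103.9, 87.1, 62.3, 61.5, 64.2, 80.5,
          114.9, 114.4, 123.5, 139.8, 143.9, 196.0, 275.3, 230.0, 182.3,
          180.0, 175.0]

n = len(coffee)
xbar = round(fmean(coffee), 1)  # rounded to match the value quoted in part (b)
mu0, sigma2, alpha = 145.0, 3210.0, 0.05  # values taken from the question

std_normal = NormalDist()  # standard Normal, mean 0 and standard deviation 1
z = (xbar - mu0) / sqrt(sigma2 / n)  # test statistic
crit = std_normal.inv_cdf(alpha)     # lower-tail critical value
p_value = std_normal.cdf(z)          # counterpart of pnorm(z) in R
reject = z < crit                    # lower-tailed rejection rule

print(f"n = {n}, xbar = {xbar}")
print(f"z = {z:.3f}, critical value = {crit:.3f}, p-value = {p_value:.3f}")
print("reject H0" if reject else "do not reject H0")
```

Comparing `z` with the negative critical value, rather than comparing absolute values, keeps the decision rule one-sided: a large positive `z` would not count as evidence for $H_{1}:\mu<145$ .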
