MATH235

MATH235 Week 1 - Workshop problems

If not all of the problems below are discussed in the workshop for lack of time, then please have a go at the problems on your own.

WS1.1 

Consider the sequence of IID random variables $X_1, \dots, X_n$ with Bernoulli($p$) distribution, where $0 \le p \le 1$.

  1. (a)

    What is the interpretation of p?

  2. (b)

    Name two real life scenarios which could be modelled by the Bernoulli distribution.

  3. (c)

    For a Bernoulli(p) random variable X, what is

    • (i)

      𝔼[X];

    • (ii)

      Var(X)?

  4. (d)

    Using your answers to part (c) or otherwise, show that the sample proportion $P_n = \frac{X_1 + \dots + X_n}{n}$ is an unbiased estimator of $p$.

  5. (e)

    What is the variance of $P_n$? Comment on the properties of this variance as the sample size tends to infinity.

Solution. 

  1. (a)

    For a Bernoulli(p) random variable X, if X=1 is interpreted as a success and X=0 as a failure, then p is the probability of a success.

  2. (b)

    Any experiment which has two possible discrete outcomes, e.g. outcome of a coin toss, gender, whether or not a seed germinates, whether or not a train arrives late, …

  3. (c)
    • (i)

      𝔼[X]=p;

    • (ii)

      Var(X)=p(1-p)

  4. (d)

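    $P_n$ is unbiased if $\mathbb{E}[P_n] = p$. By linearity of expectation,

    \[
    \mathbb{E}[P_n] = \frac{1}{n}\,\mathbb{E}[X_1 + \dots + X_n] = \frac{1}{n}\left(\mathbb{E}[X_1] + \dots + \mathbb{E}[X_n]\right) = \frac{1}{n}(np) = p,
    \]

    so $P_n$ is an unbiased estimator of $p$.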

  5. (e)
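    \[
    \begin{aligned}
    \mathrm{Var}(P_n) &= \frac{1}{n^2}\,\mathrm{Var}(X_1 + \dots + X_n) \\
    &= \frac{1}{n^2}\left(\mathrm{Var}(X_1) + \dots + \mathrm{Var}(X_n)\right) \quad \text{by independence of } X_1, \dots, X_n \\
    &= \frac{1}{n^2}\, n p (1-p) \\
    &= \frac{p(1-p)}{n}.
    \end{aligned}
    \]

    Therefore $\mathrm{Var}(P_n) \to 0$ as $n \to \infty$. In other words, since $P_n$ is unbiased and its variance shrinks to zero, the estimator concentrates around the true parameter $p$ as the sample size increases.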
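
The results in parts (d) and (e) can be checked numerically. The following is a minimal sketch in Python (standard library only), not part of the original solution: it uses the fact that $X_1 + \dots + X_n$ is Binomial($n$, $p$) to compute the mean and variance of $P_n$ exactly. The function name `sample_proportion_moments` and the values of `p` and `n` are illustrative assumptions.

```python
from math import comb


def sample_proportion_moments(n: int, p: float) -> tuple[float, float]:
    """Exact mean and variance of P_n = (X_1 + ... + X_n) / n for IID Bernoulli(p)."""
    # S = X_1 + ... + X_n is Binomial(n, p), so enumerate its pmf directly.
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum((k / n) * w for k, w in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var


p = 0.3  # illustrative value, not taken from the problem
for n in (10, 100, 400):
    mean, var = sample_proportion_moments(n, p)
    # Columns: n, E[P_n], Var(P_n), and the formula p(1-p)/n for comparison.
    print(n, round(mean, 6), round(var, 8), round(p * (1 - p) / n, 8))
```

The mean stays at $p$ for every $n$, while the variance matches $p(1-p)/n$ and shrinks as $n$ grows.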

WS1.2 

Let $X_1, \dots, X_n$ be IID random variables with a negative binomial($r$, $p$) distribution. The negative binomial distribution is often used to model over-dispersed count variables; for example, it might be used to model the number of times a river floods per year. It has two parameters, $r > 0$ and $p \in [0,1]$. Suppose that $p$ is known, so that we want to estimate $r$ only.

  1. (a)

    State the expectation and variance of the random variable $Y$ where $Y \sim \text{Poisson}(\lambda)$, and hence calculate the index of dispersion

    \[ D = \frac{\mathrm{Var}(Y)}{\mathbb{E}[Y]}. \]
  2. (b)

    For a negative binomial random variable X, the expectation and variance are

    \[ \mathbb{E}[X] = \frac{pr}{1-p} \quad \text{and} \quad \mathrm{Var}(X) = \frac{pr}{(1-p)^2}. \]

    What is the index of dispersion for the negative binomial distribution? What range of values can it take?

  3. (c)

    An over-dispersed random variable has an index of dispersion greater than 1. What does this imply about the relationship between the expectation and the variance for that distribution? Using your answers to parts (a) and (b), explain why the Poisson distribution would not be a good model for over-dispersed count data. Why is the negative binomial distribution preferable in this case?

  4. (d)

    Assuming that $p$ is known, show that the following is an unbiased estimator of $r$:

    \[ \hat{r} = \frac{(1-p)\bar{X}}{p}. \]
  5. (e)

    What is the variance of $\hat{r}$? Comment on the properties of this variance as $n \to \infty$.

Solution. 

  1. (a)

    For a Poisson(λ) random variable the expectation and variance are 𝔼[Y]=λ and Var(Y)=λ. Therefore the index of dispersion is λ/λ=1.

  2. (b)

    For a negative binomial random variable, the index of dispersion is

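    \[ D = \frac{\mathrm{Var}(X)}{\mathbb{E}[X]} = \frac{pr/(1-p)^2}{pr/(1-p)} = (1-p)^{-1}. \]

    Since $0 \le p \le 1$, we also have $D \ge 1$, with $D$ increasing without bound as $p \to 1$.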

  3. (c)

    An over-dispersed random variable has a variance greater than its expectation. The Poisson distribution has index of dispersion exactly equal to 1, regardless of the value of λ. Therefore it cannot be used as a model for over-dispersed data. Since the negative binomial distribution has index of dispersion greater than 1, it is a preferable model for over-dispersed data.

  4. (d)

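    To show unbiasedness, we need to show that $\mathbb{E}[\hat{r}] = r$:

    \[
    \begin{aligned}
    \mathbb{E}[\hat{r}] &= \mathbb{E}\left[\frac{(1-p)\bar{X}}{p}\right] \\
    &= \frac{1-p}{p}\,\mathbb{E}[\bar{X}] \\
    &= \frac{1-p}{p} \times \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[X_i] \\
    &= \frac{1-p}{np} \cdot n \cdot \frac{pr}{1-p} \\
    &= r
    \end{aligned}
    \]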

    as required.

  5. (e)

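    The variance of the estimator $\hat{r}$ is

    \[
    \begin{aligned}
    \mathrm{Var}(\hat{r}) &= \mathrm{Var}\left(\frac{(1-p)\bar{X}}{p}\right) \\
    &= \frac{(1-p)^2}{p^2}\,\mathrm{Var}(\bar{X}) \\
    &= \frac{(1-p)^2}{n^2p^2}\sum_{i=1}^{n}\mathrm{Var}(X_i) \quad \text{by independence of } X_1, \dots, X_n \\
    &= \frac{(1-p)^2}{n^2p^2} \cdot n \cdot \frac{pr}{(1-p)^2} \\
    &= \frac{r}{np}.
    \end{aligned}
    \]

    So as $n \to \infty$ the variance of $\hat{r}$ tends to zero.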
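
The index of dispersion from part (b) and the estimator properties from parts (d) and (e) can be checked numerically. The sketch below is in Python (standard library only) and rests on one assumption not stated in the problem: for integer $r$, the pmf is taken to be $P(X = k) = \binom{k+r-1}{k} p^k (1-p)^r$, which has the expectation and variance quoted in part (b). The function name `neg_binomial_moments`, the truncation point `k_max` and the values of `r`, `p` and `n` are illustrative.

```python
from math import comb


def neg_binomial_moments(r: int, p: float, k_max: int = 500) -> tuple[float, float]:
    """Mean and variance of X with pmf C(k+r-1, k) p^k (1-p)^r, k = 0, 1, 2, ...

    Assumes integer r; the infinite sum is truncated at k_max, which is
    adequate when p is not close to 1.
    """
    pmf = [comb(k + r - 1, k) * p**k * (1 - p) ** r for k in range(k_max + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, var


r, p, n = 3, 0.4, 25  # illustrative values, not taken from the problem
mean, var = neg_binomial_moments(r, p)

# Index of dispersion, to compare with 1 / (1 - p).
dispersion = var / mean

# r_hat = (1 - p) * X_bar / p is linear in X_bar, so its moments follow
# from E[X_bar] = E[X] and Var(X_bar) = Var(X) / n.
r_hat_mean = (1 - p) / p * mean
r_hat_var = ((1 - p) / p) ** 2 * var / n

print("D:", dispersion, "vs", 1 / (1 - p))
print("E[r_hat]:", r_hat_mean, "vs", r)
print("Var(r_hat):", r_hat_var, "vs", r / (n * p))
```

Each printed pair should agree up to floating-point and truncation error.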

WS1.3 

Table 0.1 gives forecasts of annual global prices for tea, coffee and sugar from 1995 to 2015. The data were provided by the Economist Intelligence Unit and were obtained from http://datamarket.com/.

Year Coffee Sugar Tea
1995 151.2 13.3 1.4
1996 122.1 12.0 1.7
1997 189.1 11.4 2.0
1998 135.2 8.9 2.0
1999 103.9 6.3 1.8
2000 87.1 8.2 1.9
2001 62.3 8.6 1.6
2002 61.5 6.9 1.5
2003 64.2 7.1 1.5
2004 80.5 7.2 1.7
2005 114.9 9.9 1.6
2006 114.4 14.8 1.9
2007 123.5 10.1 2.1
2008 139.8 13.1 2.3
2009 143.9 16.9 2.7
2010 196.0 21.1 2.9
2011 275.3 25.9 2.9
2012 230.0 21.1 2.6
2013 182.3 16.6 2.5
2014 180.0 16.0 2.4
2015 175.0 15.5 2.2
Table 0.1: Forecasts of annual prices for coffee (cents per lb), sugar (cents per lb) and tea (dollars per kg) from 1995 to 2015. Data provided by the Economist Intelligence Unit.

We wish to test the hypothesis that the mean price of coffee is less than 145 cents per lb. Assume that the data are an IID sample from a Normal($\mu$, $\sigma^2$) distribution.

  1. (a)

    Write down appropriate null and alternative hypotheses for this test.

  2. (b)

    Given that $\bar{x} = 139.6$ and assuming that $\sigma^2 = 3210$ is known, carry out this test at the 5% level by finding an appropriate critical value. You should state the value of your test statistic, the critical value used and any conclusions drawn.

  3. (c)

    How would you find the p-value for the test statistic calculated in part (b) using R?

Solution. 

  1. (a)

    $H_0: \mu = 145$ against $H_1: \mu < 145$.

  2. (b)

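    As $\sigma^2$ is known, calculate the test statistic

    \[ z = \frac{\bar{x} - 145}{\sigma/\sqrt{n}} = \frac{139.6 - 145}{\sqrt{3210/21}} = -0.437. \]

    Since this is a one-tailed (lower-tail) test at the 5% level, the critical value is $-z_{0.95} = -1.64$, and we reject $H_0$ if $z < -1.64$. Since $z = -0.437 > -1.64$ (equivalently, $|z| = 0.437 < 1.64$), do not reject $H_0$ and conclude that there is no evidence at the 5% level that the mean price of coffee is less than 145 cents per lb.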

  3. (c)

    The p-value is the lower-tail probability of the test statistic, found in R with pnorm(-0.437), which gives 0.331.
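
As a cross-check on parts (b) and (c), the whole test can be reproduced outside R. The sketch below uses Python's standard library, with `statistics.NormalDist` standing in for R's pnorm and qnorm; this is a substitution for illustration, not the method asked for in the question. The variable names are illustrative, and the sample mean is rounded to one decimal place to match the value given in part (b).

```python
from math import sqrt
from statistics import NormalDist, fmean

# Coffee prices (cents per lb), 1995 to 2015, copied from the table.
coffee = [
    151.2, 122.1, 189.1, 135.2, 103.9, 87.1, 62.3,
    61.5, 64.2, 80.5, 114.9, 114.4, 123.5, 139.8,
    143.9, 196.0, 275.3, 230.0, 182.3, 180.0, 175.0,
]

n = len(coffee)
x_bar = round(fmean(coffee), 1)  # rounded as in part (b)
mu0 = 145.0                      # mean under H0
sigma2 = 3210.0                  # variance, assumed known

# Test statistic for a z-test with known variance.
z = (x_bar - mu0) / sqrt(sigma2 / n)

std_normal = NormalDist()             # standard Normal(0, 1)
critical = std_normal.inv_cdf(0.05)   # lower-tail 5% critical value
p_value = std_normal.cdf(z)           # same role as pnorm(z) in R
reject = z < critical

print("n =", n, "x_bar =", x_bar)
print("z =", round(z, 3), "critical =", round(critical, 3))
print("p-value =", round(p_value, 3), "reject H0:", reject)
```

Note that using the unrounded sample mean instead of 139.6 changes the test statistic slightly in the third decimal place, but not the conclusion.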