Math330 Exercises Week 1

  • WS1.1

    You will need to do this question in order to complete the online quiz on Moodle.

    The number of trains running late from Lancaster per weekday is thought to follow a Poisson(θ) distribution, and the number of late trains each day is thought to be independent of the value on all other days. Over 10 successive weekdays, the number of late trains is recorded as:

               7 4 5 3 4 3 6 4 2 7
    

    Obtain:

    1. (a)

      The maximum likelihood estimate of θ;

    2. (b)

      The expected information for θ;

    3. (c)

      A 95% confidence interval for θ based on the asymptotic normality of $\hat\theta$.
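    A sketch for checking your numerical answers to WS1.1. It assumes the standard results for i.i.d. Poisson data: the MLE is the sample mean, and the expected information is $I(\theta) = n/\theta$. The interval is the usual Wald form $\hat\theta \pm 1.96/\sqrt{I(\hat\theta)}$, with the information evaluated at $\hat\theta$ because θ is unknown. The variable and function names are illustrative only.

```python
import math
from statistics import NormalDist

# Late-train counts over 10 successive weekdays, taken from the question.
counts = [7, 4, 5, 3, 4, 3, 6, 4, 2, 7]
n = len(counts)

# Poisson MLE: the sample mean maximises sum(x_i) * log(theta) - n * theta.
theta_hat = sum(counts) / n


def expected_information(theta: float) -> float:
    """Fisher information for n i.i.d. Poisson(theta) observations: n / theta."""
    return n / theta


# Wald interval: theta_hat +/- z * sqrt(1 / I(theta_hat)).
z = NormalDist().inv_cdf(0.975)  # approx 1.96
se = math.sqrt(1 / expected_information(theta_hat))
ci = (theta_hat - z * se, theta_hat + z * se)

print(f"MLE = {theta_hat}")
print(f"I(theta_hat) = {expected_information(theta_hat):.4f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

    This prints $\hat\theta = 4.5$, $I(\hat\theta) \approx 2.2222$ and a 95% interval of about (3.185, 5.815).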

  • WS1.2

    The following table shows counts from a sequence of 1572 bases from the first human preproglucagon gene:

    Base      A     C     G     T
    Count   516   263   227   566

    The GC-content of the gene is of interest, i.e. the proportion of bases that are either guanine (G) or cytosine (C). Let θ denote the GC-content. Construct an appropriate model for θ. Find the MLE of θ and a 99% confidence interval for θ based on asymptotic normality. Discuss the assumptions you have made. Are they reasonable? How could they be tested?
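    One possible model, assuming the 1572 bases are independent and each is G or C with the same probability θ, is that the GC count satisfies $R \sim \text{Binomial}(1572, \theta)$. Then $\hat\theta = r/n$ and the estimated standard error is $\sqrt{\hat\theta(1-\hat\theta)/n}$. The sketch below computes these values and the 99% Wald interval from the table counts. Independence and a constant θ along the gene are exactly the assumptions the question asks you to discuss.

```python
import math
from statistics import NormalDist

# Base counts from the table in WS1.2.
counts = {"A": 516, "C": 263, "G": 227, "T": 566}
n = sum(counts.values())       # total number of bases (1572)
r = counts["G"] + counts["C"]  # number of G or C bases

# Binomial MLE and its estimated standard error from the expected information.
theta_hat = r / n
se = math.sqrt(theta_hat * (1 - theta_hat) / n)

# 99% Wald interval uses the 0.995 quantile of N(0, 1), approx 2.576.
z = NormalDist().inv_cdf(0.995)
ci = (theta_hat - z * se, theta_hat + z * se)

print(f"n = {n}, r = {r}, MLE = {theta_hat:.4f}")
print(f"99% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

    This gives $\hat\theta \approx 0.3117$ with a 99% interval of roughly (0.2816, 0.3418).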


  • WS1.3

    Suppose a coin is tossed n times and r heads are observed. Let θ represent the probability of a head on a single coin toss.

    1. (a)

      Describe an appropriate model for these data, stating your reasoning.

    2. (b)

      Write down the resulting likelihood and log-likelihood functions for the model.

    3. (c)

      Find the maximum likelihood estimate (MLE) of θ.

    4. (d)

      Is this estimator unbiased?

    5. (e)

      Is your MLE in part (c) a minimum variance unbiased estimator for the proportion, θ?
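    A simulation cannot replace the algebra asked for in (d) and (e), but it can check it. The sketch below uses the illustrative values n = 20 and θ = 0.3 (not part of the question). It repeatedly simulates n tosses, computes the estimator r/n, and compares its average with θ and its variance with the Cramér-Rao lower bound $\theta(1-\theta)/n$, which is the reciprocal of the expected information $n/\{\theta(1-\theta)\}$ for the binomial model.

```python
import random
import statistics


def simulate_mle(n: int, theta: float, reps: int, seed: int = 1) -> list[float]:
    """Return `reps` simulated values of the estimator r/n from n tosses each."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        r = sum(rng.random() < theta for _ in range(n))  # number of heads
        estimates.append(r / n)
    return estimates


n, theta = 20, 0.3  # illustrative values only
est = simulate_mle(n, theta, reps=20_000)

# Cramer-Rao lower bound: 1 / I(theta) with I(theta) = n / (theta * (1 - theta)).
crlb = theta * (1 - theta) / n

print(f"mean of r/n      = {statistics.fmean(est):.4f} (theta = {theta})")
print(f"variance of r/n  = {statistics.pvariance(est):.5f}")
print(f"Cramer-Rao bound = {crlb:.5f}")
```

    If the simulated mean is close to θ and the simulated variance is close to the bound, that is consistent with (but does not prove) your answers to (d) and (e).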

  • WS1.4

    Suppose you have a sequence of independent real-valued random variables $X_1, X_2, \ldots$, and let $S_n = \sum_{i=1}^{n} X_i$.

    1. (a)

      State conditions under which $(1/n)S_n$ converges to a constant $c$.

    2. (b)

      Discuss what it means for $(1/n)S_n$ to converge to $c$, and which different forms of convergence exist.

    3. (c)

      State the distribution to which $S_n$, suitably standardised, converges, and provide conditions under which this convergence occurs.

    4. (d)

      What does it mean for $S_n$ to converge in distribution to a random variable $Y$? How does this form of convergence relate to the forms of convergence you discussed in (b)?
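    The two limit theorems can be seen numerically. The sketch below assumes i.i.d. Exponential(1) summands (mean μ = 1, variance σ² = 1), a choice made only for illustration. By the law of large numbers, $S_n/n$ should settle near μ as n grows; by the central limit theorem, the standardised sum $(S_n - n\mu)/(\sigma\sqrt{n})$ should be approximately N(0, 1), so about 95% of simulated values should fall within ±1.96.

```python
import math
import random

rng = random.Random(42)


def sample_mean(n: int) -> float:
    """Return S_n / n for n i.i.d. Exponential(1) draws (mean 1, variance 1)."""
    return sum(rng.expovariate(1.0) for _ in range(n)) / n


# Law of large numbers: S_n / n should settle near mu = 1 as n grows.
for n in (10, 1_000, 100_000):
    print(f"n = {n:>7}: S_n / n = {sample_mean(n):.4f}")

# Central limit theorem: with mu = sigma = 1, (S_n - n) / sqrt(n) is roughly N(0, 1).
n, reps = 200, 5_000
z_scores = [(n * sample_mean(n) - n) / math.sqrt(n) for _ in range(reps)]
coverage = sum(abs(z) < 1.96 for z in z_scores) / reps
print(f"fraction with |Z| < 1.96: {coverage:.3f} (N(0, 1) gives about 0.950)")
```

    Note that the simulation only shows one sample path for the law of large numbers; it does not distinguish between the forms of convergence discussed in (b).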

    These are the assessed questions, which will be marked for credit. Total marks per week are 10.

  • CW1.4

    (Maximum likelihood revision)

    1. (a)

      The wheat yield, $X_i$, from field $i$ is believed to be normally distributed with mean $\mu z_i$, where $z_i$ is the known quantity of fertiliser spread on the field. Assume that the yields in the different fields are independent and that the variance is known to be 1, so that

      $X_i \sim N(\mu z_i, 1)$

      for $i = 1, \ldots, n$.

      1. (i)

        Show that the MLE is $\hat\mu = \frac{\sum_{i=1}^{n} z_i x_i}{\sum_{i=1}^{n} z_i^2}$. (1 mark)

      2. (ii)

        Show that $\hat\mu$ is an unbiased estimator. (Remember that the $z_i$ values are constants.) (1 mark)

      3. (iii)

        Obtain an approximate 95% confidence interval for μ based on the asymptotic distribution of $\hat\mu$. (1 mark)

    2. (b)

      Now consider the situation

      $X_i \sim N(\mu z_i, z_i^2)$

      for $i = 1, \ldots, n$, so that the variance of the yield also increases with the known value of $z_i$.

      1. (i)

        Show that the MLE is $\hat\mu = \frac{1}{n}\sum_{i=1}^{n} \frac{x_i}{z_i}$. (2 marks)

      2. (ii)

        Show that $\hat\mu$ is an unbiased estimator. (1 mark)

      3. (iii)

        Obtain an approximate 95% confidence interval for μ based on the asymptotic distribution of $\hat\mu$. (1 mark)

    3. (c)

      Make plots (sketches are sufficient) of how the data $(z_i, x_i)$, $i = 1, \ldots, n$, are likely to appear in each of (a) and (b). HINT: you can make up some values for the $z_i$ and for $\hat\mu$. (2 marks)

    4. (d)

      Show that the information provided by each data point is $z_i^2$ in (a) and 1 in (b). (1 mark)
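    A simulation sketch for checking your CW1.4 working, using hypothetical fertiliser quantities $z_i$ (all positive) and a hypothetical true value μ = 2; none of these numbers come from the question. It generates data from each model, applies the two MLE formulas stated in (a)(i) and (b)(i), and compares the average and variance of each estimator with μ and with the reciprocal of the total information from (d), namely $1/\sum_{i} z_i^2$ in (a) and $1/n$ in (b). Agreement is a check on your working, not a proof.

```python
import random
import statistics

# Hypothetical fertiliser quantities z_i (all positive) and true mu, for illustration only.
z = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
mu = 2.0
rng = random.Random(0)


def mle_a(x, z):
    """Model (a), X_i ~ N(mu z_i, 1): MLE = sum(z_i x_i) / sum(z_i^2)."""
    return sum(zi * xi for zi, xi in zip(z, x)) / sum(zi * zi for zi in z)


def mle_b(x, z):
    """Model (b), X_i ~ N(mu z_i, z_i^2): MLE = (1/n) sum(x_i / z_i)."""
    return statistics.fmean([xi / zi for xi, zi in zip(x, z)])


reps = 20_000
est_a, est_b = [], []
for _ in range(reps):
    x_a = [rng.gauss(mu * zi, 1.0) for zi in z]  # standard deviation 1
    x_b = [rng.gauss(mu * zi, zi) for zi in z]   # standard deviation z_i, variance z_i^2
    est_a.append(mle_a(x_a, z))
    est_b.append(mle_b(x_b, z))

mean_a, var_a = statistics.fmean(est_a), statistics.pvariance(est_a)
mean_b, var_b = statistics.fmean(est_b), statistics.pvariance(est_b)

# Reciprocal of the total information: 1 / sum(z_i^2) in (a), 1 / n in (b).
inv_info_a = 1 / sum(zi * zi for zi in z)
inv_info_b = 1 / len(z)

print(f"(a) mean {mean_a:.3f}, variance {var_a:.4f}, 1/I = {inv_info_a:.4f}")
print(f"(b) mean {mean_b:.3f}, variance {var_b:.4f}, 1/I = {inv_info_b:.4f}")
```

    Both averages should be close to μ = 2 and each variance close to its 1/I. Plotting one simulated `x_a` and `x_b` against `z` gives the kind of picture part (c) asks for: points scattered with roughly constant spread about the line $x = \mu z$ in (a), and a spread that fans out as $z$ grows in (b).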