MATH454/554: Project III
TO BE HANDED IN BY MONDAY 15/01/2018 (WEEK 11), 10:00.
This project will contribute 17% towards the final module mark.
Submission: Upload the pdf of your answers and your R code file to the Moodle site. Your R code should be submitted as a .r or .txt file so that it can be copied and pasted to run. Submit also a printed copy of your answers (there is no need for a printed copy of the R code), together with a plagiarism cover sheet, to the MSc submissions pigeon hole. Please write your student ID on your answers, not your name.
Let $Y$ be a random variable with probability mass function
\[
P(Y = y) = p(1-p)^{y}, \qquad y = 0, 1, 2, \ldots, \quad 0 < p < 1.
\]
Then $Y$ is a Geometric random variable with parameter $p$. The mean and variance of $Y$ are $(1-p)/p$ and $(1-p)/p^{2}$, respectively. Thus the variance is greater than the mean, and the geometric distribution can be preferable to the Poisson distribution for data which show more variability (over-dispersion) than we would expect under a Poisson distribution.
We take into account explanatory variables (covariates) $x_i$ by using the following model:
\[
\log \mu_i = x_i^{T} \beta, \qquad \text{where } \mu_i = E[Y_i] = \frac{1-p_i}{p_i}. \tag{1}
\]
Thus for $i = 1, \ldots, n$,
\[
Y_i \sim \mathrm{Geometric}(p_i) \quad \text{with} \quad p_i = \frac{1}{1 + \exp(x_i^{T} \beta)}.
\]
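For reference, R's built-in dgeom function uses exactly this parameterisation ($y$ counts the number of failures, starting at 0), which may be convenient when coding the likelihood later. A quick illustrative check (the value $p = 0.3$ is arbitrary):

  p <- 0.3                                        # arbitrary illustrative value
  y.vals <- 0:4
  dgeom(y.vals, prob = p)                         # R's built-in geometric pmf
  p * (1 - p)^y.vals                              # matches the pmf given above
  c(mean = (1 - p) / p, var = (1 - p) / p^2)      # mean and variance formulas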
Data
The data is provided in pupil.txt and contains the number of days absent for each of 314 pupils, along with three covariates. The numbers of days absent are given in column 1 and the three remaining columns are:
• Gender; 0 - female; 1 - male (column 2);
• Maths test score; range 0-100 (column 3);
• Programme pupil is on; 1, 2 or 3 (column 4).
We will analyse the absence data using the geometric regression model above with $x_i = \left(1, \, g_i, \, m_i, \, \mathbb{1}_{\{r_i = 2\}}, \, \mathbb{1}_{\{r_i = 3\}}\right)^{T}$, which includes an intercept term, and $\beta = (\beta_1, \ldots, \beta_5)^{T}$, where $g_i$, $m_i$ and $r_i$ denote the gender, maths score and programme, respectively, of pupil $i$. Remember $\mathbb{1}_{A}$ is an indicator random variable which is 1 if the event $A$ occurs and 0 otherwise.
The unknown parameters are $\beta = (\beta_1, \ldots, \beta_5)^{T}$. The objective here is to conduct posterior inference on these parameters.
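One possible way to read the data and set up the design matrix in R is sketched below; it assumes pupil.txt is a whitespace-separated file with no header row, and uses programme 1 as the baseline category. The object names pupil, y and X are choices made here, not part of the project specification.

  pupil <- read.table("pupil.txt")          # assumes whitespace-separated, no header row
  y <- pupil[, 1]                           # days absent
  X <- cbind(1,                             # intercept
             pupil[, 2],                    # gender (0 - female, 1 - male)
             pupil[, 3],                    # maths test score
             as.numeric(pupil[, 4] == 2),   # indicator: programme 2
             as.numeric(pupil[, 4] == 3))   # indicator: programme 3 (programme 1 is baseline)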
In order to conduct Bayesian inference, we need to elicit a prior distribution for $\beta$, the model parameters. Let us consider the following prior distribution:
\[
\beta \sim N_{5}\!\left(\mu_0, \, \sigma_0^{2} I_{5}\right). \tag{2}
\]
In other words, the prior for $\beta$ is 5-variate Normal with mean vector $\mu_0$ and diagonal covariance matrix $\sigma_0^{2} I_{5}$. You may choose the hyperparameter values $\mu_0 = (0, \ldots, 0)^{T}$ (a 5-dimensional vector of 0 entries) and a large value for $\sigma_0^{2}$, which correspond to a fairly uninformative prior distribution.
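Because the covariance matrix in (2) is diagonal, the components of $\beta$ are independent a priori, so the log prior density is simply a sum of univariate Normal log densities. A minimal sketch (the names log.prior, mu0 and sigma0 are choices made here; sigma0 is the prior standard deviation, i.e. the square root of $\sigma_0^{2}$):

  log.prior <- function(beta, mu0, sigma0) {
    # sum of independent Normal log densities, one per component of beta
    sum(dnorm(beta, mean = mu0, sd = sigma0, log = TRUE))
  }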
1. Write down, up to a constant of proportionality, the joint posterior distribution of $\beta$. [2]

2. Write a function in R to compute the log-likelihood given $\beta$, the covariates and $y$ (the absence data). [3]

3. Write a function in R to compute the log of the prior distribution. [1]
4. Write in R a random walk Metropolis algorithm to obtain samples from the joint posterior distribution of $\beta$. Use a multivariate Gaussian proposal for $\beta$ with variance matrix V.prop. [3]

Hint: Note that you can use the random walk Metropolis for the Poisson regression example in Lab 5 as a template; a minimal sketch is also given at the end of this sheet.
Throughout the following, apply the random walk Metropolis algorithm to the pupil data set with suitable initial parameter values and 10000 iterations of the algorithm.
6. Perform a run with V.prop = diag(rep(0.01,5)) (0.01 times the 5 × 5 identity matrix). Comment upon the performance of the random walk Metropolis algorithm. [2]
7. Use tuning runs to find a good choice of V.prop. State your final choice of V.prop. [2]
8. Run the random walk Metropolis algorithm with your chosen V.prop for 110000 iterations and estimate the joint posterior distribution of the parameters. Give a short report of the results obtained. [2]

9. Using samples from the posterior distribution for $\beta$, estimate the probability that a male on programme 3 with a maths score of 60 has no absences. [2] (A sketch of this calculation is given at the end of this sheet.)
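For question 4, the following is a minimal sketch of a random walk Metropolis sampler for this model; it is not the Lab 5 template itself. It assumes y and X have been constructed as in the data-reading sketch earlier and uses the log.prior function sketched after (2); the names log.like, rwm, mu0 and sigma0 are choices made here.

  log.like <- function(beta, y, X) {
    mu <- exp(as.vector(X %*% beta))        # geometric mean for each pupil, from (1)
    p  <- 1 / (1 + mu)                      # success probability, since mu = (1 - p) / p
    sum(dgeom(y, prob = p, log = TRUE))
  }

  rwm <- function(n.iter, beta.init, V.prop, y, X, mu0, sigma0) {
    d <- length(beta.init)
    samples <- matrix(NA, nrow = n.iter, ncol = d)
    n.accept <- 0
    beta.cur <- beta.init
    post.cur <- log.like(beta.cur, y, X) + log.prior(beta.cur, mu0, sigma0)
    L <- t(chol(V.prop))                    # L %*% z ~ N(0, V.prop) when z ~ N(0, I)
    for (i in 1:n.iter) {
      beta.prop <- beta.cur + as.vector(L %*% rnorm(d))   # multivariate Gaussian proposal
      post.prop <- log.like(beta.prop, y, X) + log.prior(beta.prop, mu0, sigma0)
      if (log(runif(1)) < post.prop - post.cur) {         # Metropolis accept/reject step
        beta.cur <- beta.prop
        post.cur <- post.prop
        n.accept <- n.accept + 1
      }
      samples[i, ] <- beta.cur
    }
    list(samples = samples, accept.rate = n.accept / n.iter)
  }

  # Example call (rep(0, 5) is an illustrative starting value; mu0 and sigma0 are
  # your chosen hyperparameter values):
  # out <- rwm(10000, rep(0, 5), diag(rep(0.01, 5)), y, X, mu0, sigma0)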
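For question 9, note that under the model the probability of no absences is $P(Y = 0) = p = 1/(1 + \exp(x^{T}\beta))$, which can be estimated by averaging this quantity over the posterior draws. A minimal sketch, assuming out is the output of the rwm() sketch above and the design-matrix ordering from the earlier data sketch (programme 1 as the baseline):

  beta.post <- out$samples                 # discard an initial burn-in period first if appropriate
  x.new  <- c(1, 1, 60, 0, 1)              # intercept, male, maths score 60, programme 3
  mu.new <- exp(beta.post %*% x.new)       # posterior draws of the geometric mean
  p.zero <- 1 / (1 + mu.new)               # P(Y = 0) = p for each posterior draw
  mean(p.zero)                             # Monte Carlo estimate of the required probability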