MATH454/554: Project III

TO BE HANDED IN BY MONDAY 15/01/2018 (WEEK 11), 10:00.

This project will contribute 17% towards the final module mark.

Submission: Upload the pdf of your answers and your R code file to the Moodle site. Your R code should be submitted as a .r or .txt file so that it can be copied and pasted to run. Also submit a printed copy of your answers (no need for a printed copy of the R code), together with a plagiarism cover sheet, to the MSc submissions pigeon hole. Please write your student ID on your answers, not your name.

Let $Y\sim{\rm Geom}(p)$ with probability mass function

\mathbb{P}(Y=k)=(1-p)p^{k}\qquad(k=0,1,\ldots).

Then $Y$ is a Geometric random variable. The mean and variance of $Y$ are $p/(1-p)$ and $p/(1-p)^{2}$, respectively. Since $0<p<1$, the variance is greater than the mean, so the geometric distribution can be preferable to the Poisson distribution for over-dispersed data, that is, data which show more variability than we would expect from a Poisson distribution.
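As a quick numerical check of the statements above, the following R sketch compares the probability mass function as written here with R's built-in geometric functions. Note that `dgeom` and `rgeom` use the mass function `prob * (1 - prob)^x`, so the $p$ of this document corresponds to `prob = 1 - p` in R. The value `p = 0.7` is an arbitrary illustrative choice, not something taken from the project.

```r
# Parameterisation check: the text uses P(Y = k) = (1 - p) * p^k,
# whereas R's dgeom(x, prob) uses prob * (1 - prob)^x.
# Hence the p of the text corresponds to prob = 1 - p in R.
p <- 0.7                       # illustrative value (an assumption)
k <- 0:5

pmf_doc <- (1 - p) * p^k       # pmf as written in the text
pmf_r   <- dgeom(k, prob = 1 - p)
all_equal_pmf <- isTRUE(all.equal(pmf_doc, pmf_r))

# Theoretical mean and variance from the text
mean_y <- p / (1 - p)
var_y  <- p / (1 - p)^2

# Monte Carlo check by simulation
set.seed(1)
y <- rgeom(1e5, prob = 1 - p)
c(mean_theory = mean_y, mean_sim = mean(y),
  var_theory  = var_y,  var_sim  = var(y))
```

The ratio of variance to mean is $1/(1-p)$, which exceeds 1 for any $0<p<1$; for a Poisson distribution this ratio is exactly 1.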

We take into account explanatory variables (covariates) by using the following model:

Y_{i}\sim{\rm Geom}(p_{i}),\qquad\log(p_{i}/(1-p_{i}))=\mathbf{x}_{i}^{\prime}\beta\qquad(i=1,2,\ldots,n).\qquad(1)

Thus for $y=0,1,\ldots$ ,

\mathbb{P}(Y_{i}=y\mid\mathbf{x}_{i},\beta)=\left(\frac{1}{1+\exp(\mathbf{x}_{i}^{\prime}\beta)}\right)\left(\frac{\exp(\mathbf{x}_{i}^{\prime}\beta)}{1+\exp(\mathbf{x}_{i}^{\prime}\beta)}\right)^{y}.
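The step from the logit link to the probability formula can be checked numerically for a single observation. The sketch below uses made-up values for one covariate vector and one parameter vector (both are assumptions for illustration, not values from pupil.txt or fitted estimates), inverts the logit with `plogis`, and compares the formula above with `dgeom`.

```r
# Illustrative values only (assumptions, not taken from pupil.txt)
x    <- c(1, 0, 50, 1, 0)                  # one covariate vector x_i
beta <- c(-0.5, 0.2, 0.01, -0.3, 0.1)      # an arbitrary parameter vector

eta <- sum(x * beta)       # linear predictor x_i' beta
p_i <- plogis(eta)         # inverse logit: exp(eta) / (1 + exp(eta))

y <- 0:4
# Probability formula as displayed in the text
pmf_formula <- (1 / (1 + exp(eta))) * (exp(eta) / (1 + exp(eta)))^y
# Same quantity via R's geometric pmf, which is parameterised by 1 - p_i
pmf_dgeom <- dgeom(y, prob = 1 - p_i)

cbind(y, pmf_formula, pmf_dgeom)
```

Working with `plogis` rather than hand-coded exponentials tends to be more stable numerically when the linear predictor is large in absolute value.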

Data

The data are provided in pupil.txt and contain the number of days absent for 314 pupils, along with three covariates. The number of days absent is given in column 1, and the three remaining columns are:

• Gender; 0 - female; 1 - male (column 2);

• Maths test score; range 0-100 (column 3);

• Programme pupil is on; 1, 2 or 3 (column 4).

We will analyse the absence data using the geometric regression model above with $\mathbf{x}_{i}^{\prime}=(1,g_{i},m_{i},1_{\{p_{i}=2\}},1_{\{p_{i}=3\}})$, including an intercept term, and $\mbox{\boldmath$\beta$}^{\prime}=(\beta_{1},\beta_{2},\beta_{3},\beta_{4},\beta_{5})$, where $g_{i}$, $m_{i}$ and $p_{i}$ denote the gender, maths score and programme, respectively, of pupil $i$. Note that within the indicators $p_{i}$ denotes the programme of pupil $i$, not the probability $p_{i}$ of model (1). Remember that $1_{\{A\}}$ is an indicator variable which is 1 if $A$ occurs and 0 otherwise.
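The construction of the covariate vectors can be sketched in R as follows. The four rows below are made-up stand-ins so that the code runs on its own; they are not taken from pupil.txt. The column names and the commented-out `read.table` call (whitespace-separated values, no header row) are assumptions about the file layout that should be checked against the actual file.

```r
# Toy stand-in for pupil.txt: four made-up rows, NOT real data.
# Column order follows the text: absences, gender, maths score, programme.
pupil <- data.frame(
  absent = c(3, 0, 7, 1),
  gender = c(0, 1, 1, 0),
  maths  = c(55, 72, 40, 63),
  prog   = c(1, 2, 3, 2)
)
# With the real file one might use something like the following
# (assuming whitespace-separated columns and no header row):
# pupil <- read.table("pupil.txt", header = FALSE,
#                     col.names = c("absent", "gender", "maths", "prog"))

y <- pupil$absent

# Design matrix with rows x_i' = (1, g_i, m_i, 1{prog = 2}, 1{prog = 3});
# programme 1 is the baseline category.
X <- cbind(
  intercept = 1,
  gender    = pupil$gender,
  maths     = pupil$maths,
  prog2     = as.numeric(pupil$prog == 2),
  prog3     = as.numeric(pupil$prog == 3)
)
X
```

Because programme 1 has no indicator of its own, $\beta_{4}$ and $\beta_{5}$ measure the effect of programmes 2 and 3 relative to programme 1 on the logit scale.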

The unknown parameters are $\beta\in\mathbb{R}^{5}$ . The objective here is to conduct posterior inference on these parameters.

In order to conduct Bayesian inference, we need to elicit a prior distribution for $\beta$ , the model parameters. Let us consider the following prior distribution

\mbox{\boldmath$\beta$}=\left(\begin{array}{c}\beta_{1}\\ \beta_{2}\\ \beta_{3}\\ \beta_{4}\\ \beta_{5}\end{array}\right)\sim N_{k}\left(\mbox{\boldmath$\mu$}_{0},\;{\bf C}_{0}={\rm Diag}\left(\frac{1}{{\kappa_{0}}_{1}},\ldots,\frac{1}{{\kappa_{0}}_{k}}\right)\right).\qquad(2)

In other words, the prior for $\beta$ is $k$-variate Normal (here $k=5$) with mean vector $\mbox{\boldmath$\mu$}_{0}\in\mathbb{R}^{k}$ and diagonal covariance matrix ${\bf C}_{0}$. You may choose the following hyperparameter values: $\mbox{\boldmath$\mu$}_{0}={\bf 0}\in\mathbb{R}^{5}$ (a $5$-dimensional vector of 0 entries) and ${\kappa_{0}}_{1}=\ldots={\kappa_{0}}_{5}=0.01$, which give each coefficient a prior variance of $1/0.01=100$ and so correspond to a fairly uninformative prior distribution.
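To make the suggested hyperparameters concrete, the sketch below builds the prior mean and covariance in R and reports the implied prior standard deviation and a central 95% prior interval for a single coefficient. The variable names are illustrative choices, not prescribed by the project.

```r
k      <- 5
mu0    <- rep(0, k)             # prior mean vector
kappa0 <- rep(0.01, k)          # prior precisions
C0     <- diag(1 / kappa0)      # prior covariance matrix, Diag(1 / kappa0)

prior_sd <- sqrt(diag(C0))      # prior standard deviation of each coefficient
prior_sd

# Central 95% prior interval for any single coefficient
prior_int <- qnorm(c(0.025, 0.975), mean = mu0[1], sd = prior_sd[1])
prior_int
```

A prior standard deviation of 10 on the logit scale is very wide, which is the sense in which this prior is fairly uninformative.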

1. Write down, up to a constant of proportionality, the joint posterior distribution of $\beta$. [2]

2. Write a function in R to compute the log-likelihood given $\beta$, $\mathbf{X}$ and $\mathbf{y}$ (the absence data). [3]

3. Write a function in R to compute the log of the prior distribution. [1]

4. Write in R a random walk Metropolis algorithm to obtain samples from the joint posterior distribution of $\beta$. Use a multivariate Gaussian proposal for $\beta$ with variance matrix V.prop. [3]
Hint: Note that you can use the random walk Metropolis for the Poisson regression example in Lab 5 as a template.
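For orientation, the general shape of a random walk Metropolis sampler with a multivariate Gaussian proposal is sketched below on a toy target. This is not the Lab 5 code and not a solution for the geometric regression posterior: the function name `rwm`, its arguments other than V.prop, and the standard bivariate normal target are all assumptions made purely for illustration.

```r
# Generic random walk Metropolis; log_post is any function returning the
# log target density up to an additive constant (hypothetical interface).
rwm <- function(log_post, init, n_iter, V.prop) {
  d      <- length(init)
  R      <- chol(V.prop)                  # V.prop = t(R) %*% R
  draws  <- matrix(NA_real_, n_iter, d)
  cur    <- init
  cur_lp <- log_post(cur)
  n_acc  <- 0
  for (i in seq_len(n_iter)) {
    # Proposal: current value plus a N(0, V.prop) increment
    prop    <- cur + drop(crossprod(R, rnorm(d)))
    prop_lp <- log_post(prop)
    # Symmetric proposal, so accept with probability min(1, target ratio)
    if (log(runif(1)) < prop_lp - cur_lp) {
      cur    <- prop
      cur_lp <- prop_lp
      n_acc  <- n_acc + 1
    }
    draws[i, ] <- cur
  }
  list(draws = draws, accept_rate = n_acc / n_iter)
}

# Toy target (an assumption for illustration): standard bivariate normal
toy_log_post <- function(theta) -0.5 * sum(theta^2)

set.seed(1)
out <- rwm(toy_log_post, init = c(0, 0), n_iter = 5000, V.prop = diag(2))
out$accept_rate
colMeans(out$draws)
```

Working on the log scale avoids numerical underflow in the acceptance ratio, and returning the acceptance rate alongside the draws makes it easier to compare different choices of V.prop.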

Throughout the following, apply the random walk Metropolis algorithm to the pupil data set, with initial parameter values $\mbox{\boldmath$\beta$}=(0,0,0,0,0)$ and 10000 iterations of the algorithm.

6. Perform a run with V.prop = diag(rep(0.01,5)) ($0.01$ times the identity matrix). Comment upon the performance of the random walk Metropolis algorithm. [2]

7. Use tuning runs to find a good choice of V.prop. State your final choice of V.prop. [2]

8. Run the random walk Metropolis algorithm with your chosen V.prop for 110000 iterations and estimate the joint posterior distribution of the parameters. Give a short report of the results obtained. [2]

9. Using samples from the posterior distribution for $\beta$, estimate the probability that a male on programme 3 with a maths score of 60 has no absences. [2]