A Jump Back to Population Estimation
10th April 2016
The next project the MRes students in STOR-i are onto is researching and presenting aspects of one of the Masterclasses that
have occurred this term (excluding the one on simulation that I spoke about in Optimisation
and Error in Simulation). Emily Graham and I are looking into the Masterclass by
Professor Ruth King of the School of Mathematics at the University
of Edinburgh, who came to talk about ways of estimating population sizes using incomplete contingency tables. I have already
spoken about this in my blog post Estimating Near Invisible Populations. In that post, I discussed
the basic framework of the problem. In this post, I wish to talk about some of this in more detail, particularly the
computational side.
As King used a Bayesian statistics
approach to the problem, the aim is to get to the posterior distribution, $\pi(n_0,\theta|\mathbf{x})$, for the log-linear parameters
and the number of unobserved individuals given the data, $\mathbf{x}$, we have. This is often very difficult to sample from as it is
not a standard probability distribution. One way of going about this is to use Markov Chain Monte Carlo (MCMC), which I spoke about briefly
in Can MCMC Be Updated For The Age Of Big Data?. For MCMC, initial values are given to each of the
parameters, then the values are updated by simulating either from conditional distributions (if they are in a nice form) or from
distributions that approximate the posterior and accepting the proposed values with a certain probability. Due to the property of detailed balance
(see later) the Markov chain tends to a stationary distribution equal to the posterior. Therefore, provided it has enough time to run
and the early stages are removed, an MCMC sample will be a good approximate sample from the posterior.
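To make that recipe a bit more concrete, here is a minimal sketch of a random-walk Metropolis-Hastings sampler, the simplest flavour of MCMC; the log_posterior function, the proposal scale and the burn-in length are placeholder choices of mine rather than anything from the Masterclass.

```python
import numpy as np

def metropolis_hastings(log_posterior, theta0, n_iter=10000, prop_sd=0.5, burn_in=1000):
    """Random-walk Metropolis-Hastings: propose a nearby value and accept it
    with probability min(1, posterior ratio); otherwise keep the current value."""
    rng = np.random.default_rng(1)
    theta = np.asarray(theta0, dtype=float)
    log_post = log_posterior(theta)
    samples = []
    for _ in range(n_iter):
        proposal = theta + rng.normal(0.0, prop_sd, size=theta.shape)
        log_post_prop = log_posterior(proposal)
        # Accept with probability min(1, pi(proposal | x) / pi(theta | x))
        if np.log(rng.uniform()) < log_post_prop - log_post:
            theta, log_post = proposal, log_post_prop
        samples.append(theta.copy())
    # Discard the early iterations (burn-in) before treating the rest
    # as an approximate sample from the posterior.
    return np.array(samples[burn_in:])

# Example: an approximate sample from a standard normal "posterior"
draws = metropolis_hastings(lambda t: -0.5 * np.sum(t ** 2), theta0=[0.0])
```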
A problem that occurs throughout statistics is how to pick which model, $m$, to use. In the application to population
estimation, the different models correspond to different choices of main effect and interaction log-linear terms to include. The
different models gave quite a wide range of estimates for the size of the population, so which one is correct? Really, we don’t know,
but just choosing one model and quoting its results hides the fact that we are uncertain about our choice. One way to
include this uncertainty is model averaging.
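Spelled out (this is just the standard model-averaging identity, not something specific to King’s analysis), the model-averaged posterior for the number of unobserved individuals weights each model’s answer by how plausible that model is given the data:
$$\pi(n_0|\mathbf{x})=\sum_{m\in\mathcal{M}}\pi(n_0|m,\mathbf{x})\,\pi(m|\mathbf{x}),$$
so the uncertainty about which model is correct is carried through to the final estimate rather than being thrown away.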
For model averaging, the model itself is chosen to be a discrete parameter, $m\in\mathcal{M}$, and is added to the posterior. The
parameters used for a particular model are $\theta_m$. This means that the posterior has form
$$\pi(n_0,\theta_m,m|\mathbf{x})\propto f(\mathbf{x}| n_0,\theta_m,m) \pi(n_0|\theta_m,m)\pi(\theta_m|m)\pi(m).$$
$f(\mathbf{x}| n_0,\theta_m,m) $ is the likelihood of the data, and the other terms are prior distributions that incorporate current
beliefs about the other parameters. This makes MCMC really hard. You can’t just add in parameters easily between steps in a chain!
Fortunately, a clever way of doing this has been suggested by Professor Peter Green
of the University of Bristol, Reversible Jump MCMC.
Let the chain be at the parameter vector $\theta$. For now, just consider a move to a model, $m’$, with more parameters than the model,
$m$, the chain is currently in. A move of type $j$ to model $m’$ is proposed with probability $r(j,m,\theta)$. How do we then choose the values
of the new parameter vector $\theta’$? Well, Green’s suggestion was that the parameters that stay in the model are transformed
deterministically by a function $g$ (which could just be the identity function) and the additional parameters $u$ are proposed from a
distribution $q(u)$. Therefore, we would have $\theta’=(g(\theta),u)$.
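As a small illustration of this dimension-matching idea, the sketch below proposes a ‘birth’ move in which the existing log-linear parameters are kept unchanged (so $g$ is the identity) and the single extra parameter is drawn from a normal $q(u)$; the function name and the choice of $q$ are my own for illustration, not the implementation used by King or Green.

```python
import numpy as np

rng = np.random.default_rng(0)

def propose_birth(theta, u_sd=1.0):
    """Propose a jump to a model with one extra parameter:
    theta' = (g(theta), u) with g the identity and u drawn from q = N(0, u_sd^2)."""
    u = rng.normal(0.0, u_sd)
    theta_prime = np.append(theta, u)
    # log q(u) is needed later in the acceptance ratio; with g the identity
    # the Jacobian |d theta' / d(theta, u)| is simply 1.
    log_q_u = -0.5 * (u / u_sd) ** 2 - np.log(u_sd * np.sqrt(2.0 * np.pi))
    return theta_prime, log_q_u
```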
Now that we have proposed a move, we need some way of deciding whether or not to accept it. This comes naturally out of assumptions of
detailed balance. Detailed balance basically means that, once the chain has reached its stationary distribution, the probability of being in state $i$ and
moving to state $j$ is the same as the probability of being in state $j$ and moving to state $i$. To satisfy this assumption, Green showed that an acceptance probability is $\alpha(\theta,\theta’)=\min\{1,A\}$ where
$$A = \frac{\pi(m’,\theta’|\mathbf{x}) r(j,m’,\theta’)}{ \pi(m,\theta|\mathbf{x}) r(j,m,\theta)q(u)}\left|\frac{\partial\theta’}{\partial(\theta,u)}\right|.$$
This is basically the ratio of the posterior and proposal probabilities for the reverse move (going back to $\theta$ from $\theta’$) to those for
the forward move (going to $\theta’$ from $\theta$). The acceptance probability of the reverse move, that is, a move to a model with fewer parameters, is therefore
$\alpha(\theta’,\theta)=\min\{1,1/A\}$.
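Continuing the birth-move sketch above, the accept/reject step could look something like this on the log scale; log_post_current and log_post_proposed stand for the log posterior evaluated at $(m,\theta)$ and $(m’,\theta’)$, and log_r_forward and log_r_reverse for the move probabilities $r(j,m,\theta)$ and $r(j,m’,\theta’)$; all of these names are placeholders of mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def accept_birth(log_post_current, log_post_proposed, log_q_u,
                 log_r_forward, log_r_reverse):
    """Accept/reject for a birth move using Green's ratio on the log scale.
    With g the identity, the Jacobian term is 1 and drops out."""
    log_A = (log_post_proposed + log_r_reverse
             - log_post_current - log_r_forward - log_q_u)
    # Accept with probability min(1, A)
    return np.log(rng.uniform()) < log_A
```

Once such a chain has been run for long enough and the burn-in removed, the posterior probability of each model can be estimated by the proportion of iterations the chain spends in it, and model-averaged estimates of $n_0$ come from averaging over the whole chain, whichever model it happens to be in.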
I found this method of model averaging by Reversible Jump MCMC quite clever, and it seems to be a good way to help incorporate model
uncertainty into one’s answers. This is something that I haven’t seen done in classical statistics. The assumptions involved in selecting a single model
are quite strong ones, so being able to weaken them is very important to me.
References
[1] On the Bayesian Analysis of Population Size, R. King and S. P. Brooks, Biometrika, Vol. 88, No. 2, pp. 317-336 (2001)
[2] Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination, P. J. Green, Biometrika, Vol. 82, No. 4, pp. 711-732 (1995)