Finding a numerical solution

Continuing with the phycologist’s study, the children were instructed to ”make a mark at the mid-point of the line”. It is assumed that the distribution of the marks follow a Beta(α, α) distribution with pdf:

fX(x)=xα-1(1-x)α-1B(α,α)

where B(α,α)=01tα-1(1-t)α-1𝑑t denotes the beta function. The measurements for this question are presented below:

0.40 0.60 0.34 0.03 0.79
0.09 0.38 0.32 0.84 0.14

Show that this distribution also belongs to the exponential family with canonical parameter θ=α-1, sufficient statistic t(x)=log[x(1-x)] and function κ(θ)=logB(θ+1,θ+1).

Finding the MLE for the canonical parameter in this example cannot be solved analytically due to the beta function. A numerical approach must be taken. The first step is to define an expression for the derivative of κ(θ). For some small ϵ>0, the derivative can be approximated by:

κθ(θ)κ(θ+ϵ)-κ(θ-ϵ)2ϵ.

This is done in R by the function:

deriv_kappa <- function(theta, eps){
  part1 <- log(beta(theta + 1 + eps, theta + 1 + eps))
  part2 <- log(beta(theta + 1 - eps, theta + 1 - eps))
  return( (part1 - part2) / (2 * eps) )
}

Using this, the score function for this example can then be defined in R as follows:

score_fn <- function(x, theta, eps){
  stx <- sum( log(x * (1 - x)) )
  n <- length(x)
  dk <- deriv_kappa(theta, eps)
  return(stx - n * dk)
}

The MLE is defined as the root of the score function. This can be evaluated using the uniroot command that requires a function f and an interval which contains a single root of the function f. Any information required by f must also be provided. For the mid-point question, the MLE for the canonical parameter is estimated by:

x <- c(0.4,0.6,0.34,0.03,0.79,0.09,0.38,0.32,0.84,0.14)
MLE <- uniroot(f=score_fn, interval=c(-0.9,2), x = x, eps = 0.001)
MLE ## 0.1145062

Check that the MLE is not effected by the approximation to the derivative by seeing if the MLE changes when eps is reduced. For more information on using the uniroot command, type ?uniroot to open its help page.

To estimate the expected information at the MLE, we can adopt the same approach to approximate derivative of κθ(θ):

eps2 <- 0.001
pt1 <- deriv_kappa(MLE$root + eps2, 0.001)
pt2 <- deriv_kappa(MLE$root - eps2, 0.001)
I <- length(x) * (pt1 - pt2) / (2 * eps2)
I  ## 5.584224
MLE$root + qnorm(c(0.025, 0.975)) / sqrt(I) ##-0.7148981  0.9439121

The 95% confidence interval contains θ=0 and so there is insufficient evidence to reject the hypothesis that the marks are uniformly distribution along the line at the 5% significance level. It can therefore be argued that the children do not understand the second instruction.