MATH330 – Likelihood Inference

Chapter 6 Model Choice

Motto: “Essentially, all models are wrong, but some are useful.” (George E. P. Box)

So far in the module we have generally made the assumption that the data x = (x1, …, xn) are known to have arisen from a distribution f(·; θ), where the only unknown is the value of the parameter θ. We then used likelihood methods to estimate θ, and calculated the standard errors of these estimates (typically relying on asymptotic theory).

This chapter focuses on an equally important issue: given data x, how do we choose an appropriate model f?

This chapter will have a slightly different flavour as, while there are mathematical tools available to aid in model choice, there are also philosophical and common sense components involved. We will proceed mainly by considering real data examples, interspersed with occasional theory.

Simple vs. Complex models: overfitting

An important point here is that a more complex model always fits at least as well as a simpler model nested within it, and will thus have a higher (or equal) maximised likelihood. This can be explained through a linear regression example.

Suppose yi = β0 + β1 xi + ϵi, where ϵi ~ N(0, σ²). Then the ‘true’ model we wish to fit is a regression of y on x. However, we could fit regressions with any polynomial terms of x we like.

Figure 6.1 demonstrates this: the black line is the ‘true’ model, the red line is the best fitting fifth-order polynomial. The red line is a ‘better’ fit in the sense that it is closer to the points, but it is overfitting the data: it is modelling chance relationships in the ϵs rather than the effects we are actually interested in.

Figure 6.1: A visualisation of what it means to overfit.
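As an illustrative sketch of the point above (using simulated data, with coefficients β0 = 1, β1 = 0.5 and σ = 0.5 chosen purely for the example, not taken from the figure), the snippet below fits a straight line and a fifth-order polynomial to the same data. Because the straight line is nested within the quintic, the quintic's residual sum of squares can never be larger, and so its maximised Gaussian log-likelihood can never be smaller.

```python
import numpy as np

rng = np.random.default_rng(330)

# Simulate from the 'true' straight-line model y = b0 + b1*x + eps
n = 20
x = np.linspace(0, 4, n)
y = 1.0 + 0.5 * x + rng.normal(0.0, 0.5, size=n)  # illustrative values

def rss(degree):
    """Residual sum of squares of a least-squares polynomial fit."""
    coefs = np.polyfit(x, y, degree)
    fitted = np.polyval(coefs, x)
    return np.sum((y - fitted) ** 2)

rss1, rss5 = rss(1), rss(5)

# The degree-1 model is nested in the degree-5 model, so the quintic's
# RSS is at least as small, whatever the simulated data happen to be.
print(f"RSS (linear):      {rss1:.3f}")
print(f"RSS (fifth-order): {rss5:.3f}")
```

The same ordering holds for any data set and any pair of nested polynomial degrees, which is exactly why raw likelihood cannot be used on its own to choose between models of different complexity.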

There are many downsides to overfitting.

  • Overfitted models are poor at prediction. For example, if we observe x=2, then under the overfitted model of Figure 6.1 we would predict y=4.5, whereas the ‘true’ model predicts y=3.9.

  • Overfitted models can be hard to interpret. The red line in Figure 6.1 corresponds to a model with six parameters, while the true model contained only two parameters. In statistics a common goal is to seek parsimonious models, i.e. the simplest model that gives an adequate description of our data. Of course, what we mean by ‘adequate’ is open to discussion, which is what makes model selection difficult.
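The prediction point can be checked by simulation. The sketch below (again with hypothetical parameter values β0 = 1, β1 = 0.5, σ = 0.5; the numbers do not correspond to the figure) fits both models on small training samples and measures squared prediction error on fresh data the models never saw, averaging over replications so the comparison does not depend on one particular simulated data set. Although the fifth-order polynomial fits the training data more closely, its average held-out error is larger.

```python
import numpy as np

rng = np.random.default_rng(330)

def held_out_mse(degree, n_train=15, n_test=100):
    """Fit a polynomial on training data; return MSE on fresh test data."""
    x_tr = rng.uniform(0, 4, size=n_train)
    y_tr = 1.0 + 0.5 * x_tr + rng.normal(0.0, 0.5, size=n_train)
    x_te = rng.uniform(0, 4, size=n_test)
    y_te = 1.0 + 0.5 * x_te + rng.normal(0.0, 0.5, size=n_test)
    pred = np.polyval(np.polyfit(x_tr, y_tr, degree), x_te)
    return np.mean((y_te - pred) ** 2)

# Average over replications so the comparison is not seed-dependent
reps = 200
mse1 = float(np.mean([held_out_mse(1) for _ in range(reps)]))
mse5 = float(np.mean([held_out_mse(5) for _ in range(reps)]))
print(f"average held-out MSE, linear:      {mse1:.3f}")
print(f"average held-out MSE, fifth-order: {mse5:.3f}")
```

This is the basic idea behind out-of-sample validation: in-sample fit rewards complexity automatically, while held-out prediction error penalises the chance relationships an overfitted model has learned.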