1 Modelling and Statistical Inference

Starting Point for Inference

We typically start with a subject-matter question. Data, x, are then obtained to address the question. The data may be obtained in a very controlled way through a designed experiment, or may already be available.

Once obtained, the data may be examined through exploratory analysis, which involves graphical and numerical summaries of the data (means, medians, boxplots, histograms, …).
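As a minimal sketch of the numerical side of exploratory analysis, the snippet below computes a few standard summaries using Python's standard library. The values in x are hypothetical, invented purely for illustration, and the name summary is ours rather than anything used in the course.

```python
import statistics

# Hypothetical measurements, invented purely for illustration.
x = [22.1, 24.8, 27.3, 31.0, 19.5, 25.6, 28.9, 23.4]

# Numerical summaries of the kind used in exploratory analysis.
summary = {
    "n": len(x),
    "mean": statistics.fmean(x),
    "median": statistics.median(x),
    "sd": statistics.stdev(x),  # sample standard deviation (divisor n - 1)
    "min": min(x),
    "max": max(x),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Graphical summaries such as boxplots and histograms would normally accompany these numbers; they are omitted here to keep the sketch free of plotting dependencies.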

Then the formal analysis, or inference, begins. Inference is the focus of this course.

The subject-matter question usually concerns the system from which the data are derived, rather than the actual data. For example, the data may be a sample from a population of interest.

We must therefore acknowledge that the data are subject to random variation.

This can be thought of in (at least) two ways:

  • if we drew another sample from the same population, we would get different data;

  • the data are an incomplete representation of the system we are attempting to describe.
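The first of these views can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is hypothetical (10,000 values generated from a normal distribution with mean 27 and standard deviation 5), and the sample size of 50 is chosen arbitrarily.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 values from a normal distribution
# (mean 27, standard deviation 5). These numbers are illustrative only.
population = [rng.gauss(27.0, 5.0) for _ in range(10_000)]

# Two independent samples of size 50 drawn from the same population.
sample_a = rng.sample(population, 50)
sample_b = rng.sample(population, 50)

mean_a = statistics.fmean(sample_a)
mean_b = statistics.fmean(sample_b)

print(f"sample A mean: {mean_a:.2f}")
print(f"sample B mean: {mean_b:.2f}")
```

The two sample means differ even though both samples come from the same population, which is the random variation that the inference must account for.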

In order to account for the random variation or uncertainty, we (provisionally) assume the data x derive from some distribution, or model, f(x|θ). The choice of distribution may be based on subject-matter knowledge and/or the exploratory analysis.

We then wish to learn, or infer, about the parameter(s) θ. This is the main topic of this course.
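To make this concrete, here is a minimal sketch of one common way to infer θ, maximum likelihood, under the assumption that f(x|θ) is a normal distribution with θ = (μ, σ). Both the choice of a normal model and the data values are hypothetical and chosen only for illustration. For this model the maximum likelihood estimates have a closed form: the sample mean, and the standard deviation computed with divisor n.

```python
import math
import statistics

# Hypothetical data, invented purely for illustration.
x = [22.1, 24.8, 27.3, 31.0, 19.5, 25.6, 28.9, 23.4]


def log_likelihood(mu, sigma, data):
    """Log-likelihood of independent normal(mu, sigma) observations."""
    n = len(data)
    sum_sq = sum((xi - mu) ** 2 for xi in data)
    return (
        -n * math.log(sigma)
        - 0.5 * n * math.log(2 * math.pi)
        - sum_sq / (2 * sigma ** 2)
    )


# Closed-form maximum likelihood estimates under the assumed normal model.
mu_hat = statistics.fmean(x)      # sample mean
sigma_hat = statistics.pstdev(x)  # standard deviation with divisor n

print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
print(f"log-likelihood at the estimates: {log_likelihood(mu_hat, sigma_hat, x):.3f}")
```

Evaluating log_likelihood at any other value of (μ, σ) gives a smaller result, which is what it means for these estimates to maximise the likelihood.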

We also need to check whether we made a good choice of distribution f, and perhaps try other distributions. Formal procedures for this model choice problem will be considered in this course.

Finally, we relate our findings about θ back to the subject-matter question.

Example 1.1:  BMI over time in England.

This example is to illustrate the general process and is based on a project the Lancaster Statistics Department was involved with during 2011-12. Body Mass Index (BMI) is defined as

\[ \mathrm{BMI} = \frac{\text{Mass in kg}}{(\text{Height in m})^{2}}. \]

BMI provides an indication of whether your weight is too high (or too low) for your height. Normal BMI is between 18 and 25. A BMI between 25 and 30 is classed as overweight; a BMI over 30 is classed as obese.
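The definition and the categories above can be sketched in a few lines of code. The function names bmi and bmi_category are hypothetical, as is the label "below normal" for values under 18. The text does not say how the boundary values are treated, so the code assumes that exactly 25 counts as overweight and exactly 30 counts as overweight rather than obese.

```python
def bmi(mass_kg: float, height_m: float) -> float:
    """Body Mass Index: mass in kg divided by the square of height in m."""
    if mass_kg <= 0 or height_m <= 0:
        raise ValueError("mass and height must be positive")
    return mass_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Classify a BMI value using the cut-offs quoted in the text.

    Boundary handling (25 and 30 both fall in 'overweight') is an
    assumption; the text does not specify it.
    """
    if value < 18:
        return "below normal"
    if value < 25:
        return "normal"
    if value <= 30:
        return "overweight"
    return "obese"


example = bmi(70.0, 1.75)  # 70 kg, 1.75 m
print(f"BMI = {example:.1f} ({bmi_category(example)})")
```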

Data were available from the Health Survey for England — an annual survey that samples the population of England and measures (amongst many other things) BMI.

The subject-matter question that we tried to answer was: What has driven the increase in BMI over the period 1992-2009?

After carrying out various checks and exploratory analysis, we did some statistical inference on the data. This involved fitting (quite complex) models to the BMI data over 1992-2009, for males and females, and for different ages.

Our main finding was that the increase in BMI over the period was driven by a group of people (about 1/3 of the population) who were generally already overweight and whose BMI was increasing year on year. This could be summarised as ‘the fat get fatter’.

This finding will (hopefully) allow policy makers to target healthy eating and exercise programs at the right people to control the obesity epidemic.