Chapter 3 Week 3: Bayesian statistics: Prediction

The classical and the Bayesian approach to prediction

One of the primary purposes of statistics is to predict future values. The validity of a model can be assessed by seeing how accurately the model predicts. Bayesian statistics has a principled way of doing this.

The essential point is that there are two sources of uncertainty (dispersion) in predictions:

  • Uncertainty in the parameter values (expressed through the posterior distribution);

  • Uncertainty due to the fact that any future value is itself a random event (sampling uncertainty, expressed through the likelihood).

In classical statistics it is usual to fit a model to data and then make predictions of future values on the assumption that this model estimate is known exactly, the so-called estimative approach. That is, only sampling uncertainty is used to express the uncertainty of a prediction. There is no completely satisfactory way around this problem in the classical framework, since parameters are not thought of as being random.

Example. The predictive in football

Ted is the football team’s new goalkeeper. He predicts that he can save any number of goal attempts. To test this claim, some of the team (n=5 players) test his ability in a staged penalty shoot-out. Ted saves all 5 attempts in a row.

Let π be the probability of Ted making a save. The statistical question is to test Ted’s assertion that he has a 100% chance of making a save, i.e. that π = 1.

The classical estimate of π is π̂ = Σᵢ xᵢ / n, where xᵢ = 1 denotes a save and xᵢ = 0 a goal. The standard error of π̂ can be estimated by the expression √(π̂(1−π̂)/n).
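As a quick check of these formulas, here is a minimal Python sketch (the variable names are illustrative, not from the notes) of the classical calculation for Ted’s data:

```python
import math

# Ted's shoot-out data: x_i = 1 denotes a save, x_i = 0 a goal conceded
x = [1, 1, 1, 1, 1]
n = len(x)

pi_hat = sum(x) / n                              # classical estimate: 5/5 = 1.0
se_hat = math.sqrt(pi_hat * (1 - pi_hat) / n)    # estimated standard error: 0.0

print(pi_hat, se_hat)                            # 1.0 0.0
```

The plug-in estimate is 1 with an estimated standard error of 0, so the estimative analysis reports no uncertainty at all about π, which is worth bearing in mind for part (a) below.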

Example

  (a) Why could the classical analysis lead to a questionable prediction about Ted saving the next goal? [2]

  (b) Calculate the posterior distribution using Bayes’ theorem, the likelihood and a flat prior p(π) = 1. Sketch the posterior. [Hint: make sure the posterior distribution integrates to 1.]

  (c) What is the predictive probability that Ted will save the next goal? Why is this more sensible than the result obtained in (a)? A numerical sketch for checking parts (b) and (c) follows this list.
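For checking a hand calculation of parts (b) and (c), here is a minimal numerical sketch. It assumes, as stated in (b), a flat prior p(π) = 1 on [0, 1] and the likelihood π⁵ for Ted’s 5 saves out of 5; the grid approximation itself is an illustration, not part of the original notes.

```python
import numpy as np

# Grid over the parameter pi (probability of a save)
pi_grid = np.linspace(0.0, 1.0, 10_001)
d_pi = pi_grid[1] - pi_grid[0]

likelihood = pi_grid ** 5            # P(5 saves in 5 attempts | pi)
prior = np.ones_like(pi_grid)        # flat prior p(pi) = 1

# Posterior proportional to likelihood x prior, normalised so it integrates to 1
unnormalised = likelihood * prior
posterior = unnormalised / (unnormalised.sum() * d_pi)

# Predictive probability of a save on the next attempt:
# P(save | data) = integral of pi * p(pi | data) d pi
p_next_save = (pi_grid * posterior).sum() * d_pi
print(round(p_next_save, 3))
```

Comparing the printed value with the estimative answer from part (a) illustrates the point of part (c).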

Bayesian predictive distribution

Figure 3.1: With a DAG, knowns are depicted with squares and unknowns with circles. On the left, both the parameter and the future observation are treated as unknown. In the estimative approach, on the right, the parameter is treated as known and only the future observation is treated as unknown. The estimative approach under-estimates the uncertainty of a future observation.


Within Bayesian inference it is straightforward to allow for both sources of uncertainty, simply by averaging over the uncertainty in the parameter values, which is completely described by the posterior distribution.

The predictive

So, suppose we have past observations y = (y₁, …, yₙ) of a variable with density function (or likelihood) f(y|θ), and we wish to make inferences about the distribution of a future value of a random variable Y* from this same model. With a prior distribution π(θ), Bayes’ theorem leads to a posterior distribution π(θ|y). Then the predictive density function of y* given y is:

f(y*|y) = ∫ f(y*|θ) π(θ|y) dθ. (3.1)
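To make (3.1) concrete, here is a minimal Monte Carlo sketch. The model (a normal likelihood with known standard deviation) and the posterior draws are placeholders chosen purely for illustration, not taken from the notes; the point is only that averaging f(y*|θ) over posterior draws, as in (3.1), produces wider predictions than plugging in a single estimate of θ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder posterior draws of theta (e.g. as produced by MCMC); here they are
# simply drawn from an illustrative normal approximation to pi(theta | y).
theta_samples = rng.normal(loc=2.0, scale=0.5, size=100_000)

SD = 1.0  # assumed known sampling standard deviation of the likelihood f(y | theta)

def normal_pdf(y, mean, sd):
    """Normal density; plays the role of the sampling density f(y* | theta)."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def predictive_density(y_star):
    """Monte Carlo version of (3.1): average f(y* | theta) over posterior draws."""
    return normal_pdf(y_star, theta_samples, SD).mean()

print(predictive_density(2.0))  # estimate of f(y* = 2.0 | y)

# Estimative (plug-in) alternative: treat a single point estimate of theta as exact.
theta_hat = theta_samples.mean()

# Compare the spread of simulated future observations under the two approaches.
y_star_predictive = rng.normal(theta_samples, SD)                        # parameter + sampling uncertainty
y_star_estimative = rng.normal(theta_hat, SD, size=theta_samples.size)   # sampling uncertainty only

print(y_star_predictive.std(), y_star_estimative.std())  # the predictive spread is larger
```

The two printed standard deviations illustrate the claim in Figure 3.1: the estimative spread reflects only sampling uncertainty, while the predictive spread also carries the posterior uncertainty in θ.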