5 Week 5 Bayesian statistics: Decisions

5.3 Decision theory for point estimation

Bayes rule for squared error loss

If the loss is squared error, the Bayes decision $d$ is found by minimizing $\rho(d, \pi(\theta|X))$ or, simplifying the notation, $\rho(d)$:

\[
\rho(d,\pi) = \mathbb{E}_{\theta|X}\left[(d-\theta)^2\right]
            = d^2 - 2\,\mathbb{E}_{\theta|X}(\theta)\,d + \mathbb{E}_{\theta|X}(\theta^2)
\tag{5.3}
\]

Differentiating with respect to $d$ to find the minimum loss:

\[
\rho'(d,\pi) = 2d - 2\,\mathbb{E}_{\theta|X}(\theta) = 0
\]
\[
d = \mathbb{E}_{\theta|X}(\theta) = \int_\theta \theta\,\pi(\theta|X)\,d\theta
\]


Since $\rho''(d) = 2 > 0$, the posterior mean $d = \mathbb{E}_{\theta|X}(\theta)$ is the Bayes decision or rule.
The Bayes risk can be found by substituting $d = \mathbb{E}_{\theta|X}(\theta)$ into (5.3) to get

\[
\rho(\pi) = \mathbb{E}_{\theta|X}(\theta^2) - \left[\mathbb{E}_{\theta|X}(\theta)\right]^2 = \operatorname{var}(\theta \mid X)
\]
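As a numerical illustration (not from the notes), here is a minimal sketch that approximates a posterior on a grid, using an assumed Beta(3, 5) shape, and recovers the Bayes decision and Bayes risk under squared error loss as the posterior mean and variance.

```python
import numpy as np

# Illustrative grid approximation to a posterior pi(theta | X);
# the Beta(3, 5) shape is an assumption for demonstration only.
theta = np.linspace(0.001, 0.999, 1000)
post = theta**2 * (1 - theta)**4       # unnormalized Beta(3, 5) density
post /= np.trapz(post, theta)          # normalize to integrate to 1

# Bayes decision under squared error loss: the posterior mean.
d = np.trapz(theta * post, theta)

# Bayes risk: the posterior variance.
risk = np.trapz(theta**2 * post, theta) - d**2

print(f"Bayes decision (posterior mean): {d:.4f}")     # exact value 3/8
print(f"Bayes risk (posterior variance): {risk:.5f}")  # exact value 15/576
```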

Characteristics of squared error loss

  • It is symmetrical

  • It is easy to interpret. This means that, with such a loss, we can summarize a posterior with a mean (Bayes rule) and variance (Bayes risk).

  • Other losses also have the posterior mean as their Bayes rule (Lindley 1985).

  • Squared error loss is often criticized for penalising large errors too heavily.

Weighted squared error loss

For weighted squared error loss, $L(d,\theta) = w(\theta)(d-\theta)^2$, the Bayes decision $d$ is found by minimizing $\rho(d)$:

\[
\rho(d) = \mathbb{E}_{\theta|X}\left[w(\theta)(d-\theta)^2\right]
        = d^2\,\mathbb{E}_{\theta|X}[w(\theta)] - 2d\,\mathbb{E}_{\theta|X}[w(\theta)\theta] + \mathbb{E}_{\theta|X}[w(\theta)\theta^2]
\]


Differentiating with respect to $d$ to find the minimum loss:

\[
\rho'(d) = 2d\,\mathbb{E}_{\theta|X}[w(\theta)] - 2\,\mathbb{E}_{\theta|X}[w(\theta)\theta] = 0
\]
\[
d = \frac{\mathbb{E}_{\theta|X}[w(\theta)\theta]}{\mathbb{E}_{\theta|X}[w(\theta)]}
  = \frac{\int_\theta w(\theta)\,\theta\,\pi(\theta|X)\,d\theta}{\int_\theta w(\theta)\,\pi(\theta|X)\,d\theta}
\]

The Bayes decision or rule is the posterior mean of $w(\theta)\theta$ divided by the posterior mean of the weights $w(\theta)$; equivalently, the mean of the posterior reweighted by $w(\theta)$.
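A minimal sketch of this rule, reusing the illustrative Beta(3, 5) grid posterior and an assumed weight function w(θ) = 1/θ that makes errors at small θ more costly:

```python
import numpy as np

# Same illustrative grid posterior as above (Beta(3, 5) shape).
theta = np.linspace(0.001, 0.999, 1000)
post = theta**2 * (1 - theta)**4
post /= np.trapz(post, theta)

w = 1.0 / theta  # assumed weight: errors matter more for small theta

# Bayes decision for weighted squared error loss:
# E[w(theta) * theta] / E[w(theta)], both expectations under the posterior.
d = np.trapz(w * theta * post, theta) / np.trapz(w * post, theta)
print(f"Weighted Bayes decision: {d:.4f}")
```

With this weight the decision falls below the unweighted posterior mean of 0.375, since the loss is dominated by errors at small θ.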

Asymmetrical loss functions

Recall that this loss function is given by

\[
L(d,\theta) =
\begin{cases}
K_1(\theta-d) & d < \theta \\
K_2(d-\theta) & d \ge \theta
\end{cases}
\]

Bayes decision for asymmetrical loss

The Bayes decision can be found by minimising

\[
\begin{aligned}
\rho(d) &= \mathbb{E}_{\theta|X}\left[L(d,\theta)\right] \\
        &= \int_\theta \pi(\theta|X)\,L(d,\theta)\,d\theta \\
        &= \int_{d<\theta} \pi(\theta|X)\,L(d,\theta)\,d\theta + \int_{d\ge\theta} \pi(\theta|X)\,L(d,\theta)\,d\theta \\
        &= K_1 \int_{d<\theta} \pi(\theta|X)\,(\theta-d)\,d\theta + K_2 \int_{d\ge\theta} \pi(\theta|X)\,(d-\theta)\,d\theta
\end{aligned}
\]

Differentiating with respect to $d$ and setting the derivative to zero:

\[
\begin{aligned}
\rho'(d) &= -K_1 \int_{d<\theta} \pi(\theta|X)\,d\theta + K_2 \int_{d\ge\theta} \pi(\theta|X)\,d\theta = 0 \\
0 &= -K_1\,P_{\theta|X}(d<\theta) + K_2\left(1 - P_{\theta|X}(d<\theta)\right) \\
P_{\theta|X}(d<\theta) &= \frac{K_2}{K_1+K_2}
\end{aligned}
\]


Equivalently, $P_{\theta|X}(\theta \le d) = K_1/(K_1+K_2)$: the Bayes decision $d$ is the $K_1/(K_1+K_2)$ fractile (quantile) of the posterior.

In particular, for absolute error loss ($K_1 = K_2$) the Bayes decision is the posterior median.
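A minimal sketch on the same illustrative grid posterior, with assumed costs K1 = 1 and K2 = 3 (overestimation three times as costly as underestimation), so the Bayes decision is the 1/4 quantile of the posterior:

```python
import numpy as np

# Same illustrative grid posterior as above (Beta(3, 5) shape).
theta = np.linspace(0.001, 0.999, 1000)
post = theta**2 * (1 - theta)**4
post /= np.trapz(post, theta)

K1, K2 = 1.0, 3.0    # assumed costs: overestimation is 3x as costly
p = K1 / (K1 + K2)   # posterior quantile giving the Bayes decision

# Posterior CDF on the grid, then the p-th quantile.
cdf = np.cumsum(post) * (theta[1] - theta[0])
d = theta[np.searchsorted(cdf, p)]
print(f"Bayes decision ({p:.2f} posterior quantile): {d:.4f}")

# With K1 = K2 (absolute error loss), p = 0.5 and d is the posterior median.
```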

Examples of asymmetrical loss

It is common for a given positive error to be more serious than a negative error of the same magnitude, or vice versa. Examples include the following:

  1. The plug is pulled on the ventilator of a very sick hospital patient when the probability that the patient is dead exceeds a threshold, say $P(D=1) > \lambda$.

  2. A nuclear power plant is shut down if the probability of a meltdown exceeds a threshold.

  3. The concentration of CO2 in the atmosphere is thought to be unsafe when it exceeds a threshold, $[\mathrm{CO_2}] > \epsilon$. When this level is exceeded, the risk of a runaway greenhouse effect is thought too high and expensive corrective procedures are carried out.

Binary loss

An interval of length $2\epsilon$, say $(b-\epsilon,\, b+\epsilon)$, is said to be a modal interval of length $2\epsilon$ for the distribution of a random variable $X$ if
$P(b-\epsilon < X < b+\epsilon)$ takes its maximum value over all such intervals.

For the loss function

\[
L(d,\theta) =
\begin{cases}
0 & |d-\theta| < \epsilon \\
1 & |d-\theta| \ge \epsilon
\end{cases}
\]

the expected loss is minimized when $P_{\theta|X}(d-\epsilon < \theta < d+\epsilon)$ is maximized, which occurs when $d$ is chosen to be the midpoint of the modal interval of length $2\epsilon$ of the posterior.
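A minimal sketch of this idea: scan candidate decisions d over the illustrative grid posterior and keep the one maximizing P(d − ε < θ < d + ε); the value ε = 0.05 is an arbitrary assumption.

```python
import numpy as np

# Same illustrative grid posterior as above (Beta(3, 5) shape).
theta = np.linspace(0.001, 0.999, 1000)
post = theta**2 * (1 - theta)**4
post /= np.trapz(post, theta)

eps = 0.05  # assumed half-width of the interval

def interval_prob(d):
    """Posterior probability that theta lies in (d - eps, d + eps)."""
    mask = (theta > d - eps) & (theta < d + eps)
    return np.trapz(post[mask], theta[mask])

# The Bayes decision maximizes the interval probability, i.e. it is the
# midpoint of the modal interval of length 2 * eps.
probs = np.array([interval_prob(d) for d in theta])
d = theta[np.argmax(probs)]
print(f"Bayes decision (modal interval midpoint): {d:.4f}")
```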


The limiting case as $\epsilon \to 0$ is the hit-or-miss loss:

\[
L(d,\theta) = 1 - \delta(d,\theta),
\]

where $\delta$ is the Kronecker delta, equal to 1 when $d = \theta$ and 0 otherwise. If the posterior distribution is unimodal, the Bayes decision is

\[
d = \operatorname*{arg\,max}_{\theta}\ \pi(\theta|X),
\]

the mode of the posterior (the maximum a posteriori, or MAP, estimate).
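A minimal sketch: on the illustrative grid posterior, the Bayes decision under hit-or-miss loss is simply the grid point of highest posterior density (here close to the exact Beta(3, 5) mode of 1/3).

```python
import numpy as np

# Same illustrative grid posterior as above (Beta(3, 5) shape).
theta = np.linspace(0.001, 0.999, 1000)
post = theta**2 * (1 - theta)**4
post /= np.trapz(post, theta)

# Bayes decision under hit-or-miss (0-1) loss: the posterior mode (MAP).
d_map = theta[np.argmax(post)]
print(f"Bayes decision (posterior mode / MAP): {d_map:.4f}")
```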

Characteristics of 0-1 loss

  • This loss function can be thought of as depicting the truth of a model: when a model is simply either right or wrong, it is the appropriate loss function.

  • It is mainly used in the classical formulation of hypothesis testing as formalized by Neyman and Pearson.

  • It does not take into account shades of usefulness.