1 Week 1: Bayesian inference

1.2 Conjugacy


A prior is conjugate for a given likelihood if both the prior and posterior have the same parametric form.

We will see in the next chapter that all likelihoods from the exponential family have conjugate priors. Here we look at a few examples.

1.2.1 Beta-binomial conjugacy

The binomial

(Binomial sample.) Suppose our likelihood model is $x \sim \text{Binomial}(n, \theta)$, and we wish to make inferences about $\theta$ from a single observation $x$.

So

$$f(x \mid \theta) = \binom{n}{x}\,\theta^{x}(1-\theta)^{n-x}.$$

In this case, suppose we can represent our prior beliefs about $\theta$ by a beta distribution:

$$\theta \sim \text{Beta}(p, q)$$

so that

$$\pi(\theta) = \frac{\Gamma(p+q)}{\Gamma(p)\,\Gamma(q)}\,\theta^{p-1}(1-\theta)^{q-1} \qquad (0 \le \theta \le 1)$$
$$\propto \theta^{p-1}(1-\theta)^{q-1}.$$

The parameters of this distribution are $p > 0$ and $q > 0$. (They are NOT probabilities and may have any positive value.) The mean and variance of this distribution are

$$E(\theta) = m = \frac{p}{p+q} \qquad \text{and} \qquad \operatorname{Var}(\theta) = v = \frac{pq}{(p+q)^2(p+q+1)}.$$

The Beta distribution is written as

$$\pi(\theta) = \frac{\theta^{p-1}(1-\theta)^{q-1}}{B(p,q)} \qquad \text{where} \qquad B(p,q) = \frac{\Gamma(p)\,\Gamma(q)}{\Gamma(p+q)} = \int_0^1 \theta^{p-1}(1-\theta)^{q-1}\,d\theta.$$

We call B(p,q) the beta function; don’t confuse it with the distribution.
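As a quick numerical check (a sketch in base R, with illustrative values p = 3 and q = 2), the built-in beta() function agrees with both the Gamma-function form and the integral definition:

# Check that B(p, q) matches the Gamma-function form and the integral
# definition, for illustrative values p = 3, q = 2.
p <- 3; q <- 2
beta(p, q)                                         # built-in beta function: 1/12
gamma(p) * gamma(q) / gamma(p + q)                 # Gamma-function form
integrate(function(th) th^(p - 1) * (1 - th)^(q - 1), 0, 1)$value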

Some simple cases of the beta distribution are shown in Figure 1.10.

Figure 1.10: Use of the beta distribution to describe a variety of beliefs about a proportion.

Bayes' Rule

Now we apply Bayes' Theorem using this prior distribution:

$$\begin{aligned}
\pi(\theta \mid x) &\propto \pi(\theta)\,f(x \mid \theta) \\
&\propto \theta^{p-1}(1-\theta)^{q-1} \times \theta^{x}(1-\theta)^{n-x} \\
&= \theta^{p+x-1}(1-\theta)^{q+n-x-1} \\
&= \theta^{P-1}(1-\theta)^{Q-1}.
\end{aligned}$$

There is only one density function proportional to this, so it must be the case that

$$\theta \mid x \sim \text{Beta}(P, Q).$$

The updates are

$$\begin{aligned}
P &\leftarrow p + x \\
Q &\leftarrow q + n - x
\end{aligned} \qquad \text{(Updates for a Beta prior with a Binomial likelihood)}$$

In other words, the number of successes is added to the first parameter of the Beta and the number of failures to the second. This does not have to be done all at once; it can be done observation by observation.

Sequential updating of belief in a parameter

Figure 1.11: The diagram shows how the posterior can be calculated sequentially.

Suppose the data consist of a sequence of shots on goal by a player, denoted $Y = 1, 1, 0, 1$. Then our belief in the ability of the player can be updated sequentially.

$$\text{Beta}(1,1) \xrightarrow{\,y=1\,} \text{Beta}(2,1) \xrightarrow{\,y=1\,} \text{Beta}(3,1) \xrightarrow{\,y=0\,} \text{Beta}(3,2) \xrightarrow{\,y=1\,} \text{Beta}(4,2)$$

The expected success rates are

$$\hat{\pi}_0 = \tfrac{1}{2}, \quad \hat{\pi}_1 = \tfrac{2}{3}, \quad \hat{\pi}_2 = \tfrac{3}{4}, \quad \hat{\pi}_3 = \tfrac{3}{5}, \quad \hat{\pi}_4 = \tfrac{4}{6}.$$
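The same sequence can be reproduced with a short R loop (a minimal sketch; the shots y = 1, 1, 0, 1 and the Beta(1, 1) starting prior are those of the example above):

# Sequential Beta updating for the shots-on-goal example.
y <- c(1, 1, 0, 1)
p <- 1; q <- 1                       # Beta(1, 1) prior, mean 1/2
for (yi in y) {
    p <- p + yi                      # a success adds to the first parameter
    q <- q + 1 - yi                  # a failure adds to the second parameter
    cat("Beta(", p, ",", q, ") posterior mean", p / (p + q), "\n")
}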

Sequential inference with the Binomial

Let us take an example of a run of successes and failures in a basketball game. For each success, $p \leftarrow p + 1$; for each failure, $q \leftarrow q + 1$.

y <- c(1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1,
    1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0)
# several rows omitted
p <- 1; q <- 1                      # Beta(1, 1) prior
n <- length(y)
mean <- rep(0, n); uq <- rep(0, n); lq <- rep(0, n)
for (i in 1:n) {
    p <- p + y[i]                   # a success adds to the first parameter
    q <- q + 1 - y[i]               # a failure adds to the second parameter
    mean[i] <- p / (p + q)          # posterior mean after i observations
    lq[i] <- qbeta(0.025, p, q)     # 95% credible interval limits
    uq[i] <- qbeta(0.975, p, q)
}

Sequential inference with the Binomial

Figure 1.12: The panel on the left shows the predicted probability calculated sequentially, where all observations are weighted equally with no drift or forgetting. The estimates in the second panel emphasise current form by forgetting observations from the distant past; this model is more responsive to changes.

Forgetting

Bayesian learning and Bayesian forgetting

In stationary models (when $\theta$ is fixed), as more and more observations are made we become more and more certain about the parameter, and intervals for $\theta$ become narrower.

In a model with forgetting, the parameter $\theta$ changes with time. A good example is when $\theta$ represents the current form of a sports team: as the team changes, its form $\theta$ changes. When $\theta$ changes, recent results are judged more relevant than results of games from the distant past.

Click this link for a good video on Bayesian inference on a beta distribution: https://www.coursera.org/learn/bayesian/lecture/xFRKb/inference-on-a-binomial-proportion

1.2.2 Gamma-Poisson conjugacy

Example: A Poisson sample.

Suppose we have a random sample (i.e. independent observations) of size $n$, $x = (x_1, x_2, \ldots, x_n)$, of a random variable $X$ whose distribution is Poisson$(\theta)$. Then

$$f(x \mid \theta) = \prod_{i=1}^{n} \frac{e^{-\theta}\,\theta^{x_i}}{x_i!} = L(\theta; x) \propto e^{-n\theta}\,\theta^{\sum x_i}.$$

As in the binomial example, prior beliefs about $\theta$ will vary from problem to problem, but we'll look for a form which gives a range of different possibilities and is also mathematically tractable.

A conjugate Gamma prior.

In this case we suppose our prior beliefs can be represented by a gamma distribution:

$$\theta \sim \text{Gamma}(p, q),$$

so

$$\pi(\theta) = \frac{q^{p}}{\Gamma(p)}\,\theta^{p-1}\exp\{-q\theta\} \qquad (\theta > 0).$$

The parameter $p > 0$ is a shape parameter, and $q > 0$ is a rate (inverse scale) parameter. The mean and variance of this distribution are

$$E(\theta) = m = \frac{p}{q} \qquad \text{and} \qquad \operatorname{Var}(\theta) = v = \frac{p}{q^{2}}.$$

The Gamma distribution

Figure 1.13: Gamma distributions with the same mean but different values of the parameter $q$. Notice the inverse relationship between the variance and $q$.

Updating a Gamma prior

Assume we have a Poisson likelihood with a Gamma prior. Then, applying Bayes' Theorem with this prior distribution, we get

$$\begin{aligned}
\pi(\theta \mid x) &\propto \frac{q^{p}}{\Gamma(p)}\,\theta^{p-1}\exp\{-q\theta\} \times \exp\{-n\theta\}\,\theta^{\sum x_i} \\
&\propto \theta^{\,p+\sum x_i-1}\exp\{-(q+n)\theta\} \\
&= \theta^{P-1}\exp(-Q\theta).
\end{aligned}$$

Again, there is only one density function proportional to this, so it must be the case that

$$\theta \mid x \sim \text{Gamma}(P, Q),$$

This is another Gamma distribution, whose parameters are modified by the sum of the data, $\sum_{i=1}^{n} x_i$, and the sample size $n$. The updates are

$$\begin{aligned}
P &\leftarrow p + \sum_{i=1}^{n} x_i \\
Q &\leftarrow q + n
\end{aligned} \qquad \text{(Updates for a Gamma prior with a Poisson likelihood)}$$

Sequential inference with the Poisson

Let us take an example of drug arrests recorded by a particular patrol car and update our parameters sequentially: $p \leftarrow p + y$ and $q \leftarrow q + 1$.

y <- c(0, 1, 0, 0, 7, 0, 0, 0, 0, 0, 1, 3, 0, 4, 0, 1, 0, 2, 2, 0, 0, 0,
    0, 0, 0, 0, 0, 2, 0, 0, 0, 3, 1, 0, 0, 1, 0, 0, 0, 0, 2, 1, 0, 1, 0,
    0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 3, 0, 0, 0, 2, 0, 0, 4, 0, 0, 0, 0, 0,
    0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    3, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 2, 1, 0,
    0, 0, 1, 2, 1, 2, 11, 2, 4, 0, 2, 4, 7, 5, 12, 6, 4, 6, 0, 4, 9, 4,
    8, 7, 2, 9, 10, 16, 19, 2)
p <- 1; q <- 1                       # Gamma(1, 1) prior
n <- length(y)
mean <- rep(0, n); uq <- rep(0, n); lq <- rep(0, n)
for (i in 1:n) {
    p <- p + y[i]                    # add the count to the shape parameter
    q <- q + 1                       # add one to the rate for each observation
    mean[i] <- p / q                 # posterior mean after i observations
    lq[i] <- qgamma(0.025, p, q)     # 95% credible interval limits
    uq[i] <- qgamma(0.975, p, q)
}

Sequential inference by updating the Gamma

Figure: The panel on the left shows drug-related reports; the posteriors are calculated sequentially and all observations are weighted equally. Note how the uncertainty in the expected number of drug arrests (the shaded region) decreases over time. In the second panel the past is downweighted and recent history is emphasised.

Football example

Let $i \in \{1, 2, \ldots, 20\}$ denote the home team and $j \in \{1, 2, \ldots, 20\}$ denote the away team, and label the games in chronological order as $t = 1, \ldots, 380$.

$$x_{i,j}^{t} \sim \text{Poisson}(\mu_{i,j}), \qquad \mu_{i,j} = \alpha_i \beta_j \gamma \qquad \text{(Likelihood for goals of the home team)}$$
$$y_{j,i}^{t} \sim \text{Poisson}(\lambda_{j,i}), \qquad \lambda_{j,i} = \alpha_j \beta_i. \qquad \text{(Likelihood for goals of the away team)}$$

where αi denotes the attacking strength of team i, βj is the defensive strength of team j and γ is the common home ground advantage. The priors for the Poisson model are given below. δ is fixed at say δ=10.

$$\alpha_i \sim \text{Gamma}(\delta, \delta) \qquad \text{(Priors for attacking strengths)}$$
$$\beta_j \sim \text{Gamma}(\delta, \delta) \qquad \text{(Priors for defensive strengths)}$$
$$\gamma \sim \text{Gamma}(\delta, \delta) \qquad \text{(Prior for home ground advantage)}$$

Football example

Let us say Liverpool is playing Arsenal at Home. The prior attacking and defensive strengths before the game are

$$\alpha_L \sim \text{Gamma}(2,1), \qquad \alpha_A \sim \text{Gamma}(3,2),$$
$$\beta_L \sim \text{Gamma}(1,3), \qquad \beta_A \sim \text{Gamma}(3,4),$$
$$\gamma \sim \text{Gamma}(3,2).$$

Let us say Liverpool win 4-1.

  1. Find the expected attacking strengths, defensive strengths and HGA before the game.

  2. Find the prior expected score (see the sketch after this list).

  3. Write out the likelihood for the scores of the home and away teams.

  4. Find the posterior distributions of all five parameters.

  5. Update the Gamma parameters for the attacking and defensive strengths, and the HGA, after the game.

  6. Which teams improved their attacking and defensive strengths?
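As a starting point for the first two parts only, a minimal R sketch (not a full solution) computes the prior means from $E(\theta) = p/q$ and the prior expected scores, assuming the Gamma priors are independent so that the expectation of a product is the product of expectations:

# Prior means of the five parameters, E(theta) = p/q for a Gamma(p, q) prior.
alpha_L <- 2 / 1     # Liverpool attacking strength
alpha_A <- 3 / 2     # Arsenal attacking strength
beta_L  <- 1 / 3     # Liverpool defensive strength
beta_A  <- 3 / 4     # Arsenal defensive strength
gamma_H <- 3 / 2     # home ground advantage

# Prior expected scores: E(mu_LA) = E(alpha_L) E(beta_A) E(gamma) and
# E(lambda_AL) = E(alpha_A) E(beta_L), by independence of the priors.
c(home = alpha_L * beta_A * gamma_H,   # 2.25 expected goals for Liverpool
  away = alpha_A * beta_L)             # 0.50 expected goals for Arsenal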

Attacking strength 2017-18

Figure: Attacking strength shown through the 2017-18 football league season. All goals are modelled as Poisson. All teams start with the same prior (i.e. the same ability). The shaded region shows the region between the upper and lower quartiles.

Defensive strength 2017-18

Figure 1.14: Defensive strength shown through the 2017-18 Premier League season.

1.2.3 Gaussian-Gaussian conjugacy

Inference for the mean of normally distributed data

Let $y = (y_1, y_2, \ldots, y_n)$ be a random sample of size $n$ of a random variable $Y$ with the Normal$(\mu, \tfrac{1}{\tau})$ distribution, where the precision $\tau = \tfrac{1}{\sigma^2}$ is assumed known. The likelihood is better described using the precision $\tau$.

$$\begin{aligned}
f(y \mid \mu, \sigma) &= \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(y-\mu)^2}{2\sigma^2}\right\} \\
&= \frac{\tau^{1/2}}{\sqrt{2\pi}}\exp\left\{-\frac{\tau}{2}(y-\mu)^2\right\} \\
&\propto \tau^{1/2}\exp\left\{-\frac{\tau}{2}(y-\mu)^2\right\}.
\end{aligned}$$

Useful result

The quadratic form of the Gaussian

The following is a useful result for identifying normal distributions. If $\mu$ is a parameter with probability density function $f(\mu)$, then

$$f(\mu) \propto \exp\left(-\tfrac{1}{2}\{A\mu^2 - 2B\mu\}\right)$$

if and only if $\mu \sim \text{Normal}(B/A,\ 1/A)$.
Proof.

$$\begin{aligned}
\mu \sim \text{Normal}(B/A,\ 1/A)
&\iff -2\log f(\mu) = A\left(\mu - \frac{B}{A}\right)^2 + \text{const} \\
&\iff -2\log f(\mu) = A\mu^2 - 2B\mu + \frac{B^2}{A} + \text{const} \\
&\iff f(\mu) \propto \exp\left(-\tfrac{1}{2}\{A\mu^2 - 2B\mu\}\right).
\end{aligned}$$

Gaussian likelihood and prior

We now pair up the likelihood with a prior for μ.

$$Y_i \sim \text{Normal}\left(\mu, \tfrac{1}{\tau}\right), \quad i = 1, 2, \ldots, n \qquad \text{(The likelihood)}$$
$$\mu \sim \text{Normal}\left(\mu_0, \tfrac{1}{\tau_0}\right) \qquad \text{(The prior)}$$

We now show, using the result above, that this Normal prior is conjugate for the mean of the Normal likelihood.

Gaussian likelihood and prior

$$\begin{aligned}
\pi(\mu \mid \mathbf{y}) &\propto L(\mu; y)\,\pi(\mu) \\
&\propto \exp\left(-\frac{\tau}{2}\sum_{i=1}^{n}(y_i - \mu)^2\right)\exp\left(-\frac{\tau_0}{2}(\mu - \mu_0)^2\right) \\
&\propto \exp\left(-\frac{\tau}{2}\left(n\mu^2 - 2\mu\sum_{i=1}^{n}y_i\right)\right)\exp\left(-\frac{\tau_0}{2}\left(\mu^2 - 2\mu\mu_0\right)\right) \\
&\propto \exp\left(-\frac{1}{2}\left\{(n\tau + \tau_0)\mu^2 - 2\mu\left(n\tau\bar{y} + \tau_0\mu_0\right)\right\}\right) \\
&\propto \exp\left(-\frac{n\tau + \tau_0}{2}\left(\mu - \frac{n\tau\bar{y} + \tau_0\mu_0}{n\tau + \tau_0}\right)^2\right)
\end{aligned}$$

$$\mu \mid y, \tau \sim \text{Normal}\left(\frac{n\tau\bar{y} + \tau_0\mu_0}{n\tau + \tau_0},\ \frac{1}{n\tau + \tau_0}\right) \qquad (1.1)$$

Bayes updating of μ, the mean

  1. Prior precision = $\tau_0$.

  2. Sample precision = $n\tau$.

  3. Posterior precision: $\tau_p \leftarrow \tau_0 + n\tau$.

  4. Posterior precision = prior precision + sample precision.

  5. Prior mean = $\mu_0$.

  6. Sample mean = $\bar{y}$.

  7. Posterior mean: $\mu_p \leftarrow \dfrac{n\tau\bar{y} + \tau_0\mu_0}{n\tau + \tau_0} = \gamma_0\mu_0 + \gamma_s\bar{y}$.

  8. $\gamma_0 = \dfrac{\tau_0}{n\tau + \tau_0} = \dfrac{\tau_0/\tau}{n + \tau_0/\tau}$.

  9. $\gamma_s = \dfrac{n\tau}{n\tau + \tau_0} = \dfrac{n}{n + \tau_0/\tau}$.

  10. Posterior mean = weighted sum of the prior mean and the sample mean (verified numerically in the sketch after this list).

  11. The effective sample size (ESS) of a Gaussian prior with respect to a Gaussian sample is $\tau_0/\tau$.
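These relationships are easy to verify numerically. The R sketch below uses made-up values for $\mu_0$, $\tau_0$ and $\tau$ together with simulated data:

# Posterior precision and mean for a Normal mean with known precision tau.
set.seed(1)
mu0 <- 0; tau0 <- 0.5                 # prior mean and precision (illustrative)
tau <- 2                              # known precision of each observation
y <- rnorm(25, mean = 1, sd = 1 / sqrt(tau))
n <- length(y); ybar <- mean(y)
tau_p <- tau0 + n * tau                           # posterior precision
mu_p <- (n * tau * ybar + tau0 * mu0) / tau_p     # posterior mean
g0 <- tau0 / (n * tau + tau0)                     # weight on the prior mean
gs <- n * tau / (n * tau + tau0)                  # weight on the sample mean
c(mu_p, g0 * mu0 + gs * ybar)                     # the two forms agree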

Observations to be made

A number of observations can be made:

  1. Note that the effective sample size is $n_0 = \tau_0/\tau$, or $\tau_0 = n_0\tau$.

  2. Observe that 'posterior precision' = 'prior precision' $+\ n \times$ 'precision of each data item'.

  3. As $n \to \infty$, then (loosely)

     $$\mu \mid y \sim \text{Normal}\left(\bar{y}, \frac{\sigma^2}{n}\right),$$

     so that the prior has no effect in the limit.

  4. As the uncertainty contained in the prior increases ($\sigma_0^2 \to \infty$), or equivalently the prior precision decreases ($\tau_0 \to 0$), we again obtain

     $$\mu \mid y \sim \text{Normal}\left(\bar{y}, \frac{\sigma^2}{n}\right).$$

  5. Note that the posterior distribution depends on the data only through $\bar{y}$ and not through the individual values of the $y_i$ themselves. Again, we say that $\bar{y}$ is sufficient for $\mu$.

Sequential inference when updating a mean

Let us take an example data set: yearly suicides in Australia per 100,000 individuals. We update the mean one observation at a time. Bayes' theorem is applied at each time step $i = 1, 2, \ldots$, using the previous mean, $\mu_{i-1}$, in the prior for the current mean $\mu_i$.

$$\mu_i \sim \text{Normal}\left(\mu_{i-1}, \frac{1}{\tau_{i-1}}\right) \qquad \text{(The prior)}$$
$$y_i \sim \text{Normal}\left(\mu_i, \frac{1}{\tau}\right) \qquad \text{(The likelihood)}$$
$$\mu_i \mid y_i \sim \text{Normal}\left(\frac{\mu_{i-1}\tau_{i-1} + y_i\tau}{\tau_{i-1} + \tau},\ \frac{1}{\tau_{i-1} + \tau}\right) \qquad \text{(The posterior)}$$
$$\phantom{\mu_i \mid y_i} \sim \text{Normal}\Bigl(\mu_{i-1} + \underbrace{\bigl(\tfrac{\tau}{\tau + \tau_{i-1}}\bigr)}_{\text{Gain}}\underbrace{(y_i - \mu_{i-1})}_{\text{error}},\ \tfrac{1}{\tau + \tau_{i-1}}\Bigr)$$
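A sketch of this recursion in R, in the same style as the earlier binomial and Poisson loops (the series below is simulated as a stand-in for the yearly rates, and the observation precision $\tau$ is assumed known):

# Sequential updating of a Normal mean with known observation precision tau.
set.seed(2)
y <- rnorm(100, mean = 12, sd = 1)    # simulated stand-in for the yearly rates
tau <- 1                              # assumed known observation precision
mu <- 10; tau_mu <- 0.1               # initial prior Normal(mu, 1/tau_mu)
post_mean <- rep(0, length(y))
for (i in seq_along(y)) {
    gain <- tau / (tau + tau_mu)      # weight given to the new observation
    mu <- mu + gain * (y[i] - mu)     # posterior mean = prior mean + gain * error
    tau_mu <- tau_mu + tau            # posterior precision accumulates
    post_mean[i] <- mu
}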

1.2.4 Gaussian-Gamma conjugacy

The normal distribution, mean assumed known

Let

$$Y \mid \tau \sim \text{Normal}\left(\mu, \frac{1}{\tau}\right)$$
$$f(y \mid \tau) \propto \tau^{n/2}\,e^{-\sum_{i=1}^{n}(y_i - \mu)^2\,\tau/2}$$

Since $\mu$ is fixed, $S = \sum_{i=1}^{n}(y_i - \mu)^2$ is a known quantity, and the likelihood as a function of $\tau$ has a Gamma-like form. We therefore take the prior $\tau \sim \text{Gamma}(p, q)$.

$$\pi(\tau) \propto \tau^{p-1}e^{-q\tau}. \qquad \text{(The prior)}$$
$$f(y \mid \tau) \propto \tau^{n/2}e^{-S\tau/2}. \qquad \text{(The likelihood)}$$
$$\begin{aligned}
\pi(\tau \mid y, \mu) &\propto \pi(\tau)\,f(y \mid \tau) \\
&\propto \tau^{\,p + n/2 - 1}\,e^{-\tau(S/2 + q)}
\end{aligned}$$
$$\tau \mid y, \mu \sim \text{Gamma}\left(\frac{n}{2} + p,\ \frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2} + q\right). \qquad \text{(The posterior)}$$

The normal distribution (mean, μ, known)

So the updates for the parameters of the Gamma distribution are

$$\begin{aligned}
P &\leftarrow p + \frac{n}{2} \\
Q &\leftarrow q + \frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2}
\end{aligned} \qquad \text{(Updates for a Gamma prior with a Normal likelihood)}$$
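A minimal R sketch of these updates, with $\mu$ assumed known and simulated data standing in for a real series:

# Updating a Gamma(p, q) prior on the precision tau of Normal data, mean known.
set.seed(3)
mu <- 0; true_tau <- 4
y <- rnorm(50, mean = mu, sd = 1 / sqrt(true_tau))
p <- 1; q <- 1                        # Gamma(1, 1) prior on tau
P <- p + length(y) / 2                # shape update
Q <- q + sum((y - mu)^2) / 2          # rate update
P / Q                                 # posterior mean of tau, close to true_tau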

1.2.5 Gamma-Laplacian conjugacy

The Laplacian distribution (mean, μ, known)

The Laplacian distribution is useful for modeling distributions with heavy tails such as those found in stock-market returns. We model the returns as

$$y_i \mid \mu, \tau \sim \text{Laplace}\left(\mu, \frac{1}{\tau}\right)$$

which has density given by

$$f(y_i \mid \tau, \mu) = \frac{\tau}{2}\exp\left(-\tau|y_i - \mu|\right), \qquad i = 1, 2, \ldots, n,$$

where $\tau$ is a rate parameter, not the precision. The Laplacian has variance $\operatorname{Var}(y_i \mid \mu, \tau) = 2\tau^{-2}$. In volatility modelling we are interested in how the variance of the observations changes (not the mean), so we set the mean of the Laplacian to zero. Then

$$f(y \mid \tau) \propto \tau^{n}\,e^{-\tau\sum_{i=1}^{n}|y_i|}.$$

When the prior is $\tau \sim \text{Gamma}(p, q)$, the posterior becomes

$$\pi(\tau \mid y) \propto \pi(\tau)\,f(y \mid \tau)$$
$$\tau \mid y \sim \text{Gamma}\left(n + p,\ \sum_{i=1}^{n}|y_i| + q\right).$$

The Laplacian distribution

So the updates for the Gamma distribution are

$$\begin{aligned}
P &\leftarrow p + n \\
Q &\leftarrow q + \sum_{i=1}^{n}|y_i|
\end{aligned} \qquad \text{(Updates for a Gamma prior with a Laplacian likelihood)}$$
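A sketch in R of this update applied sequentially, in the spirit of the volatility estimates in Figure 1.16 (the returns are simulated rather than the real index data, using the fact that a Laplace variate with rate τ can be generated as the difference of two independent Exponential(τ) variates):

# Sequential Gamma updating for the rate tau of zero-mean Laplace returns.
set.seed(4)
true_tau <- 10
y <- rexp(250, rate = true_tau) - rexp(250, rate = true_tau)   # simulated returns
p <- 1; q <- 1                        # Gamma(1, 1) prior on tau
sd_est <- rep(0, length(y))
for (i in seq_along(y)) {
    p <- p + 1                        # each observation adds 1 to the shape
    q <- q + abs(y[i])                # and |y_i| to the rate
    sd_est[i] <- sqrt(2) / (p / q)    # plug-in estimate of sd = sqrt(2)/tau
}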

Two stockmarket crashes

The 2008 and 2010 stock market crashes and how they unfolded

Lehman Brothers filed for bankruptcy on 15 September 2008, prompting a fall in the FTSE 100 of 4%. It was the beginning of a slump that by Christmas of that year had resulted in 23.4% being wiped off the value of Britain’s top 100 companies.

In a matter of minutes (May 2010) the Dow Jones index lost almost 9% of its value in a sequence of events that quickly became known as the "flash crash". Hundreds of billions of dollars were wiped off the share prices of household name companies. But the carnage, which took place at a speed never before witnessed, did not last long. The market rapidly regained its composure and eventually closed 3% lower. In just 20 minutes, 2 Bn shares worth $56 Bn had changed hands.

The dataset

A dataset that illustrates these shocks is shown here

ISE100 Istanbul stock exchange national 100 index
SP Standard & Poor's 500 return index
DAX Stock market return index of Germany
FTSE Stock market return index of UK
NIK Stock market return index of Japan
BVSP Stock market return index of Brazil
EU MSCI European index
EM MSCI emerging markets index
Figure 1.15: Returns from the worldwide stock market data.

Sequential estimates of the volatility

Figure 1.16: Sequential estimate of the current standard deviation $\hat{\sigma}_i = \sqrt{2}/\tau_i$ for each stock.

Gamma or exponential likelihood with a gamma prior

Let $X_1, \ldots, X_n$ be independent variables having the Gamma$(k, \theta)$ distribution, where $k$ is known. Then

$$L(\theta; x) \propto \theta^{nk}\exp\{-\theta\textstyle\sum x_i\}.$$

Now, studying this form, regarded as a function of $\theta$, suggests we could take a prior of the form

$$\pi(\theta) \propto \theta^{p-1}\exp\{-q\theta\},$$

that is, $\theta \sim \text{Gamma}(p, q)$, since then, by Bayes' Theorem,

$$\pi(\theta \mid x) \propto \theta^{\,p + nk - 1}\exp\{-(q + \textstyle\sum x_i)\theta\},$$

and so $\theta \mid x \sim \text{Gamma}(p + nk,\ q + \sum x_i)$. So the updates for the Gamma distribution are

$$\begin{aligned}
P &\leftarrow p + nk \\
Q &\leftarrow q + \sum x_i
\end{aligned} \qquad \text{(Updates for a Gamma prior with a Gamma likelihood)}$$
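A minimal R sketch of this update, with $k$ known and simulated Gamma data:

# Updating a Gamma(p, q) prior for the rate theta of Gamma(k, theta) data, k known.
set.seed(5)
k <- 3; true_theta <- 2
x <- rgamma(40, shape = k, rate = true_theta)
p <- 1; q <- 1                        # Gamma(1, 1) prior on theta
P <- p + length(x) * k                # shape update
Q <- q + sum(x)                       # rate update
P / Q                                 # posterior mean of theta, near true_theta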