4 Bayesian statistics 331 Week 4 Multi-parameter models

4.1 The Multinomial-Dirichlet family

An example of Multinomial data

DNA bases

atgagcgccgttcccgatatccccggcggccccgcgcagcggctggc
ccaggcctgcgatgcgctgcgattgccggccgacgccggccagcagc
agaagctgctgcgctatatcgagcaaatgcagcgctggaaccgcacg
tacaacctgactgccatccgggatccggggcagatgctcgtgcagca
cctgttcgacagtctgtcggtcgtggcgccgctggagcgtggcctgc
ccggcgtggtgctggccatcatgcgcgcccattgggacgtcacatgc
gtggacgcagtcgagaagaaaaccgcattcgtgcgacagatggccgg
cgcgctcggactgcccaatctgcaggccgcgcatacccgtatcgaac
agctcgaaccggcgcaatgcgacgtggtgatatcgcgtgcgttcgct
tcgttacaggacttcgcgaagctggccggccgccacgtgcgcgaggg
tggtaccctcgtcgccatgaagggcaaggtgcccgatgacgaaatcc
aggcgttacagcaacacggccactggacggtcgaacggatcgaaccg
ttggtggtgccggcactcgacgcgcaacgctgcctgatatggatgcg
acgcagtcaaggaaacata

The Multinomial density

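Suppose that $\mathbf{X} \sim \text{Multinomial}(N, \boldsymbol{\theta})$ is a $p$-dimensional sampling distribution. Then we can express the multinomial density as

$$f(\mathbf{x} \mid \boldsymbol{\theta}) = N! \prod_{j=1}^{p} \frac{\theta_j^{x_j}}{x_j!},$$

where $N = \sum_{j=1}^{p} x_j$ and $\sum_{j=1}^{p} \theta_j = 1$. Note that this is a member of the multi-parameter exponential family shown by Equation (4.1), with $T(x) = [x_1, x_2, \dots, x_p]$ and $\eta(\theta) = [\log(\theta_1), \log(\theta_2), \dots, \log(\theta_p)]$.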
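As a quick numerical check of the multinomial density, the R sketch below evaluates N! times the product of theta_j^x_j / x_j! directly and compares it with base R's `dmultinom`. The counts `x` and probabilities `theta` are made-up illustrative values, not data from these notes.

```r
# Illustrative values (assumptions, not taken from the notes)
x     = c(2, 1, 3)          # counts in p = 3 categories
theta = c(0.2, 0.3, 0.5)    # category probabilities, summing to 1
N     = sum(x)              # total count

# Density from the formula: N! * prod(theta_j^x_j / x_j!)
f_formula = factorial(N) * prod(theta^x / factorial(x))

# The same density from base R
f_builtin = dmultinom(x, size = N, prob = theta)

print(c(formula = f_formula, builtin = f_builtin))
```

Both values should agree up to floating-point rounding.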

The Dirichlet prior

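Suppose that $\boldsymbol{\theta} \sim \text{Dirichlet}(\boldsymbol{\alpha})$, where $\boldsymbol{\theta} = \{\theta_1, \theta_2, \dots, \theta_p\}$ and $\sum_{j=1}^{p} \theta_j = 1$. Then we can express the Dirichlet density as

$$\pi(\boldsymbol{\theta}) = \frac{1}{\mathbf{B}(\boldsymbol{\alpha})} \prod_{j=1}^{p} \theta_j^{\alpha_j - 1},$$

where each $\alpha_j > 0$ can be thought of as a count and $\mathbf{B}(\boldsymbol{\alpha}) = \frac{\prod_{j=1}^{p} \Gamma(\alpha_j)}{\Gamma\left(\sum_{j=1}^{p} \alpha_j\right)}$ is the normalising constant.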
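Base R does not ship a Dirichlet density, so the following is a minimal sketch of one. The helper names `log_B` and `ddirichlet_log` are hypothetical and chosen only for illustration. Working on the log scale with `lgamma` avoids overflow when the parameters are large.

```r
# Log of the normalising constant: B(alpha) = prod(Gamma(alpha_j)) / Gamma(sum(alpha_j))
log_B = function(alpha) sum(lgamma(alpha)) - lgamma(sum(alpha))

# Log density of Dirichlet(alpha) at theta (theta must be positive and sum to 1)
ddirichlet_log = function(theta, alpha) {
  stopifnot(length(theta) == length(alpha),
            all(theta > 0),
            abs(sum(theta) - 1) < 1e-8)
  sum((alpha - 1) * log(theta)) - log_B(alpha)
}

# With alpha = (1, 1, 1) the density is flat on the simplex and equals Gamma(3) = 2
flat_density = exp(ddirichlet_log(c(0.2, 0.3, 0.5), c(1, 1, 1)))
print(flat_density)
```

For p = 2 the same function reproduces the Beta density, which gives a simple sanity check against `dbeta`.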

It is easy to see that these two distributions are conjugate because

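$$\begin{aligned}
\pi(\boldsymbol{\theta} \mid \mathbf{x}) &\propto \pi(\boldsymbol{\theta}) \, f(\mathbf{x} \mid \boldsymbol{\theta}) \\
&\propto \prod_{j=1}^{p} \theta_j^{\alpha_j - 1} \times \prod_{j=1}^{p} \theta_j^{x_j} \\
&= \prod_{j=1}^{p} \theta_j^{\alpha_j + x_j - 1}, \\
\boldsymbol{\theta} \mid \mathbf{x} &\sim \text{Dirichlet}(\alpha_1 + x_1, \alpha_2 + x_2, \dots, \alpha_p + x_p) \\
&= \text{Dirichlet}(\boldsymbol{\alpha} + \mathbf{x}).
\end{aligned}$$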
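The conjugate update amounts to adding the observed counts to the prior parameters. The R sketch below illustrates this with hypothetical counts and an assumed Dirichlet(1, 1, 1) prior. It also draws from the posterior using the standard construction of a Dirichlet vector as normalised independent Gamma variables. The function name `rdirichlet_sketch` is made up for this example.

```r
set.seed(1)  # make the random draws reproducible

alpha_prior = c(1, 1, 1)        # assumed prior: uniform on the simplex
x           = c(12, 5, 3)       # hypothetical observed counts
alpha_post  = alpha_prior + x   # conjugate update: Dirichlet(alpha + x)

# Posterior mean of each theta_j is alpha_j / sum(alpha)
post_mean = alpha_post / sum(alpha_post)

# Draw from Dirichlet(alpha) by normalising independent Gamma(alpha_j, 1) variables.
# rgamma recycles the shape vector, so filling by row gives one draw per row.
rdirichlet_sketch = function(n, alpha) {
  g = matrix(rgamma(n * length(alpha), shape = alpha),
             ncol = length(alpha), byrow = TRUE)
  g / rowSums(g)
}

draws = rdirichlet_sketch(5000, alpha_post)
print(rbind(exact = post_mean, simulated = colMeans(draws)))
```

The simulated column means should sit close to the exact posterior means.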

A useful property of the Dirichlet

The collapsibility property of the Dirichlet

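The individual components of a Dirichlet-distributed vector are marginally Beta, as described below:

$$\begin{aligned}
(\theta_1, \theta_2, \dots, \theta_p) &\sim \text{Dirichlet}(\alpha_1, \alpha_2, \dots, \alpha_p) \\
\implies \quad \theta_j &\sim \text{Beta}\Big(\alpha_j, \sum_{i \neq j} \alpha_i\Big), \quad j = 1, 2, \dots, p.
\end{aligned}$$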
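The Beta marginal can be checked by simulation. The sketch below, with hypothetical parameters `alpha`, draws Dirichlet vectors via normalised Gamma variables and compares the empirical quantiles of the first component with those of Beta(alpha_1, alpha_2 + alpha_3).

```r
set.seed(42)
alpha = c(2, 3, 5)   # hypothetical Dirichlet parameters
n     = 20000        # number of simulated vectors

# Dirichlet draws: normalise independent Gamma(alpha_j, 1) variables, one draw per row
g     = matrix(rgamma(n * length(alpha), shape = alpha),
               ncol = length(alpha), byrow = TRUE)
theta = g / rowSums(g)

# Compare quantiles of theta_1 with Beta(alpha_1, sum of the other alphas)
probs     = c(0.025, 0.5, 0.975)
empirical = quantile(theta[, 1], probs)
beta_q    = qbeta(probs, alpha[1], sum(alpha) - alpha[1])
print(rbind(empirical, beta_q))
```

The two rows should agree to within Monte Carlo error.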

Example 1

The number of taxis passing per minute over a 50-minute interval is recorded as:

Number of taxis    0    1    2    3    4    5
Freq              10   12   11   10    5    2

Assume that the events of taxis passing are iid and from a Poisson distribution. Find the expected rate of arrival and a 95% CI for that rate of arrival, assuming a non-informative prior for $\lambda$.
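One way to approach Example 1 is sketched below in R. The notes do not say which non-informative prior is intended. This sketch assumes the Jeffreys prior for a Poisson rate, proportional to lambda^(-1/2), which behaves like an improper Gamma(0.5, 0) prior. By Poisson-Gamma conjugacy the posterior is then Gamma(0.5 + total count, number of intervals). A different prior would shift the answer slightly.

```r
taxis = 0:5
freq  = c(10, 12, 11, 10, 5, 2)
n     = sum(freq)           # 50 one-minute intervals
total = sum(taxis * freq)   # 94 taxis observed in all

# Assumed prior: Jeffreys, i.e. an improper Gamma(shape = 0.5, rate = 0)
a0 = 0.5
b0 = 0

# Posterior: lambda | data ~ Gamma(a0 + total, b0 + n)
shape = a0 + total
rate  = b0 + n

post_mean = shape / rate                                     # expected rate per minute
ci95      = qgamma(c(0.025, 0.975), shape = shape, rate = rate)  # equal-tailed 95% interval

print(post_mean)
print(ci95)
```

Under this assumed prior the posterior mean is 94.5 / 50 = 1.89 taxis per minute.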

Example 2

  1. What is the posterior distribution of the proportions of the DNA bases shown below, given a non-informative prior?

  2. What are the marginal distributions, i.e. $\pi(a \mid \mathbf{x}), \pi(c \mid \mathbf{x}), \pi(g \mid \mathbf{x}), \pi(t \mid \mathbf{x})$?

atgagcgccgttcccgatatccccggcggccccgcgcagcggctggcccaggcctgcgatgcgctgcgattgccggccgacgccggccagcagc
agaagctgctgcgctatatcgagcaaatgcagcgctggaaccgcacgtacaacctgactgccatccgggatccggggcagatgctcgtgcagca
cctgttcgacagtctgtcggtcgtggcgccgctggagcgtggcctgcccggcgtggtgctggccatcatgcgcgcccattgggacgtcacatgc
gtggacgcagtcgagaagaaaaccgcattcgtgcgacagatggccggcgcgctcggactgcccaatctgcaggccgcgcatacccgtatcgaac
agctcgaaccggcgcaatgcgacgtggtgatatcgcgtgcgttcgcttcgttacaggacttcgcgaagctggccggccgccacgtgcgcgaggg
tggtaccctcgtcgccatgaagggcaaggtgcccgatgacgaaatccaggcgttacagcaacacggccactggacggtcgaacggatcgaaccg
ttggtggtgccggcactcgacgcgcaacgctgcctgatatggatgcgacgcagtcaaggaaacata
  a    c    g    t
 74  141  140   70
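A possible route through Example 2 is sketched below in R, using the base counts tabulated above. The non-informative prior is taken to be Dirichlet(1, 1, 1, 1). This is an assumption, since the notes do not name the prior. The posterior is then Dirichlet(alpha + x), and each base proportion is marginally Beta(alpha_j, sum of the other alphas).

```r
counts      = c(a = 74, c = 141, g = 140, t = 70)  # base counts from the table above
alpha_prior = rep(1, 4)                 # assumed prior: Dirichlet(1, 1, 1, 1)
alpha_post  = alpha_prior + counts      # posterior: Dirichlet(75, 142, 141, 71)

# Marginal posterior of each proportion: Beta(alpha_j, sum(alpha) - alpha_j)
a = alpha_post
b = sum(alpha_post) - alpha_post

marginals = data.frame(
  base    = names(counts),
  shape1  = as.numeric(a),
  shape2  = as.numeric(b),
  mean    = as.numeric(a / (a + b)),
  lower95 = qbeta(0.025, a, b),
  upper95 = qbeta(0.975, a, b)
)
print(marginals)
```

The `lower95` and `upper95` columns give equal-tailed 95% intervals for each base proportion.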

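Sequential updates for $\boldsymbol{\alpha}$, the Dirichlet parameters.

Instead of $\boldsymbol{\theta}$ and $N$ being fixed, we assume that both change over time:

$\mathbf{X}_t \sim \text{Multinomial}(N_t, \boldsymbol{\theta}_t)$. The sequential updates of the hyper-parameters of $\boldsymbol{\theta}_t$ are very straightforward. Let $\boldsymbol{\alpha}_t$ be the sufficient statistics $\boldsymbol{\alpha} = \{\alpha_1, \alpha_2, \dots, \alpha_p\}$ on day $t$ and $\mathbf{x}_t$ be the crime statistics on day $t$. Then

$$\boldsymbol{\alpha}_t \leftarrow \boldsymbol{\alpha}_{t-1} + \mathbf{x}_t,$$

where $t = 1, 2, \dots, N$.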

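# R code for sequential updates of alpha.
# Assumes y is an N x P matrix (or data frame) of counts, one row per day.

N = nrow(y); P = ncol(y)
alpha = rep(1, P)                            # Dirichlet(1, ..., 1) starting values
mn = matrix(0, N, P)                         # posterior medians of the marginals
lq = matrix(0, N, P); uq = matrix(0, N, P)   # 0.5% and 99.5% quantiles
for (t in 1:N)
{
  alpha = alpha + as.numeric(y[t, ])         # alpha_t = alpha_{t-1} + x_t
  # Marginally, theta_j ~ Beta(alpha_j, sum(alpha) - alpha_j)
  mn[t, ] = qbeta(rep(.5, P), alpha, sum(alpha) - alpha)
  uq[t, ] = qbeta(rep(.995, P), alpha, sum(alpha) - alpha)
  lq[t, ] = qbeta(rep(.005, P), alpha, sum(alpha) - alpha)
}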

Multinomial data with N varying

Table 4.1: An example of multinomial data showing counts from several categories
C_DRUGS  C_SHOTS  BURGLARY  DISCONDUCT  LARCENY  MVTHEFT
      8        2        10           8       33       19
      7        0        10          10       33       14
      1        1        13           7       29       31
     11        1        10           3       27       16
      6        1         7          14       22       26
      6        3        11          14       28       22
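To make the sequential update concrete, the R sketch below applies it to the six rows shown in Table 4.1. The starting value Dirichlet(1, ..., 1) is an assumption, and only the posterior means are tracked to keep the example short.

```r
# The six rows of Table 4.1, one row per time step
y = matrix(c( 8, 2, 10,  8, 33, 19,
              7, 0, 10, 10, 33, 14,
              1, 1, 13,  7, 29, 31,
             11, 1, 10,  3, 27, 16,
              6, 1,  7, 14, 22, 26,
              6, 3, 11, 14, 28, 22),
           ncol = 6, byrow = TRUE,
           dimnames = list(NULL, c("C_DRUGS", "C_SHOTS", "BURGLARY",
                                   "DISCONDUCT", "LARCENY", "MVTHEFT")))

N_t   = rowSums(y)         # total crimes per time step; this varies over time
alpha = rep(1, ncol(y))    # assumed starting values: Dirichlet(1, ..., 1)
post_mean = matrix(0, nrow(y), ncol(y), dimnames = dimnames(y))

for (t in seq_len(nrow(y))) {
  alpha = alpha + y[t, ]                 # alpha_t = alpha_{t-1} + x_t
  post_mean[t, ] = alpha / sum(alpha)    # posterior mean of theta after step t
}

print(N_t)
print(round(post_mean, 3))
```

The row totals `N_t` differ from step to step, which is why the model conditions on them rather than treating N as fixed.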

Multinomial data with N varying

Figure 4.1: This is from a US crime dataset and shows the numbers of crimes of various types recorded by a particular patrol crew. Although the data are multinomial, the proportion parameter shows drift. Note that $N_i$, the number of crimes, is not fixed over time and we condition on it.

Multinomial data with no forgetting

Figure 4.2: This is from the same US crime dataset and shows sequential estimates of the Dirichlet proportion parameters, $\boldsymbol{\alpha}$. The predictions (the line) assume there is no forgetting. The sample proportions for each time are shown as plus signs.

Multinomial data with forgetting

Figure 4.3: This is from the same US crime dataset and shows sequential estimates of the Dirichlet proportion parameters. This time there is forgetting of the parameters at each time step.
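The notes do not state how forgetting was implemented for this figure. One common scheme, shown here purely as an assumption, is exponential discounting: the previous parameters are shrunk by a factor delta between 0 and 1 before the new counts are added, so that older data carry less weight. The function name `update_with_forgetting`, the value of `delta`, and the toy counts are all hypothetical.

```r
# Sequential Dirichlet update with exponential forgetting (an assumed scheme):
#   alpha_t = delta * alpha_{t-1} + x_t,  with 0 < delta <= 1.
# Setting delta = 1 recovers the update with no forgetting.
update_with_forgetting = function(y, delta, alpha0 = rep(1, ncol(y))) {
  alpha = alpha0
  out = matrix(0, nrow(y), ncol(y))
  for (t in seq_len(nrow(y))) {
    alpha = delta * alpha + y[t, ]
    out[t, ] = alpha / sum(alpha)    # posterior mean after step t
  }
  out
}

# Hypothetical counts whose proportions drift from category 1 to category 2
y = rbind(c(9, 1), c(8, 2), c(5, 5), c(2, 8), c(1, 9))

no_forget = update_with_forgetting(y, delta = 1)
forget    = update_with_forgetting(y, delta = 0.5)

# Estimated proportion of category 2 over time, without and with forgetting
print(round(cbind(no_forget = no_forget[, 2], forget = forget[, 2]), 3))
```

With forgetting, the estimate tracks the drift towards category 2 more quickly, while the estimate without forgetting averages over the whole history.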

The types of crime

     Attribute             Definition
 1   AGGS                  Aggravated assault
 2   ARSON                 Arson
 3   BURGLARY              Burglary
 4   CMIS                  Criminal mischief
 5   DRUGS                 Drug offense
 6   GAMBLING              Gambling
 7   LARCENY               Larceny
 8   MENACING              Menacing
 9   MVTHEFT               Motor vehicle theft
10   MURDER_MANSLAUGHTER   Murder/manslaughter
11   DISCONDUCT            Disorderly conduct
12   RAPE                  Rape
13   ROBBERY               Robbery
14   SIMPASS               Simple assault
15   TRESPASS              Trespassing
16   WEAPONS               Weapons charges
17   VCI                   ARSON, CMIS, DRUGS, DISCONDUCT, SIMPASS, WEAPONS
18   P1P                   AGGASS, MURDER, RAPE, ROBBERY
19   P1V                   BURGLARY, LARCENY, MVTHEFT, ROBBERY