2.4 Fitting Distributions to Data

2.4.1 Method of Moments

In this course we will use the method of moments to estimate the parameters of any distributions that we fit. In Math235 you will learn about an alternative method called Maximum Likelihood Estimation.

What is a moment?

In essence, a moment measures shape. For random variables this characterizes the shape of the probability distribution. For example, the first moment is the mean of the distribution. In general, the $k$th moment of a random variable $X$ is defined as:

\mu_k = \mathbb{E}(X^k).

Thus the first four moments are:

\mu_1 = \mathbb{E}(X); \quad \mu_2 = \mathbb{E}(X^2); \quad \mu_3 = \mathbb{E}(X^3); \quad \mu_4 = \mathbb{E}(X^4).

Sample moments

Just as the sample mean is an estimate of the population mean, the sample moments are estimates of the population moments. Thus the $k$th sample moment, for data $\{x_i\}_{i=1}^{n}$, is defined as:

\frac{1}{n}\sum_{i=1}^{n} x_i^k.
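For instance, the $k$th sample moment is simply the average of the $k$th powers of the data. A minimal sketch in Python, using made-up data purely for illustration:

```python
# k-th sample moment: (1/n) * sum of x_i^k over the data
def sample_moment(xs, k):
    return sum(x ** k for x in xs) / len(xs)

data = [1.0, 2.0, 3.0, 4.0]  # hypothetical data, for illustration only

print(sample_moment(data, 1))  # first sample moment (the sample mean): 2.5
print(sample_moment(data, 2))  # second sample moment: (1 + 4 + 9 + 16)/4 = 7.5
```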

Method of moments estimators

The method of moments estimator $\hat{\theta}$ of a set of parameters $\theta$ is defined as the solution to the set of equations:

\mathbb{E}(X^k;\hat{\theta}) = \frac{1}{n}\sum_{i=1}^{n} x_i^k,

i.e. the population moment equated to the sample moment.

In principle you could use any $p$ of the moments to estimate $\hat{\theta}$, but typically we use the first $p$ moments, where $\theta$ is $p$-dimensional. This amounts to solving a system of $p$ equations.

Example 2.4.1

Let $X_1, X_2, \ldots, X_n$ be a random sample from the probability density given by

f(x) = \theta\sin^2 x, \qquad 0 \le x \le \pi.

Find the method of moments estimator for θ.

Answer. The distribution has one parameter so we use the first moment equation,

\mathbb{E}(X;\hat{\theta}) = \frac{1}{n}\sum_{i=1}^{n} x_i.

In order to solve this we first need to calculate the expectation,

\begin{aligned}
\mathbb{E}(X;\theta) &= \int_0^{\pi} x\,\theta\sin^2(x)\,dx \\
&= \theta\left[\frac{\sin^2(x)}{4} - \frac{x\sin(2x)}{4} + \frac{x^2}{4}\right]_0^{\pi} \\
&= \frac{\theta\pi^2}{4}.
\end{aligned}

Substituting this into the first moment equation gives,

\begin{aligned}
\mathbb{E}(X;\hat{\theta}) &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\frac{\hat{\theta}\pi^2}{4} &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\hat{\theta} &= \frac{4\sum_{i=1}^{n} x_i}{n\pi^2}.
\end{aligned}
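The integral above can be sanity-checked numerically. The sketch below is plain Python with a simple midpoint Riemann sum (not a library integrator), confirming that the integral of $x\sin^2(x)$ over $[0,\pi]$ equals $\pi^2/4$:

```python
import math

# Midpoint Riemann-sum approximation of the integral of f over [a, b]
def integrate(f, a, b, steps=100_000):
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# E(X; theta) = theta * integral of x * sin^2(x) over [0, pi]; take theta = 1
moment = integrate(lambda x: x * math.sin(x) ** 2, 0.0, math.pi)
print(moment, math.pi ** 2 / 4)  # both approximately 2.4674
```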

Derivations for common distributions

The following paragraphs detail the derivations of the method of moments estimates for common distributions.

Bernoulli

The Bernoulli distribution has one parameter, $\theta$, and mean $\theta$. The method of moments estimate is thus:

\begin{aligned}
\mathbb{E}(X;\hat{\theta}) &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\hat{\theta} &= \frac{1}{n}\sum_{i=1}^{n} x_i,
\end{aligned}

i.e. the estimate of the probability of success is the observed proportion of successes in the data. This is probably the estimate you would have thought to use anyway!

Example 2.4.2

Penalty shoot-outs were introduced into World Cup finals in 1982. Since then 26 matches have been decided by penalties; see Table 2.2 for details. In total 240 penalties have been taken, resulting in 170 goals. Calculate the method of moments estimate for the probability of a goal from a single shot, being careful to state your assumptions.

Answer. As the outcome of a penalty is goal or no goal, this is a binary response. The Bernoulli distribution seems appropriate to model this, but by using it we are making the following assumptions:

1. the probability of a goal is the same for each shot taken;

2. whether or not a goal is scored at each penalty is independent of all the other penalties taken.

Assuming a Bernoulli distribution, the method of moments estimate is the number of successes divided by the total number of tries, i.e. $\hat{\theta} = 170/240 = 0.708$.
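The arithmetic here is just the sample mean of 240 binary outcomes, 170 of which are ones. A one-line check in Python:

```python
goals, penalties = 170, 240  # totals quoted in the question
theta_hat = goals / penalties  # method of moments estimate for a Bernoulli
print(round(theta_hat, 3))  # 0.708
```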

Example 2.4.3

Are the assumptions you made in Example 2.4.2 appropriate?

Answer. It is unlikely that the assumptions made are appropriate. There are several reasons for this, including:

  • one person could have taken multiple penalties (across years)

  • one person could be better or worse at taking penalties than another

  • the current "score" could affect the probability of scoring (mental pressure).

Year | Teams                     | Scores  | Aggregate
1982 | West Germany v France     | 5/5-4/5 | 9/10
1986 | France v Brazil           | 4/5-3/4 | 7/9
     | West Germany v Mexico     | 4/4-1/3 | 5/7
     | Belgium v Spain           | 5/5-4/5 | 9/10
1990 | Rep. of Ireland v Romania | 5/5-4/5 | 9/10
     | Argentina v Yugoslavia    | 3/5-2/5 | 5/10
     | Argentina v Italy         | 4/5-3/5 | 7/10
     | West Germany v England    | 4/5-3/5 | 7/10
1994 | Bulgaria v Mexico         | 3/4-1/4 | 4/8
     | Sweden v Romania          | 5/6-4/6 | 9/12
     | Brazil v Italy            | 3/5-2/5 | 5/10
1998 | Argentina v England       | 4/5-3/5 | 7/10
     | France v Italy            | 4/5-3/5 | 7/10
     | Brazil v Netherlands      | 4/4-2/4 | 6/8
2002 | Spain v Rep. of Ireland   | 3/5-2/5 | 5/10
     | South Korea v Spain       | 5/5-3/4 | 8/9
2006 | Ukraine v Switzerland     | 3/4-0/3 | 3/7
     | Germany v Argentina       | 4/4-2/3 | 6/7
     | Portugal v England        | 3/5-1/4 | 4/9
     | Italy v France            | 5/5-3/5 | 8/10
2010 | Paraguay v Japan          | 5/5-3/4 | 8/9
     | Uruguay v Ghana           | 4/5-2/4 | 6/9
2014 | Brazil v Chile            | 3/5-2/5 | 5/10
     | Costa Rica v Greece       | 5/5-3/4 | 8/9
     | Netherlands v Costa Rica  | 4/4-3/5 | 7/9
     | Netherlands v Argentina   | 2/4-4/4 | 6/8

Table 2.2: Summary of penalty shoot-outs in the World Cup.

Geometric

The Geometric distribution has one parameter, $\theta$, and mean $\frac{1-\theta}{\theta}$. The method of moments estimate is thus:

\begin{aligned}
\mathbb{E}(X;\hat{\theta}) &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\frac{1-\hat{\theta}}{\hat{\theta}} &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\hat{\theta} &= \frac{n}{n + \sum_{i=1}^{n} x_i}.
\end{aligned}

As $n$ is the total number of successes and $n + \sum_{i=1}^{n} x_i$ is the total number of attempts, this too is probably the estimate you would have thought to use.
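As a check that $n/(n + \sum x_i)$ behaves sensibly, the sketch below simulates Geometric data in plain Python, with each $x_i$ counting the failures before the first success (matching the mean $(1-\theta)/\theta$ used above), and recovers the true $\theta$. The value $\theta = 0.3$ is an arbitrary choice for the simulation:

```python
import random

random.seed(1)
theta = 0.3  # true success probability (arbitrary choice for the simulation)

def failures_before_success(p):
    # Count failed trials until the first success
    count = 0
    while random.random() >= p:
        count += 1
    return count

n = 10_000
xs = [failures_before_success(theta) for _ in range(n)]
theta_hat = n / (n + sum(xs))  # method of moments estimate
print(theta_hat)  # close to 0.3
```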

Example 2.4.4

The UK open data initiative publishes statistics on pass rates for driving tests: https://www.gov.uk/government/collections/driving-tests-and-instructors-statistics. A bar plot of the data is produced by the R code below. What distribution would you use to model this data? Calculate the method of moments estimate of its parameter(s).

R> data(driving1213)
R> tot.pass = colSums(driving1213[, c(3, 6, 9, 12, 15, 18)])
R> barplot(tot.pass, names.arg=c("1st", "2nd", "3rd", "4th", "5th", "6th+"))

Answer. The data record the number of attempts a person makes until passing their driving test; once passed, you do not take the test again (unless your licence is revoked). This suggests a Geometric distribution, where trials continue until a success is observed. The method of moments estimate for a Geometric distribution is the total number of successes divided by the total number of attempts; for the driving data this is approximately $\hat{\theta} = 695543/1470463 = 0.473$, where the "6th+" group is treated as exactly 6 attempts, as we don't know how many tests those candidates actually took.

R> sum(tot.pass)/sum(tot.pass*1:6)
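The quoted totals can be checked directly; this mirrors the R line above, but uses only the two aggregate figures stated in the answer:

```python
passes, attempts = 695543, 1470463  # aggregate totals quoted above
theta_hat = passes / attempts  # method of moments estimate for a Geometric
print(round(theta_hat, 3))  # 0.473
```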

Poisson

The Poisson distribution has one parameter, $\theta$, and mean $\theta$. The method of moments estimate is thus:

\begin{aligned}
\mathbb{E}(X;\hat{\theta}) &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\hat{\theta} &= \frac{1}{n}\sum_{i=1}^{n} x_i,
\end{aligned}

the same as in the Bernoulli case.
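For the Poisson the estimate is again just the sample mean of the counts. A quick illustration with made-up count data:

```python
counts = [2, 0, 3, 1, 4]  # hypothetical Poisson-style counts, for illustration
theta_hat = sum(counts) / len(counts)  # method of moments estimate
print(theta_hat)  # 2.0
```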

Uniform

The Uniform distribution has two parameters, $\theta = (a, b)$, with mean $\frac{a+b}{2}$ and variance $\frac{(b-a)^2}{12}$. The method of moments estimate is thus the solution of the simultaneous equations:

\begin{aligned}
\mathbb{E}(X;\hat{\theta}) &= \frac{1}{n}\sum_{i=1}^{n} x_i, &\text{(2.1)} \\
\mathbb{E}(X^2;\hat{\theta}) &= \frac{1}{n}\sum_{i=1}^{n} x_i^2. &\text{(2.2)}
\end{aligned}

Recall that $\mathrm{Var}(X) = \mathbb{E}(X^2) - \mathbb{E}(X)^2$, so that Equation (2.2) can be written as

\begin{aligned}
\mathbb{E}(X^2;\hat{\theta}) &= \frac{1}{n}\sum_{i=1}^{n} x_i^2 \\
\mathrm{Var}(X) + \mathbb{E}(X)^2 &= \frac{1}{n}\sum_{i=1}^{n} x_i^2 \\
\frac{(\hat{b}-\hat{a})^2}{12} + \frac{(\hat{a}+\hat{b})^2}{4} &= \frac{1}{n}\sum_{i=1}^{n} x_i^2. &\text{(2.3)}
\end{aligned}

Substituting in the definition of the population mean and rearranging Equation (2.1) we get,

\hat{a} = \frac{2}{n}\sum_{i=1}^{n} x_i - \hat{b}.

Substituting this into Equation (2.3) and rearranging we get,

\hat{b} = \frac{1}{n}\sum_{i=1}^{n} x_i + \sqrt{\frac{3}{n}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]} = \bar{x} + \sqrt{\frac{3(n-1)}{n}}\,s(x).

Substituting our estimate of $\hat{b}$ back into the expression for $\hat{a}$ gives,

\hat{a} = \frac{1}{n}\sum_{i=1}^{n} x_i - \sqrt{\frac{3}{n}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]} = \bar{x} - \sqrt{\frac{3(n-1)}{n}}\,s(x).
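The two endpoint formulae can be checked by simulation: draw from a Uniform(a, b) with known endpoints, apply the estimators, and confirm the endpoints are approximately recovered. A sketch in plain Python; the values a = 2 and b = 5 are arbitrary:

```python
import random

random.seed(0)
a, b = 2.0, 5.0  # true parameters (arbitrary choice for the simulation)
n = 10_000
xs = [random.uniform(a, b) for _ in range(n)]

mean = sum(xs) / n
second_moment = sum(x * x for x in xs) / n

# sqrt((3/n) * [sum x_i^2 - (1/n)(sum x_i)^2]) = sqrt(3 * (m2 - mean^2))
spread = (3.0 * (second_moment - mean ** 2)) ** 0.5

a_hat = mean - spread  # method of moments estimates of the endpoints
b_hat = mean + spread
print(a_hat, b_hat)  # approximately 2 and 5
```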

Normal

The Normal distribution has two parameters, $\theta = (\mu, \sigma)$, with mean $\mu$ and variance $\sigma^2$. The method of moments estimate is thus the solution of the simultaneous equations:

\begin{aligned}
\mathbb{E}(X;\hat{\theta}) &= \frac{1}{n}\sum_{i=1}^{n} x_i, &\text{(2.4)} \\
\mathbb{E}(X^2;\hat{\theta}) &= \frac{1}{n}\sum_{i=1}^{n} x_i^2. &\text{(2.5)}
\end{aligned}

Taking Equation (2.4) we have,

\begin{aligned}
\mathbb{E}(X;\hat{\theta}) &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\hat{\mu} &= \frac{1}{n}\sum_{i=1}^{n} x_i. &\text{(2.6)}
\end{aligned}

Recall that $\mathrm{Var}(X) = \mathbb{E}(X^2) - \mathbb{E}(X)^2$, so that Equation (2.5) can be written as

\begin{aligned}
\mathbb{E}(X^2;\hat{\theta}) &= \frac{1}{n}\sum_{i=1}^{n} x_i^2 \\
\mathrm{Var}(X) + \mathbb{E}(X)^2 &= \frac{1}{n}\sum_{i=1}^{n} x_i^2 \\
\hat{\sigma}^2 + \hat{\mu}^2 &= \frac{1}{n}\sum_{i=1}^{n} x_i^2.
\end{aligned}

Now substituting in $\hat{\mu}$ from Equation (2.6) gives

\begin{aligned}
\hat{\sigma}^2 + \hat{\mu}^2 &= \frac{1}{n}\sum_{i=1}^{n} x_i^2 \\
\hat{\sigma}^2 &= \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2 \\
\hat{\sigma}^2 &= \frac{n-1}{n}\,s^2(x).
\end{aligned}
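The identity $\hat{\sigma}^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2 = \frac{n-1}{n}s^2(x)$ can be verified directly: Python's `statistics.pvariance` computes the divide-by-$n$ variance, while `statistics.variance` uses the $n-1$ divisor. A check on a small made-up dataset:

```python
import statistics

xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical data, for illustration only
n = len(xs)

mu_hat = sum(xs) / n
sigma2_hat = sum(x * x for x in xs) / n - mu_hat ** 2  # method of moments estimate

print(sigma2_hat)                             # 1.25
print(statistics.pvariance(xs))               # 1.25 (divides by n)
print((n - 1) / n * statistics.variance(xs))  # 1.25, up to floating-point rounding
```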

Whilst most of the formulae above are logical estimates of the parameters in question, it is crucial that you are able to derive and justify your use of them. Many of you may blindly believe that the sky is blue, but how many of you could convince someone who insists that the sky is green, using scientific and logical arguments? At this stage in your mathematical career you need to make sure that you do not believe things blindly, and instead ask why. Sometimes you may not yet have the tools to understand the answer, and may be told "you will find out in MathXXX", but you should be asking the questions.

Example 2.4.5

Let $X_1, X_2, \ldots, X_n$ be a random sample from the probability mass function given by

p_X(x) = \begin{cases} \frac{1}{m+1}, & \text{for } x = 0, 1, 2, \ldots, m, \\ 0, & \text{otherwise.} \end{cases}

Find the method of moments estimator for m.

Answer. There is one parameter so we use the first moment equation:

\mathbb{E}(X;\hat{m}) = \frac{1}{n}\sum_{i=1}^{n} x_i.

Firstly we need to calculate the expectation under this pmf:

\begin{aligned}
\mathbb{E}(X;m) &= \sum_{x=0}^{m} x\,p_X(x) \\
&= \sum_{x=0}^{m} x\,\frac{1}{m+1} \\
&= \frac{1}{m+1}\sum_{x=0}^{m} x \\
&= \frac{1}{m+1}\cdot\frac{m(m+1)}{2} = \frac{m}{2}.
\end{aligned}

Substituting this into the first moment equation we get:

\begin{aligned}
\frac{\hat{m}}{2} &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\hat{m} &= \frac{2}{n}\sum_{i=1}^{n} x_i.
\end{aligned}
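So the estimator is simply twice the sample mean. A quick illustration with made-up data:

```python
xs = [3, 1, 4, 1, 5, 2]  # hypothetical draws from a discrete uniform on {0, ..., m}
m_hat = 2 * sum(xs) / len(xs)  # method of moments estimate: twice the sample mean
print(m_hat)  # approximately 5.33
```

Note that the estimate need not be an integer, and can even fall below the largest observation (e.g. the data 0, 0, 5 give $\hat{m} = 10/3 < 5$), a known weakness of method of moments estimators for this model.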



TIP: General approach to method of moments questions. The general approach to answering these questions is:

1. Decide which distribution to use and what its parameters are.

2. Write down the moment equations (remember: one equation per parameter you have to estimate).

3. Calculate the moments using the distribution from step 1.

4. Substitute the moments calculated in step 3 into the moment equations from step 2.

5. Solve the moment equations for the parameters (you might have to use simultaneous equations for multi-parameter problems). This is the general estimator for the distribution of interest.

6. Substitute the specific numbers for the scenario, if given in the question. This is the specific parameter estimate for the data in question.
Now that we can fit parametric models to data, the following section discusses how to assess whether a specific parametric model may be an appropriate approximation.