3.2.1 Point estimates and standard errors for differences of means

We would like to estimate the average difference in run times for men and women using the LonMar13Samp data set, a simple random sample of 65 men and 35 women drawn from all runners in the 2013 London Marathon. Table 3.3 presents the relevant summary statistics, and box plots of each sample are produced by the R command below.

R> boxplot(LonMar13Samp[,3] ~ LonMar13Samp[,4])

	men	women
$\bar{x}$	266.9038	285.7438
$s$	44.48394	57.26673
$n$	65	35

Table 3.3: Summary statistics for the run time of 100 participants in the 2013 London Marathon.

The two samples are independent of one another, so the data are not paired. Instead, a point estimate of the difference in average marathon run times for women and men, $\mu_{w}-\mu_{m}$, can be found using the two sample means:

$\displaystyle\bar{x}_{w}-\bar{x}_{m}=285.7438-266.9038\approx 18.84$
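As a quick check, the point estimate can be reproduced from the rounded sample means in Table 3.3. This is only a minimal sketch: the variable names `xbar_m`, `xbar_w` and `point_est` are illustrative and are not part of the LonMar13Samp data set.

```r
# Sample means copied from Table 3.3 (already rounded to four decimal places)
xbar_m <- 266.9038  # men
xbar_w <- 285.7438  # women

# Point estimate of mu_w - mu_m
point_est <- xbar_w - xbar_m
print(point_est)
```

Because the inputs are the rounded table values, the result agrees with the figure quoted above only to about two decimal places.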

Because we are examining two simple random samples from less than 10% of the population, each sample contains at least 30 observations, and neither distribution is strongly skewed, we can safely conclude that the sampling distribution of each sample mean is nearly normal. Finally, because each sample is independent of the other (e.g. the data are not paired), we can conclude that the difference in sample means can be modelled using a normal distribution.³¹

³¹ Probability theory guarantees that the difference of two independent normal random variables is also normal. Because each sample mean is nearly normal and observations in the samples are independent, we are assured the difference is also nearly normal.

Conditions for normality of $\bar{x}_{1}-\bar{x}_{2}$

If the sample means, $\bar{x}_{1}$ and $\bar{x}_{2}$, each meet the criteria for having nearly normal sampling distributions and the observations in the two samples are independent, then the difference in sample means, $\bar{x}_{1}-\bar{x}_{2}$, will have a sampling distribution that is nearly normal.

We can quantify the variability in the point estimate, $\bar{x}_{w}-\bar{x}_{m}$, using the following formula for its standard error:

$\displaystyle SE_{\bar{x}_{w}-\bar{x}_{m}}=\sqrt{\frac{\sigma_{w}^{2}}{n_{w}}+\frac{\sigma_{m}^{2}}{n_{m}}}$
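The formula rests on the fact that variances of independent random variables add, even when the variables themselves are subtracted. A short derivation, assuming each sample mean has variance $\sigma^{2}/n$ (as for independent observations within a sample):

```latex
\begin{align*}
\mathrm{Var}(\bar{x}_{w}-\bar{x}_{m})
  &= \mathrm{Var}(\bar{x}_{w}) + \mathrm{Var}(\bar{x}_{m})
     && \text{(the two samples are independent)} \\
  &= \frac{\sigma_{w}^{2}}{n_{w}} + \frac{\sigma_{m}^{2}}{n_{m}}
\end{align*}
```

Taking the square root of this variance gives the standard error.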

We usually estimate this standard error using standard deviation estimates based on the samples:

	$\displaystyle SE_{\bar{x}_{w}-\bar{x}_{m}}$	$\displaystyle=\sqrt{\frac{\sigma_{w}^{2}}{n_{w}}+\frac{\sigma_{m}^{2}}{n_{m}}}$
		$\displaystyle\approx\sqrt{\frac{s_{w}^{2}}{n_{w}}+\frac{s_{m}^{2}}{n_{m}}}=\sqrt{\frac{57.26673^{2}}{35}+\frac{44.48394^{2}}{65}}=11.14194.$

Because each sample has at least 30 observations ($n_{w}=35$ and $n_{m}=65$), this substitution using the sample standard deviation tends to be very good.
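The estimated standard error can be checked directly in R from the summary statistics in Table 3.3. This is a sketch under the assumption that the table values are typed in by hand; the names `s_m`, `s_w`, `n_m`, `n_w` and `se_diff` are illustrative only.

```r
# Summary statistics copied from Table 3.3
s_m <- 44.48394; n_m <- 65   # men: sample standard deviation and sample size
s_w <- 57.26673; n_w <- 35   # women: sample standard deviation and sample size

# Estimated standard error of xbar_w - xbar_m
se_diff <- sqrt(s_w^2 / n_w + s_m^2 / n_m)
print(se_diff)
```

The printed value should agree with 11.14194 up to rounding of the inputs.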

Distribution of a difference of sample means

The sample difference of two means, $\bar{x}_{1}-\bar{x}_{2}$, is nearly normal with mean $\mu_{1}-\mu_{2}$ and estimated standard error

$\displaystyle SE_{\bar{x}_{1}-\bar{x}_{2}}=\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}$ (3.1)

when each sample mean is nearly normal and all observations are independent.
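One way to see this result in action is a small simulation. The sketch below draws repeated pairs of independent samples from two hypothetical normal populations and compares the spread of the simulated differences with the standard error formula. The population parameters are assumptions chosen only to resemble Table 3.3; they are not estimates of the true marathon populations, and all variable names are illustrative.

```r
set.seed(1)  # fixed seed so the illustration is repeatable

# Hypothetical populations (assumed values, loosely based on Table 3.3)
mu_m <- 267; sigma_m <- 44; n_m <- 65   # men
mu_w <- 286; sigma_w <- 57; n_w <- 35   # women

# Draw 10000 pairs of independent samples and record xbar_w - xbar_m each time
diffs <- replicate(10000,
                   mean(rnorm(n_w, mu_w, sigma_w)) -
                   mean(rnorm(n_m, mu_m, sigma_m)))

# Standard error given by the formula, using the assumed population values
se_theory <- sqrt(sigma_w^2 / n_w + sigma_m^2 / n_m)

# The mean of the differences should be near mu_w - mu_m, and their
# standard deviation should be near se_theory
print(c(mean_diff = mean(diffs), sd_diff = sd(diffs), se_theory = se_theory))
```

A call such as `hist(diffs)` would additionally show the roughly bell-shaped distribution of the simulated differences.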