
3.3.1 Unpaired data

Suppose that $x_{1},\ldots,x_{n}$ are realisations of IID random variables $X_{1},\ldots,X_{n}$ with $\operatorname{Normal}(\mu_{X},\sigma^{2})$ distribution and $y_{1},\ldots,y_{m}$ are realisations of IID random variables $Y_{1},\ldots,Y_{m}$ with $\operatorname{Normal}(\mu_{Y},\sigma^{2})$ distribution. Assume also that the two samples are independent of each other, i.e. every $X_{i}$ is independent of every $Y_{j}$. We assume that the two populations share a common variance $\sigma^{2}$, whose value is unknown.

The algorithm for testing

$\displaystyle H_{0}:\mu_{X}-\mu_{Y}=d$

vs.

$\displaystyle H_{1}:\mu_{X}-\mu_{Y}\neq d$

is as follows.

1. Calculate the sample means $\bar{x}$ and $\bar{y}$, and the sample variances $s^{2}_{x}$ and $s^{2}_{y}$.

2. Find the pooled sample variance,

$\displaystyle s^{2}_{p}=\frac{(n-1)s^{2}_{x}+(m-1)s^{2}_{y}}{n+m-2}.$

3. Calculate the test statistic

$\displaystyle t=\frac{(\bar{x}-\bar{y})-d}{s_{p}\sqrt{1/n+1/m}}.$

4. Compare $t$ to the $t_{n+m-2}$-distribution, using either a $p$-value or critical region approach.
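
The four steps above can be sketched as a small R function. This is only an illustrative sketch: the name `pooled_t_test` and the layout of its return value are assumptions made here, not part of the notes, and it covers only the two-sided alternative $H_{1}:\mu_{X}-\mu_{Y}\neq d$.

```r
# Hypothetical helper (not from the notes): pooled two-sample t-test of
# H0: mu_X - mu_Y = d against the two-sided alternative H1: mu_X - mu_Y != d.
pooled_t_test <- function(x, y, d = 0) {
  n <- length(x)
  m <- length(y)

  # Step 1: sample means and sample variances (var() divides by n - 1).
  xbar <- mean(x)
  ybar <- mean(y)
  s2x <- var(x)
  s2y <- var(y)

  # Step 2: pooled sample variance.
  s2p <- ((n - 1) * s2x + (m - 1) * s2y) / (n + m - 2)

  # Step 3: test statistic.
  t_stat <- ((xbar - ybar) - d) / (sqrt(s2p) * sqrt(1 / n + 1 / m))

  # Step 4: two-sided p-value from the t-distribution with n + m - 2 df.
  df <- n + m - 2
  p_value <- 2 * pt(-abs(t_stat), df = df)

  list(statistic = t_stat, df = df, pooled_var = s2p, p_value = p_value)
}
```

For $d=0$ the statistic and $p$-value should agree with R's built-in `t.test(x, y, var.equal = TRUE)`, which is the more usual way to run this test in practice.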

Example 3.3.1 Loaves of bread

A baker wants to test whether his two apprentices, Jack and Jill, are baking loaves of bread of a consistent size. He takes a sample of ten loaves from Jack, and eight from Jill. The weights of the loaves (in grams) from the two samples are:

Jack: 502, 502, 495, 506, 492, 505, 504, 502, 490, 512

Jill: 495, 500, 495, 501, 494, 501, 505, 493

The data can also be downloaded in the file loaves. Is there evidence that the mean weight of a loaf baked by Jill is lower than the mean weight of a loaf baked by Jack?

Test, at the 5% level, the hypothesis

$\displaystyle H_{0}:\mu_{\text{Jill}}=\mu_{\text{Jack}}$

vs.

$\displaystyle H_{1}:\mu_{\text{Jill}}<\mu_{\text{Jack}}.$

Let $x_{1},\ldots,x_{10}$ denote the weights of the loaves baked by Jack and let $y_{1},\ldots,y_{8}$ denote the weights of the loaves baked by Jill.

1. First calculate the sample means and sample variances of the two samples. Using R (or by hand), for Jack we get $\bar{x}=501$ and $s^{2}_{x}=45.8$, and for Jill we get $\bar{y}=498$ and $s^{2}_{y}=18.6$.

2. Since $n=10$ and $m=8$, the pooled variance is

$\displaystyle s^{2}_{p}=\frac{9\times 45.8+7\times 18.6}{10+8-2}=\frac{542.4}{16}=33.9.$

3. Since here $d=0$, and taking the difference as Jill minus Jack ($\bar{y}-\bar{x}$) to match the hypotheses, the test statistic is

$\displaystyle t=\frac{498-501}{\sqrt{33.9}\sqrt{1/10+1/8}}=\frac{-3}{5.82\times 0.474}=-1.09.$

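4. To calculate the $p$-value in R, we compare $t$ to the $t_{16}$-distribution. Since the alternative hypothesis is $\mu_{\text{Jill}}<\mu_{\text{Jack}}$, the $p$-value is the lower-tail probability, i.e. the probability that a $t_{16}$ random variable is at most $t$: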

> pt(-1.09,df=16)

[1] 0.146

This gives a $p$-value of $0.146>0.05$. We conclude that there is insufficient evidence to reject $H_{0}$ at the 5% level, i.e. there is no evidence that Jill is baking smaller loaves than Jack.
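
As a cross-check on the hand calculation, the same test can be run with R's built-in `t.test`. This is a sketch rather than the notes' own method: the vector names `jack` and `jill` are chosen here for illustration, and because `t.test` works with unrounded intermediate values its output may differ slightly from the figures above in the later decimal places.

```r
# Loaf weights in grams, copied from the example.
jack <- c(502, 502, 495, 506, 492, 505, 504, 502, 490, 512)
jill <- c(495, 500, 495, 501, 494, 501, 505, 493)

# var.equal = TRUE requests the pooled-variance test; alternative = "less"
# tests H1: (mean for Jill) - (mean for Jack) < 0.
result <- t.test(jill, jack, alternative = "less", var.equal = TRUE)

result$statistic  # test statistic t
result$parameter  # degrees of freedom, n + m - 2 = 16
result$p.value    # one-sided (lower-tail) p-value
```

Passing `jill` first makes the difference Jill minus Jack, which matches the sign convention used for the test statistic in step 3.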

Remark.

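To carry out the previous test using a critical value approach, we would look in tables for the value of $t_{16}$ at the 10% level (the two-tailed 10% value is the one-tailed 5% value, and we are doing a one-tailed test at the 5% level). This is 1.75. Because the alternative is $\mu_{\text{Jill}}<\mu_{\text{Jack}}$, we would reject $H_{0}$ only if $t<-1.75$. Since $-1.09>-1.75$, or equivalently $|-1.09|=1.09<1.75$, we would not reject $H_{0}$ at the 5% level.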
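
The critical value in the remark can also be obtained in R instead of from tables. A minimal sketch, assuming the rounded test statistic $-1.09$ from the example; the variable names `crit` and `t_obs` are illustrative only.

```r
# Lower 5% point of the t-distribution with 16 degrees of freedom.
crit <- qt(0.05, df = 16)
crit           # approximately -1.75

# Rounded test statistic from the example.
t_obs <- -1.09

# Reject H0 only if the statistic falls below the critical value.
t_obs < crit   # FALSE, so H0 is not rejected at the 5% level
```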