3.3 Two sample tests

3.3.1 Unpaired data

Suppose that $x_1,\ldots,x_n$ are realisations of IID random variables $X_1,\ldots,X_n$ with a $\text{Normal}(\mu_X,\sigma^2)$ distribution, and that $y_1,\ldots,y_m$ are realisations of IID random variables $Y_1,\ldots,Y_m$ with a $\text{Normal}(\mu_Y,\sigma^2)$ distribution. Assume also that the $X_i$ and $Y_j$ are independent of each other for all $i=1,\ldots,n$ and $j=1,\ldots,m$. We assume that the variance $\sigma^2$ is the same for both populations, but that it is unknown.
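
The form of the test statistic used in the algorithm below follows from the sampling distribution of the difference in sample means. As a brief sketch under the assumptions above (with $S_p^2$ denoting the pooled variance estimator, a symbol introduced here for illustration):

$$\bar{X}-\bar{Y} \sim \text{Normal}\left(\mu_X-\mu_Y,\ \sigma^2\left(\frac{1}{n}+\frac{1}{m}\right)\right), \qquad \frac{(\bar{X}-\bar{Y})-(\mu_X-\mu_Y)}{S_p\sqrt{1/n+1/m}} \sim t_{n+m-2}.$$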

The algorithm for testing

$$H_0: \mu_X-\mu_Y = d$$

vs.

$$H_1: \mu_X-\mu_Y \neq d$$

is as follows.

  1. Calculate the sample means $\bar{x}$ and $\bar{y}$, and the sample variances $s_x^2$ and $s_y^2$.

  2. Find the pooled sample variance,

     $$s_p^2 = \frac{(n-1)s_x^2+(m-1)s_y^2}{n+m-2}.$$

  3. Calculate the test statistic

     $$t = \frac{(\bar{x}-\bar{y})-d}{s_p\sqrt{1/n+1/m}}.$$

  4. Compare $t$ to the $t_{n+m-2}$-distribution, using either a p-value or critical region approach.
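
The four steps above can be sketched in R as a small function. This is a minimal illustration, not a definitive implementation: the name `pooled_t_stat` and the names of the returned list elements are hypothetical, and the p-value it returns assumes the two-sided alternative.

```r
# Pooled two-sample t-test of H0: mu_X - mu_Y = d (equal, unknown variances).
# pooled_t_stat is a hypothetical helper name, not a base R function.
pooled_t_stat <- function(x, y, d = 0) {
  n <- length(x)
  m <- length(y)
  # Step 2: pooled sample variance (var() gives the step 1 sample variances)
  sp2 <- ((n - 1) * var(x) + (m - 1) * var(y)) / (n + m - 2)
  # Step 3: test statistic
  t_stat <- ((mean(x) - mean(y)) - d) / sqrt(sp2 * (1 / n + 1 / m))
  # Step 4: compare to the t-distribution with n + m - 2 degrees of freedom
  df <- n + m - 2
  p_two_sided <- 2 * pt(-abs(t_stat), df = df)
  list(sp2 = sp2, t = t_stat, df = df, p_two_sided = p_two_sided)
}
```

For two numeric vectors `x` and `y`, `pooled_t_stat(x, y, d = 0)$t` returns the statistic from step 3. Base R's `t.test(x, y, mu = d, var.equal = TRUE)` runs the same pooled test and should report the same statistic and two-sided p-value.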

Example 3.3.1 Loaves of bread

A baker wants to test whether his two apprentices, Jack and Jill, are baking loaves of bread of a consistent size. He takes a sample of ten loaves from Jack, and eight from Jill. The weights of the loaves (in grams) from the two samples are:

Jack: 502, 502, 495, 506, 492, 505, 504, 502, 490, 512

Jill: 495, 500, 495, 501, 494, 501, 505, 493

The data can also be downloaded in the file loaves. Is there evidence that the mean weight of a loaf baked by Jill is lower than the mean weight of a loaf baked by Jack?

Test, at the 5% level, the hypothesis

$$H_0: \mu_{\text{Jill}} = \mu_{\text{Jack}}$$

vs.

$$H_1: \mu_{\text{Jill}} < \mu_{\text{Jack}}.$$

Let $x_1,\ldots,x_{10}$ denote the weights of the loaves baked by Jack and let $y_1,\ldots,y_8$ denote the weights of the loaves baked by Jill.

  1. First calculate the two sample means and the two sample variances. Using R (or by hand), for Jack,

     $$\bar{x} = 501, \qquad s_x^2 = 45.8,$$

     and for Jill,

     $$\bar{y} = 498, \qquad s_y^2 = 18.6.$$

  2. Since $n=10$ and $m=8$, the pooled variance is

     $$s_p^2 = \frac{9\times 45.8+7\times 18.6}{10+8-2} = \frac{542.4}{16} = 33.9.$$
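
  3. Since here $d=0$, and the hypotheses concern $\mu_{\text{Jill}}-\mu_{\text{Jack}}$, we take Jill's sample mean first, so the test statistic is

     $$t = \frac{498-501}{\sqrt{33.9}\sqrt{1/10+1/8}} = \frac{-3}{5.82\times 0.474} = -1.09.$$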
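
  4. To calculate the p-value in R, we compare to the $t_{16}$-distribution. Since the alternative hypothesis is one-sided ($\mu_{\text{Jill}}<\mu_{\text{Jack}}$), the p-value is the lower-tail probability $P(T_{16}\le -1.09)$: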

    > pt(-1.09,df=16)
    [1] 0.146

    This gives a p-value of $0.146>0.05$. We conclude that there is insufficient evidence to reject $H_0$ at the 5% level, i.e. there is no evidence that Jill is baking smaller loaves than Jack.
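
The whole calculation can be reproduced in R from the raw weights. The sketch below is one possible way to do it: the variable names are illustrative, and the data are typed in directly rather than read from the loaves file. It works through the four steps by hand and then cross-checks them against the built-in `t.test` with `var.equal = TRUE`.

```r
# Loaf weights in grams, copied from the example
jack <- c(502, 502, 495, 506, 492, 505, 504, 502, 490, 512)
jill <- c(495, 500, 495, 501, 494, 501, 505, 493)

n <- length(jack)
m <- length(jill)

# Steps 1 and 2: sample variances and pooled variance
sp2 <- ((n - 1) * var(jack) + (m - 1) * var(jill)) / (n + m - 2)

# Step 3: test statistic for mu_Jill - mu_Jack with d = 0
t_stat <- (mean(jill) - mean(jack)) / sqrt(sp2 * (1 / n + 1 / m))

# Step 4: lower-tail p-value, since H1 is mu_Jill < mu_Jack
p_value <- pt(t_stat, df = n + m - 2)

# Cross-check with the built-in pooled two-sample t-test
check <- t.test(jill, jack, alternative = "less", var.equal = TRUE)
```

Because R carries full precision rather than the rounded intermediate values, `t_stat` and `p_value` may differ from the hand calculation in the third decimal place. `check$statistic` and `check$p.value` should agree with `t_stat` and `p_value`.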

Remark.

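To carry out the previous test using a critical value approach, we would look up the upper 5% point of the $t_{16}$-distribution, which is 1.75. (In tables indexed by two-tailed significance level, this is the entry at the 10% level, since we are doing a one-tailed test at the 5% level.) Because the alternative is $\mu_{\text{Jill}}<\mu_{\text{Jack}}$, we reject $H_0$ only if $t<-1.75$. Since $t=-1.09>-1.75$, we would not reject $H_0$ at the 5% level.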
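
Instead of using tables, the critical value can be obtained in R with `qt`. A minimal sketch, with illustrative variable names and the rounded test statistic from the example:

```r
# Lower 5% point of the t-distribution with 16 degrees of freedom
crit <- qt(0.05, df = 16)   # approximately -1.75

# Rounded test statistic from the example
t_stat <- -1.09

# Reject H0 only if t falls in the lower critical region
reject <- t_stat < crit     # FALSE here, so H0 is not rejected
```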