MATH235 Week 3 - Workshop problems

If there is not enough time to discuss all of the problems below in the workshop, please have a go at the remaining problems on your own.

WS3.1 

Consider an unpaired t-test comparing $H_0: \mu_X = \mu_Y$ against $H_1: \mu_X \neq \mu_Y$, in which the pooled variance $s_p^2$ and the sample size $n$ are both known and fixed. Suppose that the true difference between the means is $\Delta$. We will investigate how the power of the test varies with both $\Delta$ and the significance level $\alpha$.

  1. (a)

    Explain what is meant by the power of a hypothesis test. Which type of error does this relate to, and how?

  2. (b)

    Assuming a significance level of 5%, Figure 0.1 (left) shows a plot of the power of the test as a function of the true difference Δ. What do you notice about the power as Δ increases? Can you explain why this result makes sense?

  3. (c)

    For a fixed difference Δ, Figure 0.1 (right) shows the power against the significance level of the test. What happens to the power as the significance level increases? Can you explain why this is? Hint: you might want to use the relationship that you described in part (a) to help you answer this part of the question.

Figure 0.1: Power vs. $\Delta$, the true difference between $\mu_X$ and $\mu_Y$ (left), and power vs. significance level $\alpha$ (right).
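
The shapes of the two curves in Figure 0.1 can be explored numerically. The following is a minimal sketch, not the method used to draw the figure: because the pooled variance is treated as known, it uses a normal approximation to the test statistic, and it assumes two groups of equal size n. The function name `approx_power` and the values `S_P = 1.0` and `N = 10` are illustrative assumptions that do not come from the problem.

```python
from statistics import NormalDist


def approx_power(delta, alpha, s_p, n):
    """Approximate power of a two-sided, two-sample test.

    Assumes a known pooled standard deviation s_p, n observations in
    each group, and a normally distributed test statistic.
    """
    std_normal = NormalDist()
    standard_error = s_p * (2.0 / n) ** 0.5
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = delta / standard_error
    # Probability that the statistic falls in either rejection region.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)


S_P = 1.0  # hypothetical pooled standard deviation
N = 10     # hypothetical sample size per group

# Power against the true difference, at a 5% significance level.
for delta in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"delta={delta:.1f}  power={approx_power(delta, 0.05, S_P, N):.3f}")

# Power against the significance level, for a fixed true difference.
for alpha in (0.01, 0.05, 0.10, 0.20):
    print(f"alpha={alpha:.2f}  power={approx_power(1.0, alpha, S_P, N):.3f}")
```

When the true difference is zero, the quantity returned is simply the significance level, which gives a useful check on the formula.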

WS3.2 

To assess the effect of different kinds of protein on weight gain, a group of rats were each randomly assigned to one of four groups. Each group was then assigned a different diet (low protein beef, high protein beef, low protein cereal and high protein cereal). Table 0.9 contains the weight gained by each rat in each group. The data are a subset of a data set found in A Handbook of Small Data Sets http://www.stat.ncsu.edu/research/sas/sicl/data/.

Beef Low    Beef High    Cereal Low    Cereal High
90          73           107           98
76          102          95            74
90          118          97            56
64          104          80            111
86          81           98            95
51          107          74            88
72          100          74            82
90          87           67            77
Table 0.9: Weight gain in rats fed four different types of diet.

Let $\mu_{LB}$, $\mu_{HB}$, $\mu_{LC}$ and $\mu_{HC}$ denote the mean weight gains for the low protein beef, high protein beef, low protein cereal and high protein cereal groups respectively.

  1. (a)

    Write down appropriate null and alternative hypotheses to test whether or not the group means are the same.

  2. (b)

    Given that the overall mean is 86.375 and the group means are 77.375, 96.5, 86.5 and 85.125, carry out a one-way ANOVA to test your hypotheses in part (a). Test at the 5% significance level. What are your conclusions?

  3. (c)

    What assumptions are made by the one-way ANOVA?
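
Hand calculations for the one-way ANOVA in part (b) can be checked with a short script. The sketch below is one possible way to compute the between-group and within-group sums of squares and the F statistic from the data in Table 0.9. The function name `one_way_anova` and the variable names are illustrative choices, not part of the course material.

```python
# Weight gains from Table 0.9, keyed by diet group.
groups = {
    "Beef Low": [90, 76, 90, 64, 86, 51, 72, 90],
    "Beef High": [73, 102, 118, 104, 81, 107, 100, 87],
    "Cereal Low": [107, 95, 97, 80, 98, 74, 74, 67],
    "Cereal High": [98, 74, 56, 111, 95, 88, 82, 77],
}


def one_way_anova(samples):
    """Return (SS_between, SS_within, df_between, df_within, F)."""
    all_values = [value for sample in samples for value in sample]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total

    ss_between = 0.0
    ss_within = 0.0
    for sample in samples:
        group_mean = sum(sample) / len(sample)
        # Variation of the group mean about the overall mean.
        ss_between += len(sample) * (group_mean - grand_mean) ** 2
        # Variation of the observations about their own group mean.
        ss_within += sum((value - group_mean) ** 2 for value in sample)

    df_between = len(samples) - 1
    df_within = n_total - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(list(groups.values()))
print(f"SS between = {ss_b:.2f} on {df_b} df")
print(f"SS within  = {ss_w:.2f} on {df_w} df")
print(f"F statistic = {f_stat:.3f}")
```

The F statistic would then be compared with the upper 5% point of the F distribution on the stated degrees of freedom, taken from tables or statistical software.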

WS3.3 

Table 0.10 contains the winning Olympic high jump heights, long jump distances and discus throws for a sample of 10 Olympic years (the Olympics are held every four years). All measurements are in inches. We are interested in whether or not we can use winning long jump distances and discus throws to predict the winning high jump height.

Observation    Year    High Jump    Long Jump    Discus Throw
1              1896    71.25        249.750      1147.50
2              1900    74.80        282.875      1418.90
3              1904    71.00        289.000      1546.50
4              1952    80.32        298.000      2166.85
5              1960    85.00        319.750      2330.00
6              1964    85.75        317.750      2401.50
7              1972    87.75        324.500      2535.00
8              1976    88.50        328.500      2657.40
9              1980    92.75        336.250      2624.00
10             1984    92.50        336.250      2622.00
Table 0.10: Winning Olympic high jump heights and long jump and discus distances for a sample of 10 years. All measurements are in inches.

  1. (a)

    What is the response variable in this problem? What are the explanatory variables?

  2. (b)

    The following linear regression model is fitted to the data,

    $$Y_i = \beta_1 + \beta_2 x_{i,1} + \beta_3 x_{i,2} + \epsilon_i, \quad i = 1, \dots, n,$$

    where $Y_i$ is high jump height, $x_{i,1}$ is long jump distance, $x_{i,2}$ is discus throw and $\epsilon_1, \dots, \epsilon_n$ are residuals.

    1. (i)

      What is the interpretation of $\beta_1$?

    2. (ii)

      What is the interpretation of $\beta_2$?

    3. (iii)

      What is the interpretation of $\beta_3$?

  3. (c)

    The least squares estimate for $\beta$ is $\hat{\beta} = (39.4, 0.0665, 0.0108)$. Use this to estimate the expected winning high jump height in 1972.
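
The estimate asked for in part (c) is the fitted linear predictor evaluated at the 1972 explanatory values. The sketch below shows that arithmetic, using the estimates quoted in the question and the 1972 row of the table. The function name `predict_high_jump` is an illustrative choice.

```python
# Least squares estimates quoted in part (c): intercept, long jump, discus.
beta_hat = (39.4, 0.0665, 0.0108)

# Explanatory values for 1972 (observation 7 in the table), in inches.
long_jump_1972 = 324.500
discus_1972 = 2535.00


def predict_high_jump(beta, long_jump, discus):
    """Evaluate beta1 + beta2 * long_jump + beta3 * discus."""
    beta1, beta2, beta3 = beta
    return beta1 + beta2 * long_jump + beta3 * discus


predicted = predict_high_jump(beta_hat, long_jump_1972, discus_1972)
observed = 87.75  # winning high jump height recorded for 1972

print(f"Estimated expected high jump in 1972: {predicted:.2f} inches")
print(f"Observed minus estimated: {observed - predicted:.2f} inches")
```

Comparing the estimate with the observed 1972 height gives the fitted residual for that observation.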

WS3.4 

  1. (a)

    For the linear regression model,

    • (i)

      Define the least squares function S(β).

    • (ii)

      Write down the general form of the least squares estimates of the regression coefficients $\beta$.

    • (iii)

      What condition is satisfied by $S(\hat{\beta})$?

  2. (b)

    Table 0.11 shows the sample of 10 mean June temperatures in Durham, previously seen in Part 1. The question of interest is whether this sample provides evidence of an increase in the mean June temperature over time. We can assess this by fitting the linear regression model

    $$\mathbb{E}[Y_i] = \beta_1 + \beta_2 (t_i - 1879),$$

    where $Y_i$ is temperature and $t_i$ is the year of observation. We centre times at 1879 as recording began in 1880 (although this year is not in our sample).

    Year               1885    1911    1936    1950    1951    1961    1963    1990    1994    2004
    Mean Temp. (°C)    12.70   12.95   12.85   15.20   12.10   13.50   13.55   12.70   13.35   14.40
    Table 0.11: Mean June temperatures in Durham, sample of size 10

    • (i)

      What is the design matrix for this model?

    • (ii)

      Find the least squares estimates $\hat{\beta}$ for $\beta = (\beta_1, \beta_2)$.

    • (iii)

      Interpret your estimates.
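
For checking hand calculations in part (b), the following sketch builds a design matrix with a column of ones and a column of centred years, then solves the 2x2 normal equations directly. It is one possible approach and uses only the data in Table 0.11. The function name `fit_two_parameter_ls` and the variable names are illustrative choices.

```python
# Data from Table 0.11.
years = [1885, 1911, 1936, 1950, 1951, 1961, 1963, 1990, 1994, 2004]
temps = [12.70, 12.95, 12.85, 15.20, 12.10, 13.50, 13.55, 12.70, 13.35, 14.40]

# Each row of the design matrix is (1, t_i - 1879).
design = [[1.0, float(t - 1879)] for t in years]


def fit_two_parameter_ls(X, y):
    """Solve the normal equations (X^T X) beta = X^T y for two parameters."""
    # Entries of the symmetric matrix X^T X.
    a = sum(row[0] * row[0] for row in X)
    b = sum(row[0] * row[1] for row in X)
    d = sum(row[1] * row[1] for row in X)
    # Entries of the vector X^T y.
    u = sum(row[0] * yi for row, yi in zip(X, y))
    v = sum(row[1] * yi for row, yi in zip(X, y))
    # Explicit inverse of a 2x2 matrix applied to X^T y.
    det = a * d - b * b
    beta1 = (d * u - b * v) / det
    beta2 = (a * v - b * u) / det
    return beta1, beta2


beta1_hat, beta2_hat = fit_two_parameter_ls(design, temps)
print(f"beta1_hat = {beta1_hat:.4f}")
print(f"beta2_hat = {beta2_hat:.5f}")
```

Here `beta1_hat` is on the temperature scale and `beta2_hat` is a change in temperature per year, which is worth keeping in mind when interpreting the estimates.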