MATH235 Week 4 - Workshop problems

If time runs out before all of the problems below are discussed in the workshop, please attempt the remaining problems on your own.

WS4.1 

The linear regression model is defined as

$$Y_i = \beta_1 x_{i,1} + \dots + \beta_p x_{i,p} + \epsilon_i, \quad i = 1, \dots, n.$$
  1. (a)
    • (i)

      Define the predicted value for individual $i$.

    • (ii)

      Define the estimated residual for individual $i$.

  2. (b)

    Give the formula for the estimated residual variance $\hat{\sigma}^2$. You should define any notation that you use.

  3. (c)

    Consider the following model for the Olympics data in Question Sheet 3, which uses long jump distances ($x_{i,1}$) and discus throws ($x_{i,2}$) as covariates for high jump height ($Y_i$),

    $$\mathbb{E}[Y_i] = \beta_1 + \beta_2 (x_{i,1} - \bar{x}_1) + \beta_3 (x_{i,2} - \bar{x}_2)$$

    where $\bar{x}_1 = 308.3$ and $\bar{x}_2 = 2145.0$ are the sample means of the long jumps and discus throws respectively.

    • (i)

      What is the coefficient $\beta_1$ known as?

    • (ii)

      Write down the design matrix for this model.

    For this model the estimated regression coefficients are $\hat{\beta} = (82.96, 0.0664, 0.0108)$.

    • (iii)

      Obtain the vector of estimated residuals, and hence give an estimate of the residual variance $\sigma^2$.

    • (iv)

      What is the predicted value for the high jump in 1948, when the winning long jump was 308 inches and the winning discus throw was 2078 inches?
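
The quantities in WS4.1 (centred design matrix, fitted values, estimated residuals, residual variance estimate and a prediction) can be sketched numerically as follows. This is only an illustrative sketch: the data are hypothetical toy values, not the Olympics data from Question Sheet 3, the helper `solve` and all variable names are assumptions made for this example, and the variance estimate is assumed to use the divisor $n - p$.

```python
# Hypothetical toy data, NOT the Olympics data from Question Sheet 3.
x1 = [300.0, 305.0, 310.0, 315.0, 320.0, 312.0]        # first covariate
x2 = [2000.0, 2100.0, 2050.0, 2200.0, 2300.0, 2150.0]  # second covariate
y = [78.0, 80.0, 79.5, 82.0, 84.0, 81.0]               # response

n = len(y)
x1_bar = sum(x1) / n
x2_bar = sum(x2) / n

# Design matrix: one row per individual, one column per coefficient.
# Each covariate is centred at its sample mean, as in the model of part (c).
X = [[1.0, a - x1_bar, b - x2_bar] for a, b in zip(x1, x2)]
p = len(X[0])


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * m
    for r in reversed(range(m)):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, m))
        z[r] = (M[r][m] - tail) / M[r][r]
    return z


# Least squares estimate from the normal equations (X^T X) beta = X^T y.
XtX = [[sum(row[j] * row[k] for row in X) for k in range(p)] for j in range(p)]
Xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
beta_hat = solve(XtX, Xty)

# Fitted values, estimated residuals, and residual variance estimate.
fitted = [sum(b * xij for b, xij in zip(beta_hat, row)) for row in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
sigma2_hat = sum(e * e for e in residuals) / (n - p)

# Prediction at new (hypothetical) covariate values, centred with the same means.
x_new = [1.0, 308.0 - x1_bar, 2100.0 - x2_bar]
y_pred = sum(b * v for b, v in zip(beta_hat, x_new))

print(beta_hat, sigma2_hat, y_pred)
```

Solving the normal equations directly is adequate for a small, well-conditioned example like this one; a numerical library routine would normally be preferred for real data.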

WS4.2 

This question focuses on the sampling distribution of the least squares estimator $\hat{\beta}$.

  1. (a)

    Derive the expected value of the least squares estimator $\hat{\beta}$. Is this estimator unbiased?

  2. (b)

    What is the sampling distribution for $\hat{\beta}$? You should state any parameters in this distribution, as well as its name.

    Now consider the following model for the log FEV data introduced in Question Sheet 2,

    $$\mathbb{E}[\log Y_i] = \beta_1 + \beta_2 x_i$$

    where $Y_i$ is FEV and $x_i$ is age.

  3. (c)

    Using the data in Table 2 of Question Sheet 2,

    • (i)

      Find the least squares estimates $\hat{\beta} = (\hat{\beta}_1, \hat{\beta}_2)$.

    • (ii)

      Given that $\hat{\sigma}^2 = 0.0532$, find the variance matrix for $\hat{\beta}$.

    • (iii)

      Using your answers to parts (i) and (ii) or otherwise, test at the 5% level whether there is evidence of a linear relationship between age and log FEV. You should clearly state your hypotheses and conclusions.
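
The calculations in part (c) can be sketched for a simple linear regression as follows. This is a hedged illustration only: the data are hypothetical toy values, not Table 2 of Question Sheet 2, all variable names are assumptions, and the test is assumed to be the usual two-sided t-test of $\beta_2 = 0$ with $\sigma^2$ replaced by its estimate. The critical value is taken from standard t tables for the toy sample size, so it would differ for the real data.

```python
import math

# Hypothetical toy data, NOT Table 2 of Question Sheet 2.
x = [3.0, 5.0, 8.0, 10.0, 13.0, 16.0]         # covariate
log_y = [0.10, 0.42, 0.75, 0.98, 1.20, 1.45]  # log response

n = len(x)
x_bar = sum(x) / n
y_bar = sum(log_y) / n
Sxx = sum((xi - x_bar) ** 2 for xi in x)
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, log_y))

# Least squares estimates for the intercept and slope.
beta2_hat = Sxy / Sxx
beta1_hat = y_bar - beta2_hat * x_bar

# Residual variance estimate with divisor n - 2 (two regression coefficients).
residuals = [yi - (beta1_hat + beta2_hat * xi) for xi, yi in zip(x, log_y)]
sigma2_hat = sum(e * e for e in residuals) / (n - 2)

# Estimated variance matrix sigma2_hat * (X^T X)^{-1}, where
# X^T X = [[n, sum x], [sum x, sum x^2]] is inverted in closed form.
sum_x = sum(x)
sum_x2 = sum(xi * xi for xi in x)
det = n * sum_x2 - sum_x ** 2
var_beta = [
    [sigma2_hat * sum_x2 / det, -sigma2_hat * sum_x / det],
    [-sigma2_hat * sum_x / det, sigma2_hat * n / det],
]

# Two-sided test of H0: beta2 = 0 against H1: beta2 != 0 at the 5% level.
t_stat = beta2_hat / math.sqrt(var_beta[1][1])
t_crit = 2.776  # upper 2.5% point of the t distribution with n - 2 = 4 df (tables)
reject_h0 = abs(t_stat) > t_crit

print(beta1_hat, beta2_hat, var_beta, t_stat, reject_h0)
```

With real data the same steps apply, but the degrees of freedom, and hence the critical value, must be recomputed from the actual sample size.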