
10.2 Link to one-way ANOVA

Recall from Chapter 5 that a one-way ANOVA is a method for comparing the means of three or more groups; it extends the unpaired t-test to more than two groups.

It turns out that the one-way ANOVA is a special case of a simple linear model, in which the explanatory variable is a factor with three or more levels, where each level represents membership of one of the groups.

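Suppose that the factor has $m$ levels. Then the linear model for a one-way ANOVA can be written as

$$\mathbb{E}[Y_i] = \beta_1 x_{i,1} + \beta_2 x_{i,2} + \dots + \beta_m x_{i,m},$$

where $x_{i,j}$ is the indicator variable for the $j$-th level of the factor, i.e. $x_{i,j} = 1$ if individual $i$ is in group $j$ and $x_{i,j} = 0$ otherwise.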
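
As a small illustration (the choice of $m = 3$ and of group 2 is only an example, not taken from the notes), consider an individual $i$ who belongs to group 2. Exactly one indicator is switched on, so the model reduces to a single coefficient:

$$x_{i,1} = 0,\quad x_{i,2} = 1,\quad x_{i,3} = 0 \quad\Longrightarrow\quad \mathbb{E}[Y_i] = \beta_1 \cdot 0 + \beta_2 \cdot 1 + \beta_3 \cdot 0 = \beta_2.$$

In other words, each $\beta_j$ can be read as the population mean response in group $j$.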

The purpose of an ANOVA is to test whether the mean response varies between different levels of the factor. This is equivalent to testing

$$H_0: \beta_1 = \beta_2 = \dots = \beta_m$$

vs.

$$H_1: \beta_j \neq \beta_k \text{ for at least one pair } j \neq k.$$

In turn, this is equivalent to a model selection between

  1. $H_0$: Model 1, where $\mathbb{E}[Y_i] = \beta_1$; and

  2. $H_1$: Model 2, where $\mathbb{E}[Y_i] = \beta_1 x_{i,1} + \beta_2 x_{i,2} + \dots + \beta_m x_{i,m}$.

Model 1 states that all responses share a common population mean, so its design matrix is simply a column of 1's and $\hat{\beta}_1 = \bar{y}$, the overall sample mean. For model 2, the design matrix has $m$ columns, with

$$X_{i,j} = \begin{cases} 1 & \text{if individual } i \text{ is in group } j, \\ 0 & \text{otherwise.} \end{cases}$$

Therefore $X^\top X$ is an $m \times m$ diagonal matrix, the diagonal entries of which correspond to the number of individuals in each of the groups,

$$(X^\top X)_{j,j} = n_j, \quad j = 1, \dots, m,$$

and $X^\top y$ is a vector of length $m$, with $j$-th entry being the sum of all the responses in group $j$. It follows that

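$$
\begin{aligned}
\hat{\beta}_j &= \left[(X^\top X)^{-1} X^\top y\right]_j \\
&= \frac{1}{n_j} \sum_{i=1}^{n} y_i \, I[\text{individual } i \text{ is in group } j] \\
&= \bar{y}_j,
\end{aligned}
$$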

i.e. the least squares estimate of the j-th regression coefficient is the observed mean of that group.
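
The calculation above can be checked numerically. The following is a minimal sketch in plain Python, using a small made-up data set (the values in `y` and `group` are hypothetical and chosen only for illustration). It builds the indicator design matrix, forms $X^\top X$ and $X^\top y$ directly from their definitions, and compares the resulting estimates with the group means.

```python
from fractions import Fraction

# Hypothetical data, for illustration only: responses and 0-based group labels.
y = [4, 6, 5, 9, 11, 2, 3, 1, 2]
group = [0, 0, 0, 1, 1, 2, 2, 2, 2]  # group[i] is the group of individual i
m = 3          # number of levels of the factor
n = len(y)     # number of individuals

# Design matrix for model 2: X[i][j] = 1 if individual i is in group j, else 0.
X = [[1 if group[i] == j else 0 for j in range(m)] for i in range(n)]

# X^T X (m x m) and X^T y (length m), computed from the definitions.
XtX = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(m)]
       for j in range(m)]
Xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(m)]

# Because X^T X is diagonal, applying its inverse just divides entry j by n_j.
beta_hat = [Fraction(Xty[j], XtX[j][j]) for j in range(m)]

# Group means computed separately, for comparison.
group_means = [
    Fraction(sum(y[i] for i in range(n) if group[i] == j), group.count(j))
    for j in range(m)
]

print("X^T X =", XtX)    # diagonal, with the group sizes n_j on the diagonal
print("X^T y =", Xty)    # group totals
print("estimates equal group means:", beta_hat == group_means)
```

Exact fractions are used here so that the comparison does not depend on floating-point rounding; a general-purpose least squares routine would be the more usual choice when $X^\top X$ is not diagonal.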

Calculating the sums of squares for the two models, we have

$$SS_1 = \sum_{i=1}^{n} (y_i - \bar{y})^2,$$

which, in ANOVA terminology, is what we referred to as the ‘total sum of squares’, and

$$SS_2 = \sum_{i=1}^{n} \left(y_i - \bar{y}_1 x_{i,1} - \dots - \bar{y}_m x_{i,m}\right)^2,$$

which, in ANOVA terminology, is what we referred to as the within groups sum of squares.
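
These two sums of squares are linked by the usual ANOVA decomposition. As a sketch, write $\sum_{i \in j}$ for the sum over the individuals in group $j$ (a shorthand introduced here, not notation from the notes), and assume the usual definition of the between groups sum of squares, $SSB = \sum_{j=1}^{m} n_j (\bar{y}_j - \bar{y})^2$. Then

$$
\begin{aligned}
SS_1 &= \sum_{j=1}^{m} \sum_{i \in j} \bigl( (y_i - \bar{y}_j) + (\bar{y}_j - \bar{y}) \bigr)^2 \\
&= \sum_{j=1}^{m} \sum_{i \in j} (y_i - \bar{y}_j)^2 + \sum_{j=1}^{m} n_j (\bar{y}_j - \bar{y})^2 + 2 \sum_{j=1}^{m} (\bar{y}_j - \bar{y}) \sum_{i \in j} (y_i - \bar{y}_j) \\
&= SS_2 + SSB,
\end{aligned}
$$

since the cross term vanishes: within each group, the deviations $y_i - \bar{y}_j$ sum to zero. Hence $SS_1 - SS_2 = SSB$.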

Consequently, the F-ratio for model selection can be shown to be identical to the test statistic used for the one-way ANOVA:

$$
\begin{aligned}
F &= \frac{(SS_1 - SS_2)/(m-1)}{SS_2/(n-m)} \\
&= \frac{(SST - SSW)/(m-1)}{SSW/(n-m)} \\
&= \frac{SSB/(m-1)}{SSW/(n-m)} \\
&= \frac{MSB}{MSW}.
\end{aligned}
$$
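
The equality of the two forms of the F-ratio can also be checked numerically. The sketch below uses plain Python and a small hypothetical data set (the values of `y` and `group` are made up for illustration). It computes $SS_1$ and $SS_2$ as the residual sums of squares of the two models, computes the between groups sum of squares from its usual ANOVA definition, and compares the two expressions for $F$.

```python
from fractions import Fraction

# Hypothetical data, for illustration only: responses and 0-based group labels.
y = [4, 6, 5, 9, 11, 2, 3, 1, 2]
group = [0, 0, 0, 1, 1, 2, 2, 2, 2]
m = 3          # number of groups
n = len(y)     # number of individuals


def mean(values):
    """Exact arithmetic mean as a Fraction."""
    return Fraction(sum(values), len(values))


y_bar = mean(y)
groups = [[y[i] for i in range(n) if group[i] == j] for j in range(m)]
group_means = [mean(g) for g in groups]

# Model 1 (one common mean): residual sum of squares = total sum of squares.
ss1 = sum((yi - y_bar) ** 2 for yi in y)

# Model 2 (one mean per group): residual sum of squares = within groups SS.
ss2 = sum((y[i] - group_means[group[i]]) ** 2 for i in range(n))

# Between groups sum of squares, from its usual ANOVA definition.
ssb = sum(len(g) * (gm - y_bar) ** 2 for g, gm in zip(groups, group_means))

# F-ratio in its model selection form and in its ANOVA form.
f_model_selection = ((ss1 - ss2) / (m - 1)) / (ss2 / (n - m))
f_anova = (ssb / (m - 1)) / (ss2 / (n - m))

print("SS1 =", ss1, " SS2 =", ss2, " SSB =", ssb)
print("F-ratios agree:", f_model_selection == f_anova)
print("F =", float(f_anova))
```

Under $H_0$ this statistic would be compared with an $F$ distribution on $m-1$ and $n-m$ degrees of freedom; that step is omitted here to keep the sketch free of external libraries.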