Carrying out multiple two-sample $t$-tests seems the obvious way to compare means across a number of groups. However, there are two reasons why this may not be such a sensible idea:
There are a lot of tests;
Individual test errors compound across tests; see the discussion below.
Large number of tests. How many tests are required to carry out all pairwise mean comparisons for $k$ groups?
Each test involves two groups, so we require the number of ways of selecting two items out of $k$. This is exactly the mathematical definition of a combination,
$$\binom{k}{2} = \frac{k!}{2!\,(k-2)!} = \frac{k(k-1)}{2},$$
since the order in which the two groups are chosen does not matter.
Consider the case when you have just three groups. This would require three tests. If you had seven groups, this would become 21 tests. If you had ten groups then it would be 45 tests. However, it is not so difficult to write some code to automate these tests. The larger issue relates to the overall possibility of making an error in one, or more, of the tests.
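As a rough illustration of such automation, the sketch below counts the pairwise comparisons with math.comb and runs every pairwise two-sample $t$-test using scipy.stats.ttest_ind. The group names and simulated data are hypothetical, chosen purely for demonstration.

```python
from itertools import combinations
from math import comb

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data: k groups of observations stored in a dictionary.
groups = {name: rng.normal(loc=10, scale=2, size=30)
          for name in ["A", "B", "C", "D", "E", "F", "G"]}

k = len(groups)
print(f"{k} groups -> {comb(k, 2)} pairwise tests")  # 7 groups -> 21 tests

# Run every pairwise two-sample t-test.
for (name1, x1), (name2, x2) in combinations(groups.items(), 2):
    t_stat, p_value = stats.ttest_ind(x1, x2)
    print(f"{name1} vs {name2}: t = {t_stat:.2f}, p = {p_value:.3f}")
```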
How can we make an error when carrying out a hypothesis test? Suppose that we are testing
$$H_0: \mu_1 = \mu_2 \quad \text{vs.} \quad H_1: \mu_1 \neq \mu_2$$
at the 5% level. We reject the null hypothesis if the absolute value of the test statistic lies above the 97.5% quantile of the relevant $t$-distribution. Consequently there is a 5% probability of rejecting $H_0$ even when it is true.
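The rejection rule can be illustrated with a minimal sketch, assuming the equal-variance (pooled) form of the two-sample $t$ statistic; the data below are simulated purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x1 = rng.normal(loc=10, scale=2, size=25)   # hypothetical sample, group 1
x2 = rng.normal(loc=10, scale=2, size=25)   # hypothetical sample, group 2

n1, n2 = len(x1), len(x2)

# Pooled two-sample t statistic (equal-variance form).
sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
t_stat = (x1.mean() - x2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

# Reject H0 if |t| exceeds the 97.5% quantile of the t-distribution
# with n1 + n2 - 2 degrees of freedom.
critical = stats.t.ppf(0.975, df=n1 + n2 - 2)
print(f"|t| = {abs(t_stat):.2f}, critical value = {critical:.2f}")
print("Reject H0" if abs(t_stat) > critical else "Do not reject H0")
```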
A Type I error occurs when the null hypothesis is rejected when it is in fact true. The probability of a Type I error is equal to the significance level of the test.
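To see that the Type I error rate matches the significance level, one can simulate many datasets for which $H_0$ is true and record how often the test rejects. The sample sizes and number of replications below are arbitrary choices made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_reps, alpha = 10_000, 0.05

# Both samples come from the same distribution, so H0 is true.
rejections = 0
for _ in range(n_reps):
    x1 = rng.normal(loc=10, scale=2, size=25)
    x2 = rng.normal(loc=10, scale=2, size=25)
    _, p_value = stats.ttest_ind(x1, x2)
    rejections += p_value < alpha

print(f"Estimated Type I error rate: {rejections / n_reps:.3f}")  # close to 0.05
```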
There is a second type of error, which is less interesting for our purposes. This kind of error occurs if $H_0$ is accepted when $H_1$ is in fact true.
A Type II error occurs when the null hypothesis is accepted when it is in fact not true.
The probability of a Type II error depends on the true value of the population parameter and can therefore only be calculated for speculated values of this parameter. For example, we might say ‘If the difference between the true mean and the hypothesised mean were $\delta$ for some $\delta \neq 0$, what would be the probability of a Type II error when testing $H_0$ against $H_1$?’ The probability of a Type II error is linked to the power of the test.
The power of the test is the probability of correctly accepting the alternative hypothesis $H_1$, given that it is true. In other words, power is $1 - P(\text{Type II error})$.
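Continuing the simulation idea above, power can be estimated by generating data under a specific alternative (here a hypothetical mean difference of $\delta = 1$) and recording how often the test correctly rejects $H_0$; all settings are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n_reps, alpha, delta = 10_000, 0.05, 1.0  # delta is a hypothetical true difference

rejections = 0
for _ in range(n_reps):
    x1 = rng.normal(loc=10, scale=2, size=25)
    x2 = rng.normal(loc=10 + delta, scale=2, size=25)  # H1 is true here
    _, p_value = stats.ttest_ind(x1, x2)
    rejections += p_value < alpha

power = rejections / n_reps
print(f"Estimated power: {power:.3f}")
print(f"Estimated Type II error probability: {1 - power:.3f}")
```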
There will be more investigation of Type I and Type II errors in the coursework and homework sheets. However, for now, we focus on an extension to the Type I error in the context of multiple testing. In particular, we also define the family-wise error rate:
The family-wise error rate is the probability that $H_0$ is incorrectly rejected at least once across the whole series of tests.
When comparing the pairwise means of multiple groups, we would hope that the family-wise error rate would be equal to the probability of a Type I error for a single test. Unfortunately this is not the case. The probability that we incorrectly reject $H_0$ at least once across $m$ independent tests is the same as one minus the probability that we incorrectly reject $H_0$ in none of these tests. By definition, the probability that we incorrectly reject on any given test is $\alpha$, so
$$\text{FWER} = P(\text{at least one Type I error}) = 1 - P(\text{no Type I errors}) = 1 - (1 - \alpha)^m.$$
If two independent tests are performed at the 5% significance level, what is the FWER? How does this change if 10 tests are performed?
For two tests, the FWER is $1 - (0.95)^2 = 0.0975$. For 10 tests it is $1 - (0.95)^{10} \approx 0.401$. That is, the probability of incorrectly rejecting $H_0$ in at least one of the ten tests is roughly 40%. This is considerably higher than the 5% probability for a single test.
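These figures can be checked directly from the formula above; the short snippet below is just an illustrative calculation.

```python
def fwer(alpha: float, m: int) -> float:
    """Family-wise error rate for m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

for m in (1, 2, 10):
    print(f"m = {m:2d}: FWER = {fwer(0.05, m):.4f}")
# m =  1: FWER = 0.0500
# m =  2: FWER = 0.0975
# m = 10: FWER = 0.4013
```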
Whilst much research has been, and is being, carried out into the issues surrounding multiple testing, particularly in health research and genomics, we shall now concentrate on a different approach to the comparison of the means of three or more groups.