
4.2.3 Hypothesis testing when $H_{0}:p_{1}=p_{2}$

Here we use a new example to examine a special estimate of the standard error when $H_{0}:p_{1}=p_{2}$. We investigate whether there is an increased risk of cancer in dogs that are exposed to the herbicide 2,4-dichlorophenoxyacetic acid (2,4-D). A study published in 1991 examined 491 dogs that had developed cancer and 945 dogs as a control group.[38] For every dog in these two groups, researchers determined whether it had been exposed to 2,4-D in its owner’s yard. The results are shown in Table 4.2.

[38] Hayes HM, Tarone RE, Cantor KP, Jessen CR, McCurnin DM, and Richardson RC. 1991. Case-Control Study of Canine Malignant Lymphoma: Positive Association With Dog Owner’s Use of 2,4-Dichlorophenoxyacetic Acid Herbicides. Journal of the National Cancer Institute 83(17):1226-1231.

	cancer	no cancer
2,4-D	191	304
no 2,4-D	300	641

Table 4.2: Summary results for cancer in dogs and the use of 2,4-D by the dog’s owner.

Example 4.2.2

Is this study an experiment or an observational study?

Answer. The owners were not instructed to apply or not apply the herbicide, so this is an observational study. This question was especially tricky because one group was called the control group, which is a term usually seen in experiments.

Example 4.2.3

Set up hypotheses to test whether 2,4-D and the occurrence of cancer in dogs are related. Use a one-sided test and compare across the cancer and no cancer groups.

Answer. Using the proportions within the cancer and no cancer groups may seem odd. Intuitively, we might prefer to compare the fraction of dogs with cancer in the 2,4-D and no 2,4-D groups, since the herbicide is the explanatory variable. However, the researchers fixed the number of dogs with cancer (491) and without cancer (945) when they collected the data, so the cancer rates within each exposure group do not reflect cancer rates in the broader dog population. Reporting such rates could greatly, and misleadingly, alarm dog owners.
$H_{0}$: the proportion of dogs with exposure to 2,4-D is the same in “cancer” and “no cancer” dogs, $p_{c}-p_{n}=0$.
$H_{A}$ : dogs with cancer are more likely to have been exposed to 2,4-D than dogs without cancer, $p_{c}-p_{n}>0$ .
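The sample proportions that estimate $p_{c}$ and $p_{n}$ come straight from the columns of Table 4.2. The short Python sketch below computes each group size, the share of dogs exposed to 2,4-D within each group, and the difference between those shares; the variable names are our own, chosen only for illustration.

```python
# Table 4.2: counts of dogs by 2,4-D exposure (row) and cancer status (column).
counts = {
    ("2,4-D", "cancer"): 191,
    ("2,4-D", "no cancer"): 304,
    ("no 2,4-D", "cancer"): 300,
    ("no 2,4-D", "no cancer"): 641,
}

# Column totals: how many dogs were sampled in each group.
n_c = counts[("2,4-D", "cancer")] + counts[("no 2,4-D", "cancer")]        # cancer group
n_n = counts[("2,4-D", "no cancer")] + counts[("no 2,4-D", "no cancer")]  # no cancer group

# Proportion of dogs exposed to 2,4-D within each group (estimates of p_c and p_n).
p_hat_c = counts[("2,4-D", "cancer")] / n_c
p_hat_n = counts[("2,4-D", "no cancer")] / n_n

print(n_c, n_n)  # 491 945
print(round(p_hat_c, 3), round(p_hat_n, 3), round(p_hat_c - p_hat_n, 3))  # 0.389 0.322 0.067
```

Note that the proportions are computed down the columns (within the cancer and no cancer groups), matching the way the hypotheses are written.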

Example 4.2.4

Are the conditions met to use the normal model and make inference on the results?

Answer. (1) It is unclear whether this is a random sample. However, if we believe the dogs in both the cancer and no cancer groups are representative of each respective population and that the dogs in the study do not interact in any way, then we may find it reasonable to assume independence between observations. (2) The success-failure condition holds for each sample.

Under the assumption of independence, we can use the normal model and make statements regarding the canine population based on the data.
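The success-failure condition comes down to a few multiplications. This Python sketch assumes the common threshold of at least 10 expected successes and failures in each group, and it uses the pooled proportion $\hat{p}=495/1436$ that this section introduces below; the helper `success_failure_ok` is a hypothetical name used only for illustration.

```python
# Sketch: success-failure check for the two groups in Table 4.2.
# Assumption: the threshold of at least 10 expected successes and failures per group.
THRESHOLD = 10

n_c, n_n = 491, 945                  # dogs with cancer, dogs without cancer
p_pool = (191 + 304) / (n_c + n_n)   # pooled share of dogs exposed to 2,4-D

def success_failure_ok(n, p, threshold=THRESHOLD):
    """Return True if both n*p and n*(1 - p) reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

for label, n in [("cancer", n_c), ("no cancer", n_n)]:
    print(f"{label}: expected successes = {n * p_pool:.1f}, "
          f"expected failures = {n * (1 - p_pool):.1f}, "
          f"ok = {success_failure_ok(n, p_pool)}")
```

With groups this large, every expected count is far above 10, so the check passes comfortably.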

In the hypotheses for Example 4.2.3, the null is that the proportion of dogs with exposure to 2,4-D is the same in each group. The point estimate of the difference in sample proportions is $\hat{p}_{c}-\hat{p}_{n}=\frac{191}{491}-\frac{304}{945}\approx 0.389-0.322=0.067$. To identify the p-value for this test, we first check conditions (Example 4.2.4) and compute the standard error of the difference:

SE=\sqrt{\frac{p_{c}(1-p_{c})}{n_{c}}+\frac{p_{n}(1-p_{n})}{n_{n}}}

In a hypothesis test, the distribution of the test statistic is always examined as though the null hypothesis is true, which in this case means $p_{c}=p_{n}$. The standard error formula should reflect this equality in the null hypothesis. We will use $p$ to represent the common rate of dogs that are exposed to 2,4-D in the two groups:

SE=\sqrt{\frac{p(1-p)}{n_{c}}+\frac{p(1-p)}{n_{n}}}

We don’t know the exposure rate, $p$ , but we can obtain a good estimate of it by pooling the results of both samples:

\hat{p}=\frac{\text{\# of ``successes''}}{\text{\# of cases}}=\frac{191+304}{191+300+304+641}=\frac{495}{1436}=0.345

This is called the pooled estimate of the sample proportion, and we use it to compute the standard error when the null hypothesis is that $p_{1}=p_{2}$ (e.g. $p_{c}=p_{n}$ or $p_{c}-p_{n}=0$ ). We also typically use it to verify the success-failure condition.

Pooled estimate of a proportion
When the null hypothesis is $p_{1}=p_{2}$, it is useful to find the pooled estimate of the shared proportion:
$\displaystyle\hat{p}=\frac{\text{number of ``successes''}}{\text{number of cases}}=\frac{\hat{p}_{1}n_{1}+\hat{p}_{2}n_{2}}{n_{1}+n_{2}}$
Here $\hat{p}_{1}n_{1}$ represents the number of successes in sample 1, since $\displaystyle\hat{p}_{1}=\frac{\text{number of successes in sample 1}}{n_{1}}$. Similarly, $\hat{p}_{2}n_{2}$ represents the number of successes in sample 2.
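The pooled formula can be written as a small function. In this sketch, `pooled_proportion` is a hypothetical helper, not part of any library. Multiplying each sample proportion by its sample size recovers the success counts, so the result matches the direct count $(191+304)/(491+945)$.

```python
def pooled_proportion(p_hat_1, n_1, p_hat_2, n_2):
    """Pooled estimate (p_hat_1*n_1 + p_hat_2*n_2) / (n_1 + n_2)."""
    # p_hat_1 * n_1 is the number of successes in sample 1; likewise for sample 2.
    return (p_hat_1 * n_1 + p_hat_2 * n_2) / (n_1 + n_2)

# Dog data: sample 1 = cancer group, sample 2 = no cancer group.
p_hat_c = 191 / 491
p_hat_n = 304 / 945
p_pool = pooled_proportion(p_hat_c, 491, p_hat_n, 945)
print(round(p_pool, 3))  # 0.345
```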

TIP: Use the pooled proportion estimate when $\mathbf{H_{0}:p_{1}=p_{2}}$
When the null hypothesis suggests the proportions are equal, we use the pooled proportion estimate ($\hat{p}$) to verify the success-failure condition and also to estimate the standard error:
$\displaystyle SE=\sqrt{\frac{\hat{p}(1-\hat{p})}{n_{1}}+\frac{\hat{p}(1-\hat{p})}{n_{2}}}$ (4.3)
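Equation (4.3) translates directly into code. The sketch below uses a hypothetical helper, `pooled_se`, and plugs in the rounded values from Example 4.2.5: $\hat{p}=0.345$, $n_{1}=491$, and $n_{2}=945$.

```python
import math

def pooled_se(p_pool, n_1, n_2):
    """Standard error of p_hat_1 - p_hat_2 under H0: p1 = p2 (Equation 4.3)."""
    # The same pooled proportion appears in both terms because H0 says p1 = p2.
    return math.sqrt(p_pool * (1 - p_pool) / n_1 + p_pool * (1 - p_pool) / n_2)

se = pooled_se(0.345, 491, 945)
print(round(se, 3))  # 0.026
```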

Example 4.2.5

Using Equation (4.3), $\hat{p}=0.345$, $n_{1}=491$, and $n_{2}=945$, verify that the standard error estimate is $SE=0.026$. Next, complete the hypothesis test using a significance level of 0.05. Be sure to draw a picture, compute the p-value, and state your conclusion in both statistical language and plain language.

Answer. Compute the test statistic:

\displaystyle Z=\frac{\text{point estimate}-\text{null value}}{SE}=\frac{0.067-0}{0.026}=2.58
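The test-statistic arithmetic, and the tail area needed for the p-value, can be sketched in Python. Because $H_{A}$ is one-sided ($p_{c}-p_{n}>0$), the p-value is the upper-tail area of the standard normal distribution beyond $Z$. This sketch obtains that area from the identity $P(Z>z)=\tfrac{1}{2}\,\mathrm{erfc}(z/\sqrt{2})$ using Python's `math.erfc`. That is one of several equivalent approaches; a normal probability table gives the same value to table precision. The helper name `normal_upper_tail` is hypothetical.

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

point_estimate = 0.067  # p_hat_c - p_hat_n
null_value = 0.0        # H0: p_c - p_n = 0
se = 0.026              # pooled standard error from Equation (4.3)

z = (point_estimate - null_value) / se
p_value = normal_upper_tail(z)  # one-sided, since H_A: p_c - p_n > 0
print(round(z, 2), round(p_value, 4))
```

The resulting p-value can then be compared with the 0.05 significance level.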