4.2 Difference of two proportions

4.2.3 Hypothesis testing when $H_0: p_1 = p_2$

Here we use a new example to examine a special estimate of the standard error when $H_0: p_1 = p_2$. We investigate whether there is an increased risk of cancer in dogs that are exposed to the herbicide 2,4-dichlorophenoxyacetic acid (2,4-D). A study in 1994 examined 491 dogs that had developed cancer and 945 dogs as a control group.[38] Of these two groups, researchers identified which dogs had been exposed to 2,4-D in their owner’s yard. The results are shown in Table 4.2.

[38] Hayes HM, Tarone RE, Cantor KP, Jessen CR, McCurnin DM, and Richardson RC. 1991. Case-Control Study of Canine Malignant Lymphoma: Positive Association With Dog Owner’s Use of 2,4-Dichlorophenoxyacetic Acid Herbicides. Journal of the National Cancer Institute 83(17):1226-1231.

              cancer    no cancer
2,4-D            191          304
no 2,4-D         300          641

Table 4.2: Summary results for cancer in dogs and the use of 2,4-D by the dog’s owner.

Example 4.2.2

Is this study an experiment or an observational study?

Answer. The owners were not instructed to apply or not apply the herbicide, so this is an observational study. This question was especially tricky because one group was called the control group, which is a term usually seen in experiments.

Example 4.2.3

Set up hypotheses to test whether 2,4-D and the occurrence of cancer in dogs are related. Use a one-sided test and compare across the cancer and no cancer groups.

Answer. Using the proportions within the cancer and no cancer groups may seem odd. Intuitively, we might want to compare the fraction of dogs with cancer in the 2,4-D and no 2,4-D groups, since the herbicide is the explanatory variable. However, because of the way the data were collected, the cancer rates in each exposure group do not necessarily reflect the cancer rates in reality: the study selected 491 dogs with cancer and 945 dogs without, so about a third of the dogs in the study have cancer regardless of exposure. Cancer rates computed from these data would therefore be misleading and might needlessly alarm dog owners.
$H_0$: the proportion of dogs with exposure to 2,4-D is the same in “cancer” and “no cancer” dogs, $p_c - p_n = 0$.
$H_A$: dogs with cancer are more likely to have been exposed to 2,4-D than dogs without cancer, $p_c - p_n > 0$.

Example 4.2.4

Are the conditions met to use the normal model and make inference on the results?

Answer. (1) It is unclear whether this is a random sample. However, if we believe the dogs in both the cancer and no cancer groups are representative of each respective population and that the dogs in the study do not interact in any way, then we may find it reasonable to assume independence between observations. (2) The success-failure condition holds for each sample.

Under the assumption of independence, we can use the normal model and make statements regarding the canine population based on the data.
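
The success-failure check in condition (2) can be sketched in a few lines of Python. This is only an illustration: the threshold of 10 is the usual rule of thumb and is an assumption here, as are the names `success_failure_ok` and `MIN_COUNT`. The check uses the observed counts from Table 4.2.

```python
# Counts from Table 4.2, split by group (cancer / no cancer).
cancer = {"exposed": 191, "not_exposed": 300}
no_cancer = {"exposed": 304, "not_exposed": 641}

# Assumption: the common rule of thumb of at least 10 successes and 10 failures.
MIN_COUNT = 10


def success_failure_ok(group, min_count=MIN_COUNT):
    """Return True if both the success and the failure count reach min_count."""
    return all(count >= min_count for count in group.values())


print(success_failure_ok(cancer), success_failure_ok(no_cancer))
```

Both groups pass comfortably, since the smallest count in the table is 191.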

In your hypotheses for Example 4.2.3, the null is that the proportion of dogs with exposure to 2,4-D is the same in each group. The point estimate of the difference, computed from the sample proportions, is $\hat{p}_c - \hat{p}_n = 0.067$. To identify the p-value for this test, we first check conditions (Example 4.2.4) and compute the standard error of the difference:

$$SE = \sqrt{\frac{p_c(1-p_c)}{n_c} + \frac{p_n(1-p_n)}{n_n}}$$
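
As a quick check of the point estimate, the sample proportions can be computed directly from Table 4.2. The sketch below also evaluates the formula above by plugging in the sample proportions for $p_c$ and $p_n$. That plug-in is an assumption made only for illustration, since the true proportions are unknown, and the variable names are hypothetical.

```python
from math import sqrt

# Exposed dogs and group sizes from Table 4.2.
exposed_c, n_c = 191, 491   # cancer group
exposed_n, n_n = 304, 945   # no cancer group

p_hat_c = exposed_c / n_c
p_hat_n = exposed_n / n_n
point_estimate = p_hat_c - p_hat_n

# Illustration only: substitute the sample proportions for p_c and p_n.
se_unpooled = sqrt(p_hat_c * (1 - p_hat_c) / n_c + p_hat_n * (1 - p_hat_n) / n_n)

print(round(point_estimate, 3))  # 0.067
print(se_unpooled)
```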

In a hypothesis test, the distribution of the test statistic is always examined as though the null hypothesis is true, i.e. in this case, $p_c = p_n$. The standard error formula should reflect this equality in the null hypothesis. We will use $p$ to represent the common rate of dogs that are exposed to 2,4-D in the two groups:

$$SE = \sqrt{\frac{p(1-p)}{n_c} + \frac{p(1-p)}{n_n}}$$

We don’t know the exposure rate, $p$, but we can obtain a good estimate of it by pooling the results of both samples:

$$\hat{p} = \frac{\text{number of ``successes''}}{\text{number of cases}} = \frac{191 + 304}{191 + 300 + 304 + 641} = 0.345$$
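
The pooling arithmetic can be reproduced with a small helper. This is a minimal sketch, and the function name `pooled_proportion` is hypothetical. Here a "success" is a dog exposed to 2,4-D.

```python
def pooled_proportion(successes_1, n_1, successes_2, n_2):
    """Pool two samples: total successes divided by total cases."""
    return (successes_1 + successes_2) / (n_1 + n_2)


# Cancer group: 191 exposed out of 491. No cancer group: 304 exposed out of 945.
p_pooled = pooled_proportion(191, 491, 304, 945)
print(round(p_pooled, 3))  # 0.345
```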

This is called the pooled estimate of the sample proportion, and we use it to compute the standard error when the null hypothesis is that $p_1 = p_2$ (e.g. $p_c = p_n$ or $p_c - p_n = 0$). We also typically use it to verify the success-failure condition.



Pooled estimate of a proportion

When the null hypothesis is $p_1 = p_2$, it is useful to find the pooled estimate of the shared proportion:

$$\hat{p} = \frac{\text{number of ``successes''}}{\text{number of cases}} = \frac{\hat{p}_1 n_1 + \hat{p}_2 n_2}{n_1 + n_2}$$

Here $\hat{p}_1 n_1$ represents the number of successes in sample 1 since

$$\hat{p}_1 = \frac{\text{number of successes in sample 1}}{n_1}$$

Similarly, $\hat{p}_2 n_2$ represents the number of successes in sample 2.



TIP: Use the pooled proportion estimate when $H_0: p_1 = p_2$

When the null hypothesis suggests the proportions are equal, we use the pooled proportion estimate ($\hat{p}$) to verify the success-failure condition and also to estimate the standard error:

$$SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n_1} + \frac{\hat{p}(1-\hat{p})}{n_2}} \tag{4.3}$$
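
Equation (4.3) translates directly into code. The sketch below is one possible way to write it. The function name `pooled_se` is hypothetical, and the inputs are the rounded pooled estimate and the group sizes from the dog study.

```python
from math import sqrt


def pooled_se(p_hat, n_1, n_2):
    """Standard error of a difference in proportions under H0: p1 = p2."""
    return sqrt(p_hat * (1 - p_hat) / n_1 + p_hat * (1 - p_hat) / n_2)


# Pooled estimate 0.345 with 491 cancer dogs and 945 no cancer dogs.
se = pooled_se(0.345, 491, 945)
print(round(se, 3))  # 0.026
```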

Example 4.2.5

Using Equation (4.3), $\hat{p} = 0.345$, $n_1 = 491$, and $n_2 = 945$, verify the estimate for the standard error is $SE = 0.026$. Next, complete the hypothesis test using a significance level of 0.05. Be certain to draw a picture, compute the p-value, and state your conclusion in both statistical language and plain language.

Answer. Compute the test statistic:

$$Z = \frac{\text{point estimate} - \text{null value}}{SE} = \frac{0.067 - 0}{0.026} = 2.58$$

We leave the picture to you. Looking up $Z = 2.58$ in R, pnorm(2.58) returns 0.99506. However, this is the lower tail, and the upper tail represents the p-value: $1 - 0.99506 = 0.00494$. Because the p-value is smaller than the significance level of 0.05, we reject the null hypothesis. In plain language, the data provide strong evidence that cancer in dogs is associated with their owners’ use of 2,4-D.
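
The same calculation can be sketched in Python, where `statistics.NormalDist().cdf` plays the role of R's `pnorm`. The sketch starts from the rounded values quoted in the text (0.067 and 0.026) and rounds Z to two decimals as the text does. Working with unrounded inputs would give a slightly different p-value, and the variable names are hypothetical.

```python
from statistics import NormalDist

point_estimate = 0.067   # rounded difference in sample proportions
null_value = 0.0         # H0: p_c - p_n = 0
se = 0.026               # rounded pooled standard error
alpha = 0.05             # significance level

# Round Z to two decimals to mirror the table-style lookup in the text.
z = round((point_estimate - null_value) / se, 2)

# One-sided test (H_A: p_c - p_n > 0), so the p-value is the upper tail.
p_value = 1 - NormalDist().cdf(z)
reject_null = p_value < alpha

print(z, round(p_value, 5), reject_null)
```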