
3.1.2 Inference for paired data

To analyse a paired data set, we use the exact same tools that we developed in Chapter 2.6 except now we apply them to the differences in the paired observations.

$n_{{}_{diff}}$	$\bar{x}_{{}_{diff}}$	$s_{{}_{diff}}$
73	12.76	14.26

Table 3.2: Summary statistics for the price differences. There were 73 books, so there are 73 differences.
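The three numbers in the table are ordinary one-sample summaries of the differences. The sketch below shows how they could be produced from paired prices. The four price pairs are made up for illustration and are not taken from the textbook data set. The direction of subtraction (bookstore price minus Amazon price) is an assumption, chosen so that a positive difference means Amazon is cheaper.

```python
from statistics import mean, stdev

# Hypothetical prices in dollars for four books (made up for illustration,
# NOT taken from the textbook data set).
bookstore_prices = [27.67, 40.59, 31.68, 16.00]
amazon_prices = [27.95, 31.14, 32.00, 11.52]

# One difference per pair: bookstore price minus Amazon price (assumed direction).
differences = [b - a for b, a in zip(bookstore_prices, amazon_prices)]

n_diff = len(differences)        # number of differences
xbar_diff = mean(differences)    # sample mean of the differences
s_diff = stdev(differences)      # sample standard deviation (n - 1 denominator)

print(n_diff, round(xbar_diff, 2), round(s_diff, 2))
```

Once the differences are formed, the original two columns of prices are no longer needed: every later step works with the single list of differences.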

Example 3.1.1

Set up and implement a hypothesis test to determine whether, on average, there is a difference between Amazon’s price for a book and the UCLA bookstore’s price.

Answer. There are two scenarios: there is no difference or there is some difference in average prices. The no difference scenario is always the null hypothesis:

$H_{0}$: $\mu_{diff}=0$. There is no difference in the average textbook price.

$H_{A}$: $\mu_{diff}\neq 0$. There is a difference in average prices.

Can the normal model be used to describe the sampling distribution of $\bar{x}_{diff}$? We must check that the differences meet the conditions established in Chapter 2.6. The observations are based on a simple random sample of fewer than 10% of all books sold at the bookstore, so independence is reasonable; there are more than 30 differences; and the distribution of differences, shown in Figure LABEL:diffInTextbookPricesS10, is strongly skewed, but this amount of skew is acceptable for a data set of this size ($n=73$). Because all three conditions are reasonably satisfied, we can conclude that the sampling distribution of $\bar{x}_{diff}$ is nearly normal and that our estimate of the standard error will be reasonable.

We compute the standard error associated with $\bar{x}_{diff}$ using the standard deviation of the differences ( $s_{{}_{diff}}=14.26$ ) and the number of differences ( $n_{{}_{diff}}=73$ ):

$$SE_{\bar{x}_{diff}}=\frac{s_{diff}}{\sqrt{n_{diff}}}=\frac{14.26}{\sqrt{73}}=1.67$$
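As a quick check of this arithmetic, here is a minimal Python sketch using only the values reported in Table 3.2. The variable names are illustrative.

```python
from math import sqrt

s_diff = 14.26  # standard deviation of the differences (Table 3.2)
n_diff = 73     # number of differences (Table 3.2)

# Standard error of the mean difference: s / sqrt(n).
se = s_diff / sqrt(n_diff)

print(round(se, 2))  # 1.67
```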

To visualize the p-value, the sampling distribution of $\bar{x}_{diff}$ is drawn as though $H_{0}$ is true, which is shown in Figure LABEL:textbooksS10HTTails. The p-value is represented by the two (very) small tails.

To find the tail areas, we compute the test statistic, which is the Z score of $\bar{x}_{diff}$ under the null condition that the actual mean difference is 0:

$$Z=\frac{\bar{x}_{diff}-0}{SE_{\bar{x}_{diff}}}=\frac{12.76-0}{1.67}=7.64$$

This Z score is very large: pnorm(7.64) = $\mathbb{P}(Z<7.64)\approx 1$, so the single tail area $1-\mathbb{P}(Z<7.64)$ is approximately 0. Since the p-value corresponds to both tails in this case and the normal distribution is symmetric, the p-value can be estimated as twice the one-tail area:

$$\text{p-value}=2\times(\text{one tail area})\approx 2\times 0=0$$

Because the p-value is less than 0.05, we reject the null hypothesis. We have found convincing evidence that Amazon is, on average, cheaper than the UCLA bookstore for UCLA course textbooks.
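The test in this example can be reproduced numerically. The sketch below is one possible way to do so in Python rather than with R's pnorm: it computes the upper tail area from the complementary error function `math.erfc`, which stays accurate for very small tail areas where subtracting a cumulative probability from 1 would lose precision. It starts from the rounded values 12.76 and 1.67, so the test statistic comes out near 7.6. The variable names are illustrative.

```python
from math import erfc, sqrt

xbar_diff = 12.76   # observed mean difference (Table 3.2)
se = 1.67           # standard error computed in this example
null_value = 0.0    # mean difference under the null hypothesis
alpha = 0.05        # significance level used in the text

# Test statistic: how many standard errors the estimate lies from the null value.
z = (xbar_diff - null_value) / se

# Upper tail area P(Z > |z|) for a standard normal, via the identity
# P(Z > z) = 0.5 * erfc(z / sqrt(2)).
one_tail = 0.5 * erfc(abs(z) / sqrt(2))

# Two-sided p-value: both tails, using the symmetry of the normal distribution.
p_value = 2 * one_tail

print(z, p_value, p_value < alpha)
```

The p-value is not exactly 0, but it is so small that rounding it to 0 does not affect the decision to reject the null hypothesis.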

Example 3.1.2

Create a 95% confidence interval for the average price difference between books at the UCLA bookstore and books on Amazon.

Answer. The conditions have already been verified and the standard error computed in Example 3.1.1. To find the interval, identify $z^{\star}$ (1.96 for 95% confidence) and plug it, the point estimate, and the standard error into the confidence interval formula:

$$\text{point estimate}\ \pm\ z^{\star}SE\quad\to\quad 12.76\ \pm\ 1.96\times 1.67\quad\to\quad(9.49,16.03)$$

We are 95% confident that Amazon is, on average, between \$9.49 and \$16.03 cheaper than the UCLA bookstore for UCLA course books.
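The interval can be checked with a short Python sketch. It assumes the standard library's `statistics.NormalDist` (Python 3.8 or later) to look up $z^{\star}$ instead of hard-coding 1.96. The variable names are illustrative.

```python
from statistics import NormalDist

point_estimate = 12.76  # mean difference (Table 3.2)
se = 1.67               # standard error from Example 3.1.1
confidence = 0.95

# z* leaves (1 - confidence) / 2 in each tail of the standard normal.
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z_star * se
lower = point_estimate - margin
upper = point_estimate + margin

print(round(z_star, 2), round(lower, 2), round(upper, 2))  # 1.96 9.49 16.03
```

Because the whole interval lies above 0, it agrees with the hypothesis test in Example 3.1.1, which rejected a mean difference of 0.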