How are the sample statistics of the num_char data set affected by the observation 64,401? What would have happened if this email had not been observed? What would happen to these summary statistics if the observation at 64,401 had been even larger, say 150,000? These scenarios are plotted alongside the original data in Figure LABEL:email50NumCharDotPlotRobustEx, and sample statistics are computed under each scenario in Table 1.8.
# Dot plots of num_char (column 14 of email50, in thousands of characters) under the three scenarios
# Assumes the openintro package, which provides email50 and dotPlot()
R> library(openintro)
R> nc = email50$num_char
R> dotPlot(nc, at=3, pch=20, ylim=c(0.5,3.5), xlim=c(-35,151))
R> dotPlot(nc[-which.max(nc)], at=2, pch=20, add=TRUE)
R> modemail = nc; modemail[which.max(nc)] = 150  # move the largest value to 150 (thousand)
R> dotPlot(modemail, at=1, pch=20, add=TRUE)
# Code to create summaries for table: median, IQR, mean, and SD under each scenario (in thousands)
R> d = nc; median(d); IQR(d); mean(d); sd(d)
R> d = d[-which.max(d)]; median(d); IQR(d); mean(d); sd(d)
R> median(modemail); IQR(modemail); mean(modemail); sd(modemail)
scenario | median (robust) | IQR (robust) | mean (not robust) | standard deviation (not robust)
---|---|---|---|---
original num_char data | 6,890 | 12,875 | 11,598 | 13,125
drop 64,401 observation | 6,768 | 11,702 | 10,521 | 10,798
move 64,401 to 150,000 | 6,890 | 12,875 | 13,310 | 22,434
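The change in the mean can be checked by hand from the reported values alone. The sketch below assumes n = 50 emails (suggested by the data set name email50) and uses the rounded mean from the table, so its results should agree with the table only up to rounding.

```r
# Values taken from the text and table (rounded); n = 50 is an assumption
n        <- 50
mean_old <- 11598    # mean of the original num_char data
x_old    <- 64401    # the extreme observation
x_new    <- 150000   # hypothetical larger value

# Moving one observation changes the sum by (x_new - x_old)
mean_moved <- mean_old + (x_new - x_old) / n

# Dropping the observation removes it from the sum and reduces n by one
mean_dropped <- (n * mean_old - x_old) / (n - 1)

round(c(moved = mean_moved, dropped = mean_dropped))
```

Both results land within a unit or two of the corresponding means in the table. A single observation out of 50 shifts the mean by more than 1,000 characters in either direction.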
(a) Which is more affected by extreme observations, the mean or median? Table 1.8 may be helpful. (b) Is the standard deviation or IQR more affected by extreme observations?
Answer. (a) Mean is affected more. (b) Standard deviation is affected more. Complete explanations are provided in the material following Exercise 1.6.17.

The median and IQR are called robust estimates because extreme observations have little effect on their values. The mean and standard deviation are much more affected by changes in extreme observations.
The median and IQR do not change much under the three scenarios in Table 1.8. Why might this be the case?
Answer. The median and IQR are only sensitive to numbers near Q1, the median, and Q3. Since values in these regions are relatively stable (there aren’t large jumps between observations), the median and IQR estimates are also quite stable.
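A small experiment can make this concrete. The sketch below uses a made-up right-skewed sample (not the email50 data) and a hypothetical helper named summarize. It inflates only the largest value and compares the four statistics before and after.

```r
# Hypothetical right-skewed sample; these values are invented for illustration
x <- c(1, 2, 3, 5, 8, 13, 21, 34, 55, 400)

# Same sample with the largest value made ten times more extreme
x_big <- x
x_big[which.max(x_big)] <- 4000

# Helper (hypothetical name) returning the four summary statistics
summarize <- function(v) {
  c(median = median(v), IQR = IQR(v), mean = mean(v), sd = sd(v))
}

rbind(original = summarize(x), inflated = summarize(x_big))
```

Because the quartiles and the median are computed from values in the middle of the sorted data, changing the maximum leaves the first two columns untouched, while the mean and standard deviation both grow substantially.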
The distribution of vehicle prices tends to be right skewed, with a few luxury and sports cars lingering out into the right tail. If you were searching for a new car and cared about price, should you be more interested in the mean or median price of vehicles sold, assuming you are in the market for a regular car?
Answer. Buyers of a “regular car” should be concerned about the median price. High-end car sales can drastically inflate the mean price while the median will be more robust to the influence of those sales.
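To illustrate with numbers, the sketch below uses invented sale prices (eight regular cars and two luxury cars). The figures are hypothetical and chosen only to show the direction of the effect.

```r
# Hypothetical sale prices in dollars: eight regular cars and two luxury cars
prices <- c(18000, 21000, 22500, 24000, 25000,
            26500, 28000, 31000, 95000, 140000)

mean(prices)    # 43100: pulled upward by the two luxury sales
median(prices)  # 25750: stays near a typical regular-car price
```

Here the mean exceeds the price of every regular car in the sample, so it would mislead a buyer shopping in that range, while the median sits among the regular-car prices.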