1.6 Examining numerical data

1.6.1 Scatterplots for paired data

A scatterplot provides a case-by-case view of data for two numerical variables. In Figure LABEL:county_fed_spendVsPoverty, a scatterplot was used to examine how federal spending and poverty were related in the county data set. Another scatterplot is shown in Figure LABEL:email50LinesCharacters, comparing the number of line breaks (line_breaks) and number of characters (num_char) in emails for the email50 data set. In any scatterplot, each point represents a single case. Since there are 50 cases in email50, there are 50 points in Figure LABEL:email50LinesCharacters.

R> data(email50)
R> plot(email50[,14], email50[,15], pch=19)

To put the number of characters in perspective, this paragraph has 363 characters. Looking at Figure LABEL:email50LinesCharacters, it seems that some emails are incredibly verbose! Upon further investigation, we would actually find that most of the long emails use the HTML format, which means most of the characters in those emails are used to format the email rather than provide text.
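One way to check the observation about HTML emails visually is to color the points by email format. The following is a minimal sketch only: it uses a small made-up data frame rather than email50, and the `format` column with its 0/1 coding is an assumption made for illustration.

```r
# Made-up stand-in for email50: six hypothetical emails, not real data.
emails <- data.frame(
  num_char    = c(1.2, 3.5, 0.8, 25.1, 42.7, 18.3),  # invented values
  line_breaks = c(30, 85, 20, 410, 760, 350),        # invented values
  format      = c(0, 0, 0, 1, 1, 1)                  # assumed coding: 1 = HTML, 0 = plain text
)

# One color per row, so each case keeps its own point in the plot.
point_col <- ifelse(emails$format == 1, "red", "black")

plot(emails$num_char, emails$line_breaks, pch = 19, col = point_col,
     xlab = "num_char", ylab = "line_breaks")
legend("topleft", legend = c("plain text", "HTML"),
       col = c("black", "red"), pch = 19)
```

Referring to columns by name, as here, rather than by position makes it easier to see which variable is on which axis.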

Example 1.6.1

What do scatterplots reveal about the data, and how might they be useful?

Answer. Answers may vary. Scatterplots are helpful for quickly spotting associations between variables, whether those associations take the form of simple trends or are more complex.

Example 1.6.2

Consider a new data set of 54 cars with two variables: vehicle price and weight (a subset of data from http://www.amstat.org/publications/jse/v1n1/datasets.lock.html). A scatterplot of vehicle price versus weight is shown in Figure LABEL:carsPriceVsWeight. What can be said about the relationship between these variables?

R> data(cars)
R> plot(cars[,6], cars[,2], pch=19, ylim=c(0, max(cars[,2])))

Answer. The relationship is evidently nonlinear, as highlighted by the dashed line. This is different from previous scatterplots we’ve seen, such as Figure LABEL:county_fed_spendVsPoverty and Figure LABEL:email50LinesCharacters, which show relationships that are approximately linear.
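A dashed trend line of this kind can be overlaid on a scatterplot with a smoother. The text does not say how the dashed line in the figure was produced, so the sketch below swaps in base R's `lowess` as one common choice. The `weight` and `price` values are simulated for illustration and are not the cars data.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated stand-ins for the two variables (54 hypothetical cars).
weight <- runif(54, 1500, 4500)
price  <- 10 + 40 * ((weight - 1500) / 3000)^2 + rnorm(54, sd = 2)  # curved trend plus noise

plot(weight, price, pch = 19)

# Locally weighted smoother from base R; returns x (sorted) and smoothed y.
trend <- lowess(weight, price)
lines(trend, lty = 2)  # lty = 2 draws a dashed line
```

If the smoothed curve bends noticeably, as it does here by construction, a straight line is a poor summary of the relationship.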

Example 1.6.3

Describe two variables that would have a horseshoe-shaped association in a scatterplot.

Answer. Consider the case where your vertical axis represents something "good" and your horizontal axis represents something that is only good in moderation. Health and water consumption fit this description since water becomes toxic when consumed in excessive quantities.
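To see what such an association might look like, the sketch below simulates a horseshoe-shaped scatterplot. Everything in it is hypothetical: the `water` and `health` variables, their scales, and the quadratic form are assumptions chosen only to produce the shape described in the answer.

```r
set.seed(2)  # make the simulated data reproducible

# Hypothetical intake values on an arbitrary 0 to 6 scale.
water <- runif(100, 0, 6)

# Hypothetical health score: highest at a moderate intake (3), lower at both extremes.
health <- 100 - 3 * (water - 3)^2 + rnorm(100, sd = 5)

plot(water, health, pch = 19, xlab = "water", ylab = "health")
```

The points rise and then fall, so neither "more is better" nor "less is better" describes the relationship.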