1.2.2 Types of variables

Examine the fed_spend, pop2010, state, and smoking_ban variables in the county data set. Each of these variables is inherently different from the other three, yet many of them share certain characteristics.

First consider fed_spend, which is said to be a numerical variable since it can take a wide range of numerical values, and it is sensible to add, subtract, or take averages with those values. On the other hand, we would not classify a variable reporting telephone area codes as numerical since their average, sum, and difference have no clear meaning.

The pop2010 variable is also numerical, although it seems to be a little different from fed_spend. This variable of the population count can only take whole non-negative numbers (0, 1, 2, …). For this reason, the population variable is said to be discrete since it can only take numerical values with jumps. On the other hand, the federal spending variable is said to be continuous, since it can in principle take any value within a range.
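The distinction can be sketched in a few lines of Python. This is only an illustration: the values below are made up rather than taken from the county data set, and the helper name looks_discrete is hypothetical. The whole-number check is a rough heuristic, since a continuous quantity recorded to the nearest whole number would also pass it.

```python
from statistics import mean

# Made-up values for illustration only (not taken from the county data set).
fed_spend = [6.07, 6.72, 7.34, 5.91]   # continuous: any value in a range
pop2010 = [1200, 45000, 310, 98000]    # discrete: whole-number counts
area_codes = [205, 251, 256, 334]      # digits, but labels rather than quantities


def looks_discrete(values):
    """Return True if every value is a whole non-negative number."""
    return all(v >= 0 and float(v).is_integer() for v in values)


print(looks_discrete(pop2010))    # True
print(looks_discrete(fed_spend))  # False

# Averaging is sensible for a numerical variable ...
print(mean(pop2010))
# ... but the same arithmetic on area codes yields a number with no clear meaning.
print(mean(area_codes))
```

Note that area_codes would also pass the whole-number check, which is why the decision to treat a variable as numerical rests on whether arithmetic on it is meaningful, not on how its values are stored.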

The variable state can take up to 51 values after accounting for Washington, DC: AL, …, and WY. Because the responses themselves are categories, state is called a categorical variable (sometimes also called a nominal variable), and the possible values are called the variable’s levels.
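As a small sketch of what levels means in practice, the Python snippet below collects the distinct values of a categorical variable and counts the observations at each level. The six observations are invented for illustration and are not rows of the county data set.

```python
from collections import Counter

# Invented observations of the state variable, for illustration only.
state = ["AL", "AL", "WY", "DC", "AL", "WY"]

# The levels are the distinct values the variable takes.
levels = sorted(set(state))
print(levels)  # ['AL', 'DC', 'WY']

# Counting observations per level is meaningful; averaging the labels is not.
counts = Counter(state)
print(counts["AL"])  # 3
```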

Finally, consider the smoking_ban variable, which describes the type of county-wide smoking ban and takes values none, partial, or comprehensive in each county. This variable seems to be a hybrid: it is a categorical variable but the levels have a natural ordering. A variable with these properties is called an ordinal variable. To simplify analyses, any ordinal variables in this course will be treated as categorical variables.
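One way to picture the extra structure an ordinal variable carries is to attach a rank to each level. The Python sketch below assumes the ordering none, partial, comprehensive (least to most restrictive); the observations and the names RANK and at_least are hypothetical and chosen only for this example.

```python
# Assumed ordering of the levels, from least to most restrictive.
LEVELS = ["none", "partial", "comprehensive"]
RANK = {level: position for position, level in enumerate(LEVELS)}

# Invented observations of smoking_ban, for illustration only.
smoking_ban = ["partial", "none", "comprehensive", "none", "partial"]

# Because the levels are ordered, sorting by rank is meaningful.
ordered = sorted(smoking_ban, key=RANK.__getitem__)
print(ordered)


def at_least(value, threshold):
    """Return True if value is at or above threshold in the assumed ordering."""
    return RANK[value] >= RANK[threshold]


# Number of observations with at least a partial ban.
n_with_ban = sum(at_least(v, "partial") for v in smoking_ban)
print(n_with_ban)  # 3
```

Treating the variable as plainly categorical, as this course does, amounts to ignoring RANK and working only with level membership.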

Example 1.2.2

Data were collected about students in a statistics course. Three variables were recorded for each student: number of siblings, student height, and whether the student had previously taken a statistics course. Classify each of the variables as continuous numerical, discrete numerical, or categorical.

Answer. The number of siblings and student height represent numerical variables. Because the number of siblings is a count, it is discrete. Height varies continuously, so it is a continuous numerical variable. The last variable classifies students into two categories – those who have and those who have not taken a statistics course – which makes this variable categorical.

Example 1.2.3

Consider the variables group and outcome (at 30 days) from the stent study in Section 1.1. Are these numerical or categorical variables?

Answer. There are only two possible values for each variable, and in both cases they describe categories. Thus, each is a categorical variable.