2 Defining Categorical Variables

1 Defining and interpreting models with interactions

Consider the application survey data set APP, available from MOODLE, which can be loaded into R with:

APP <- read.table("APP.dat")

The data set contains 480 responses from users of three mobile applications, with the following variables:

• Recommend – The Yes or No response to the question “Would you recommend the use of the application to your friends?”
• App – Which application the consumer used: App1, App2 or App3.
• Stars – The number of stars given by the user, out of five.
• Usage – The number of hours the consumer has used the application.

Recall from Chapter 2 that we examined this data set and evaluated the log-odds ratios. In brief, this can be evaluated in R as follows:

TAB <- table(APP$Recommend, APP$App)
TAB
      App1 App2 App3
  No    21   50  119
  Yes   75   94  121

From this, we can estimate the log-odds and log-odds ratio as:

Log_odds <- log(TAB[2, ] / TAB[1, ])
round(Log_odds, 4)   ##App1: 1.2730   App2: 0.6313   App3: 0.0167

Log_odds_ratio <- Log_odds[2:3] - Log_odds[1]
round(Log_odds_ratio, 4)   ##App2: -0.6417   App3: -1.2563

Let us now fit a logistic regression model for the customers’ responses to the survey question as a function of which mobile application they use. For this, we propose and fit the following GLM:

M0 <- glm(Recommend ~ 1 + App, family = binomial, data = APP)
M0

Coefficients:
(Intercept)      AppApp2      AppApp3
     1.2730      -0.6417      -1.2563

Compare the coefficients to the log-odds and log-odds ratios derived above. Define the linear predictor for this model, where $a^{(j)}_{i}$ is an indicator variable that is $1$ if customer $i$ uses application $j$ (for $j\in\{\mathrm{App1},\mathrm{App2},\mathrm{App3}\}$) and $0$ otherwise.
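For readers without APP.dat to hand, the comparison can also be checked from the printed table alone. The following is a minimal sketch, assuming the counts shown above; the names counts, M0_grouped and by_hand are illustrative and do not appear in the original analysis. It fits the same model to the aggregated counts using the grouped-binomial form cbind(successes, failures).

```r
# Counts copied from the contingency table printed above
counts <- data.frame(
  App = factor(c("App1", "App2", "App3")),
  Yes = c(75, 94, 121),
  No  = c(21, 50, 119)
)

# Grouped-binomial fit: the response is cbind(successes, failures)
M0_grouped <- glm(cbind(Yes, No) ~ 1 + App, family = binomial, data = counts)

# Empirical log-odds for App1, then log-odds ratios of App2 and App3 vs App1
log_odds <- log(counts$Yes / counts$No)
by_hand  <- c(log_odds[1], log_odds[2:3] - log_odds[1])

# Side-by-side comparison of fitted coefficients and hand calculations
round(cbind(glm = coef(M0_grouped), by_hand = by_hand), 4)
```

Because the model has one parameter per app, it reproduces the observed proportions exactly, so the two columns should agree up to numerical tolerance.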

There is no restriction on how many explanatory variables we may include in the linear predictor. For example, we may wish to add the numerical Usage variable to the linear predictor, giving the additive model:

M1 <- glm(Recommend ~ 1 + App + Usage, family = binomial, data = APP)
M1

Coefficients:
(Intercept)      AppApp2      AppApp3        Usage
    -0.1532      -0.4793      -1.1384       0.6968

Note that the estimates for the apps differ from those of the previous model. This is because they are now adjusted for Usage: the intercept is the log-odds of a recommendation for App1 when Usage = 0, and the AppApp2 and AppApp3 coefficients are the log-odds ratios relative to App1 for customers with the same Usage.
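To see how the M1 coefficients combine, the sketch below evaluates the additive linear predictor by hand from the rounded coefficients printed above. The helper eta_M1 and the usage values chosen are hypothetical and purely for illustration; with the fitted object available, predict(M1, newdata = ...) would normally be used instead.

```r
# Coefficients as printed for M1 (rounded to 4 d.p.)
b <- c(Intercept = -0.1532, App2 = -0.4793, App3 = -1.1384, Usage = 0.6968)

# Hypothetical helper: linear predictor (log-odds) under the additive model
eta_M1 <- function(app, usage) {
  b[["Intercept"]] +
    (app == "App2") * b[["App2"]] +
    (app == "App3") * b[["App3"]] +
    b[["Usage"]] * usage
}

# The App2 vs App1 gap in log-odds is the same at any usage level
gap_at_0 <- eta_M1("App2", 0) - eta_M1("App1", 0)
gap_at_3 <- eta_M1("App2", 3) - eta_M1("App1", 3)
c(gap_at_0 = gap_at_0, gap_at_3 = gap_at_3)

# Convert a log-odds value to a probability with the inverse logit
plogis(eta_M1("App3", 2))
```

Both gaps equal the AppApp2 coefficient, which is what the absence of an interaction term implies.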

To define an interaction between two explanatory variables, we use the character : in the model formula to specify which pair of variables interact. For example, the following model specifies a linear predictor that contains only the interaction terms between the App and Usage explanatory variables:

M2 <- glm(Recommend ~ App:Usage, family = binomial, data = APP)
M2

Coefficients:
  (Intercept)  AppApp1:Usage  AppApp2:Usage  AppApp3:Usage
      -0.9663         1.1669         0.8786         0.5558

In this case, the interaction terms describe, for each mobile application, how the log-odds of a recommendation change for every unit increase (1 hour) in usage.
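As a small illustration of this reading, the sketch below takes the rounded M2 coefficients printed above and turns each slope into an odds multiplier per additional hour. The variable names are illustrative assumptions. Note that under M2 all three apps share the same intercept and differ only in slope.

```r
# Coefficients as printed for M2 (rounded to 4 d.p.)
intercept <- -0.9663
slopes    <- c(App1 = 1.1669, App2 = 0.8786, App3 = 0.5558)

# Multiplicative change in the odds of a recommendation per extra hour of usage
odds_multiplier <- exp(slopes)
round(odds_multiplier, 3)

# Log-odds for each app after an illustrative 2 hours of usage
log_odds_2h <- intercept + slopes * 2
round(log_odds_2h, 4)
```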

Alternatively, the full interaction model, with both additive and interaction components, can be specified by instead using the * character between the two explanatory variables:

M3 <- glm(Recommend ~ App*Usage, family = binomial, data = APP)
M3

Coefficients:
  (Intercept)        AppApp2        AppApp3          Usage  AppApp2:Usage
      -0.3890        -0.3312        -0.8138         0.8708        -0.1059
AppApp3:Usage
      -0.2337
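One way to read the M3 output is as a separate intercept and Usage slope for each app, with App1 as the baseline. The following sketch, which assumes only the rounded coefficients printed above and uses illustrative variable names, assembles those per-app values.

```r
# Coefficients as printed for M3 (rounded to 4 d.p.)
b <- c("(Intercept)" = -0.3890, AppApp2 = -0.3312, AppApp3 = -0.8138,
       Usage = 0.8708, "AppApp2:Usage" = -0.1059, "AppApp3:Usage" = -0.2337)

# Per-app intercepts: baseline plus the app's additive offset
intercepts <- c(App1 = b[["(Intercept)"]],
                App2 = b[["(Intercept)"]] + b[["AppApp2"]],
                App3 = b[["(Intercept)"]] + b[["AppApp3"]])

# Per-app Usage slopes: baseline slope plus the app's interaction offset
slopes <- c(App1 = b[["Usage"]],
            App2 = b[["Usage"]] + b[["AppApp2:Usage"]],
            App3 = b[["Usage"]] + b[["AppApp3:Usage"]])

round(rbind(intercepts, slopes), 4)
```

Unlike M2, this model lets both the intercept and the slope vary by app, so the interaction coefficients are differences in slope relative to App1 rather than the slopes themselves.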

Note that this model could equivalently be specified in full as:

M3_alt <- glm(Recommend ~ 1 + App + Usage + App:Usage,
    family = binomial, data = APP)
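The equivalence of the two formulas can be checked without fitting anything, by asking R how it expands each one. This is a minimal sketch using terms() from the stats package; the names f_star, f_full and f_inter are illustrative.

```r
# The three formula styles used above
f_star  <- Recommend ~ App * Usage
f_full  <- Recommend ~ 1 + App + Usage + App:Usage
f_inter <- Recommend ~ App:Usage

# terms() expands a formula into its individual model terms
labels_star  <- attr(terms(f_star), "term.labels")
labels_full  <- attr(terms(f_full), "term.labels")
labels_inter <- attr(terms(f_inter), "term.labels")

labels_star                          # terms generated by App * Usage
identical(labels_star, labels_full)  # same expansion as the explicit form?
labels_inter                         # the interaction-only formula of M2
```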