Consider the application survey data set APP available from MOODLE that can be loaded in R by:
APP <- read.table("APP.dat")
The data set contains 480 responses from users of three mobile applications, with the following variables:
Recommend – The Yes or No response to the question “Would you recommend the use of the application to your friends?”
App – Which application was used by the consumer: App1, App2 or App3.
Stars – The number of stars given by the user, out of five.
Usage – The number of hours the consumer has used the application.
Recall from Chapter 2 that we examined this data set and evaluated the log-odds ratios. In brief, this can be evaluated in R as follows:
TAB <- table(APP$Recommend, APP$App)
TAB
##       App1 App2 App3
##   No    21   50  119
##   Yes   75   94  121
From this, we can estimate the log-odds and log-odds ratio as:
Log_odds <- log(TAB[2, ] / TAB[1, ])
round(Log_odds, 4)
## App1: 1.2730  App2: 0.6313  App3: 0.0167
Log_odds_ratio <- Log_odds[2:3] - Log_odds[1]
round(Log_odds_ratio, 4)
## App2: -0.6417  App3: -1.2563
Let us now fit a logistic regression model for the customer’s responses to the survey question with respect to which mobile application they are using. For this, we propose and fit the following glm:
M0 <- glm(Recommend ~ 1 + App, family = binomial, data = APP)
M0
## Coefficients:
## (Intercept)      AppApp2      AppApp3
##      1.2730      -0.6417      -1.2563
Compare the coefficients to the log-odds and log-odds ratios derived above. Define what the linear predictor is for this model, where $x_{ij}$ is an indicator variable that is 1 if customer $i$ uses application $j$ (for $j = 2, 3$), or 0 otherwise.
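The comparison can also be checked numerically without the APP.dat file. The sketch below rebuilds one row per respondent from the counts printed in TAB above and refits the same model. The names counts, APP_sim, M0_check and by_hand are illustrative, not part of the course material, and the rebuilt data carry only the Recommend and App columns.

```r
# Counts copied from the contingency table TAB printed in the text
counts <- matrix(c(21, 50, 119,
                   75, 94, 121),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Recommend = c("No", "Yes"),
                                 App = c("App1", "App2", "App3")))

# Expand the six cells into one row per respondent (480 rows in total)
long <- as.data.frame(as.table(counts))
APP_sim <- long[rep(seq_len(nrow(long)), long$Freq), c("Recommend", "App")]

# Same model as M0, fitted to the rebuilt data
M0_check <- glm(Recommend ~ 1 + App, family = binomial, data = APP_sim)

# Log-odds for App1, then log-odds ratios of App2 and App3 against App1
log_odds <- log(counts["Yes", ] / counts["No", ])
by_hand <- c(log_odds["App1"],
             log_odds[c("App2", "App3")] - log_odds["App1"])

# Side-by-side comparison of the fitted coefficients and the hand calculation
round(cbind(glm = coef(M0_check), by_hand = by_hand), 4)
```

The two columns should agree: the intercept is the log-odds for the reference level App1, and the remaining coefficients are the log-odds ratios relative to App1.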
There is no restriction on how many explanatory variables we may include in the linear predictor. For example, we may wish to add the numerical Usage variable to the linear predictor, giving the additive model:
M1 <- glm(Recommend ~ 1 + App + Usage, family = binomial, data = APP)
M1
## Coefficients:
## (Intercept)      AppApp2      AppApp3        Usage
##     -0.1532      -0.4793      -1.1384       0.6968
Note that the estimates for the apps differ from those of the previous model. This is because they are now adjusted for Usage: the intercept is the log-odds of recommendation for App1 when Usage = 0, and the App coefficients are the log-odds ratios relative to App1 for users with the same Usage.
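To make the role of each coefficient concrete, the following sketch evaluates the linear predictor by hand from the M1 coefficients printed above. The vector b and the helper log_odds_M1 are hypothetical names introduced only for this illustration, and the choice of 2 hours of usage is arbitrary.

```r
# Coefficients copied from the printed M1 output in the text
b <- c("(Intercept)" = -0.1532, AppApp2 = -0.4793,
       AppApp3 = -1.1384, Usage = 0.6968)

# Hypothetical helper: log-odds of recommendation for a given app and usage
log_odds_M1 <- function(app, usage) {
  app_effect <- switch(app,
                       App1 = 0,               # reference level
                       App2 = b[["AppApp2"]],
                       App3 = b[["AppApp3"]])
  b[["(Intercept)"]] + app_effect + b[["Usage"]] * usage
}

# The intercept is the log-odds for App1 at zero hours of usage
log_odds_M1("App1", 0)

# The App2 coefficient is the App2 vs App1 log-odds ratio at any common usage
log_odds_M1("App2", 2) - log_odds_M1("App1", 2)
log_odds_M1("App2", 5) - log_odds_M1("App1", 5)

# plogis() maps log-odds to a probability of recommending
plogis(log_odds_M1("App2", 2))
```

Because the model is additive, the difference between two apps does not depend on the usage value at which it is evaluated.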
To define an interaction between two explanatory variables, we use the character : in the model formula to specify which pair of variables interact. For example, the following model specifies a linear predictor that contains only the interaction terms between the App and Usage explanatory variables:
M2 <- glm(Recommend ~ App:Usage, family = binomial, data = APP)
M2
## Coefficients:
##   (Intercept)  AppApp1:Usage  AppApp2:Usage  AppApp3:Usage
##       -0.9663         1.1669         0.8786         0.5558
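In this case, the interaction terms describe how the log-odds of recommendation change, for each mobile application, with every unit increase (1 hour) in usage. All three applications share the same intercept, so the model assumes a common log-odds at Usage = 0 and a separate slope per application.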
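One way to see what the formula App:Usage generates is to inspect the design matrix with model.matrix(). The sketch below uses a small made-up data frame, toy, which is hypothetical and not the APP data; only its column names mirror the survey variables.

```r
# Hypothetical toy data: two users per application (not the APP data)
toy <- data.frame(
  App   = factor(c("App1", "App2", "App3", "App1", "App2", "App3")),
  Usage = c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0)
)

# Design matrix for the interaction-only formula
X_int <- model.matrix(~ App:Usage, data = toy)

# One intercept column plus one Usage column per application
colnames(X_int)

# Each interaction column holds Usage for rows of that app and 0 elsewhere
X_int
```

The column names match the coefficient names in the M2 output: a single shared intercept and one usage slope for each of App1, App2 and App3.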
Alternatively, the full interaction model with both additive and interaction components can be specified by instead using the * character between the two explanatory variables.
M3 <- glm(Recommend ~ App*Usage, family = binomial, data = APP)
M3
## Coefficients:
##   (Intercept)        AppApp2        AppApp3          Usage  AppApp2:Usage  AppApp3:Usage
##       -0.3890        -0.3312        -0.8138         0.8708        -0.1059        -0.2337
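In the full interaction model the coefficients for App2 and App3 are expressed as differences from the reference level App1, so the per-application intercepts and usage slopes have to be assembled by addition. A minimal sketch, using the M3 coefficients printed above (the names b3, intercepts and slopes are illustrative):

```r
# Coefficients copied from the printed M3 output in the text
b3 <- c("(Intercept)"   = -0.3890,
        AppApp2         = -0.3312,
        AppApp3         = -0.8138,
        Usage           =  0.8708,
        "AppApp2:Usage" = -0.1059,
        "AppApp3:Usage" = -0.2337)

# Log-odds at Usage = 0 for each application
intercepts <- c(App1 = b3[["(Intercept)"]],
                App2 = b3[["(Intercept)"]] + b3[["AppApp2"]],
                App3 = b3[["(Intercept)"]] + b3[["AppApp3"]])

# Change in log-odds per extra hour of usage for each application
slopes <- c(App1 = b3[["Usage"]],
            App2 = b3[["Usage"]] + b3[["AppApp2:Usage"]],
            App3 = b3[["Usage"]] + b3[["AppApp3:Usage"]])

round(intercepts, 4)  # App1 -0.3890, App2 -0.7202, App3 -1.2028
round(slopes, 4)      # App1  0.8708, App2  0.7649, App3  0.6371
```

Unlike the interaction-only model, each application here has both its own intercept and its own slope.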
Note that this model could equivalently be specified in full as:
M3_alt <- glm(Recommend ~ 1 + App + Usage + App:Usage, family = binomial, data = APP)
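The equivalence of the two specifications can be checked without fitting anything, since terms() expands a formula into its individual terms. The sketch below needs no data; the object names t_star and t_full are illustrative.

```r
# Expand both formulas into their component terms
t_star <- terms(Recommend ~ App * Usage)
t_full <- terms(Recommend ~ 1 + App + Usage + App:Usage)

# Both formulas contain the same main effects and interaction
attr(t_star, "term.labels")
attr(t_full, "term.labels")

# Same terms and same intercept setting, hence the same fitted model
identical(attr(t_star, "term.labels"), attr(t_full, "term.labels"))
identical(attr(t_star, "intercept"), attr(t_full, "intercept"))
```

Both comparisons should return TRUE, which is why M3 and M3_alt give identical coefficient estimates.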