Linear Mixed Models: Making Predictions and Evaluating Accuracy

Contents hide

4 Why Doesn’t the Linear Mixed Model do Better

In this post we show how to predict future measurement values in a longitudinal setting using linear mixed models (LMMs). We describe how to do it in R, and how to evaluate the accuracy, which requires somewhat careful handling.

The Problem and Dataset

In a previous post we went over the more methodological side of LMMs. However, we did not show how to do prediction of future values or evaluate those predictions. We will work with the aids dataset from the joineR package. This is a dataset of CD4 (white blood cell) count measurements along with death or censoring times as well as some covariates/treatment indicators. We will ignore the death/censoring times and focus on CD4 count measurement times (technically we should do a joint model, as the death/event times tell us something about what the unseen CD4 counts are close to those times, and thus the CD4 dynamics, but we will ignore death/censoring times for simplicity).

We start by loading packages and the dataset.

library(joineR)
library(lme4)
library(splines)
library(caret)#we will use this for cross validation
library(scorer)#we will use this to calculate mean absolute and squared error

data(aids)
head(aids)

The columns give the following information:

id: the id of the patient
time: the death or censoring time
death: whether the patient died while observed
CD4: the CD4 count
obstime: how many months from the first observation this is
drug: the treatment type
gender: male or female
prevOi: whether there was a previous infection
AZT: AZT intolerance or failure

Independence and Evaluation Metrics

We will want to construct confidence intervals for our accuracy metrics. To do so, we need to think about independence. We should first note that our within subjects measurements are not iid. We can assume that patients are all realizations of the same stochastic process and independent of each other though. However, CD4 measurements within a patient may depend on each other. Thus, we should average our accuracy metrics within subject, and use within-subject averages to construct confidence intervals and other relevant summary statistics. To start with, we perform k-fold cross validation and create k folds, as well as set up data structures to hold our within-subject mean absolute errors (MAEs) and mean squared errors (MSEs) for each patient, for both linear regression (LM) and a linear mixed model (LMM).

all_participants<-unique(aids[['id']])
#we create cross validation folds using caret's createFolds
num_folds <- 10
folds<-createFolds(all_participants,k=num_folds,list=TRUE,returnTrain=TRUE)
#data structures to hold results
MAE_within_subjects_lm <- c()
MAE_within_subjects_lmm <- c()
MSE_within_subjects_lm <- c()
MSE_within_subjects_lmm <- c()

Next, for each fold, we train our model and make predictions. We will use all covariates along with a random intercept for the mixed model. We then save the our within-subjects MAEs and MSEs in the relevant data structures.

The Experiment

for(i in 1:num_folds){
  train_ids<- folds[[i]]
  test_ids<-all_participants[!(all_participants%in%folds[[i]])]
  #generate train/test
  train<-subset(aids,id%in%train_ids)
  test<-subset(aids,id%in%test_ids)
  #fit models
  aids_lm <- lm(formula=CD4~obstime+drug+gender+prevOI+AZT,data=train)
  aids_lmm <-lmer(formula=CD4~obstime+drug+gender+prevOI+AZT+(1|id),data=train)
  test_participants <- unique(test[['id']])
  for(participant in test_participants){
    prediction_subset <- subset(test,id==participant)
    #predict
    y_pred_lm <- predict(aids_lm,newdata=prediction_subset)
    y_pred_lmm <- predict(aids_lmm,newdata=prediction_subset,allow.new.levels=TRUE)
    y_true <- prediction_subset[['CD4']]
    MAE_within_subjects_lm<- c(MAE_within_subjects_lm,mean_absolute_error(y_pred_lm,y_true))
    MAE_within_subjects_lmm<- c(MAE_within_subjects_lmm,mean_absolute_error(y_pred_lmm,y_true))
    MSE_within_subjects_lm<-c(MSE_within_subjects_lm,mean_squared_error(y_pred_lm,y_true))
    MSE_within_subjects_lmm<-c(MSE_within_subjects_lmm,mean_squared_error(y_pred_lmm,y_true))
  }
}

Using these, we can make boxplots. We remove the outliers to improve clarity.

boxplot(MAE_within_subjects_lm,MAE_within_subjects_lm,main='MAE',names=c('Linear Model','Linear Mixed Model'),outline=FALSE)
boxplot(MSE_within_subjects_lm,MSE_within_subjects_lmm,main='MSE',names=c('Linear Model','Linear Mixed Model'),outline=FALSE)

Let’s look at $95\%$ confidence intervals for the mean.

t.test(MAE_within_subjects_lm)
t.test(MAE_within_subjects_lmm)
t.test(MSE_within_subjects_lm)
t.test(MSE_within_subjects_lmm)

	MAE	MSE
LM	(3.250611, 3.681545)	(16.78123, 21.41477)
LMM	(3.197386, 3.642805)	(16.71764, 21.56956)

The linear mixed model performs better than the linear model on these two metrics, but just barely, and even without showing the two-sample hypothesis test we can tell that the difference is not significant. Why might this be?

Why Doesn’t the Linear Mixed Model do Better

Here are a few obvious reasons: we will focus on the first three

The covariates explain most of the patient specific variation
There is a lot of noise
something is wrong with our model specification for patient specific variation

We would like to test the first one. Let $\nu$ be the variance of the random intercept. Then

$H_0:\nu=0$
$H_1:\nu\neq 0$

Note that $0$ is on the boundary of the parameter space, which makes testing problematic. However, we can still look at the variance even if we don’t make a formal test. Let’s fit the mixed model on the whole dataset and investigate this.

aids_mixed_all <-lmer(formula=CD4~obstime+drug+gender+prevOI+AZT+(1|id),data=aids)
summary(aids_mixed_all)

I’ll extract the relevant part of this summary:

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 15.253   3.905   
 Residual              3.842   1.960   
Number of obs: 1405, groups:  id, 467

This variance is pretty large and seems unlikely to be due to chance. Now for the second issue (lots of noise), I don’t know of a formal hypothesis test, but we can look at both the residual standard error and the $R^2$ for the linear regression model with fixed effects only. That said, there are a lot of reasons why $R^2$ will be low or residual standard error will be high, many of which relate to various types of model misspecification rather than noise.

aids_lm_all <- lm(formula=CD4~obstime+drug+gender+prevOI+AZT,data=aids)
summary(aids_lm_all)

Call:
lm(formula = CD4 ~ obstime + drug + gender + prevOI + AZT, data = aids)

Residuals:
     Min       1Q   Median       3Q      Max 
-10.5414  -3.0200  -0.8382   2.5663  16.0378 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 10.53580    0.45676  23.066  < 2e-16 ***
obstime     -0.08396    0.02525  -3.325 0.000905 ***
drugddI      0.50938    0.23679   2.151 0.031628 *  
gendermale  -0.68624    0.43070  -1.593 0.111314    
prevOIAIDS  -4.43438    0.29602 -14.980  < 2e-16 ***
AZTfailure  -0.15939    0.30194  -0.528 0.597656    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.433 on 1399 degrees of freedom
 Multiple R-squared:  0.2036,    Adjusted R-squared:  0.2007

We see that the $R^2$ is low. To evaluate the residual standard error, let’s look at the box plot for the CD4 values. Note that there is a problem here: CD4 values are likely not independent from each other, so this box plot will probably underestimate the quantiles distance from the median. However, it’s still somewhat interesting.

boxplot(aids[['CD4']],main='CD4 value box plot')

We can see that the lower quantile is close to $4$ and the upper quantile is close to $10$ , so our residual standard error of $4.33$ is quite large, even if the true quantiles are probably more spread out.

In conclusion, in this post, we showed how to do prediction with k-fold cross validation using linear mixed models. We constructed box plots and made confidence intervals for the mean for both within-subjects MAE and within-subjects MSE. We also investigated why the linear mixed model didn’t do much better than the fixed effects only linear model.

2 Replies to “Linear Mixed Models: Making Predictions and Evaluating Accuracy”

Cassie Stevenson says:
February 3, 2022 at 12:47 am
Hi there,
I am trying to adapt this code to assess the accuracy of a logistic mixed model. I am trying to assess whether this code is useful to run a confusion matrix, but I am not sure that’s it’s actually doing what I hope it’s doing and I having trouble understanding what the code is doing in line with notes after the ## below. I have a few more questions, but I thought I would keep it simple in case elicits a faster response, but if there is someone who could clarify further, that would be greatly appreciated. Thanks so much for your time.
for(i in 1:num_folds){
train_ids<- folds[[i]]
test_ids<-all_participants[!(all_participants%in%folds[[i]])] ## is this leaving out one participant? Or testing for EACH participant?
#generate train/test
train<-subset(aids,id%in%train_ids)
test<-subset(aids,id%in%test_ids)
#fit models
aids_lm <- lm(formula=CD4~obstime+drug+gender+prevOI+AZT,data=train)
aids_lmm <-lmer(formula=CD4~obstime+drug+gender+prevOI+AZT+(1|id),data=train)
test_participants <- unique(test[['id']])
for(participant in test_participants){
prediction_subset <- subset(test,id==participant)
#predict
y_pred_lm <- predict(aids_lm,newdata=prediction_subset)
y_pred_lmm <- predict(aids_lmm,newdata=prediction_subset,allow.new.levels=TRUE)
y_true <- prediction_subset[['CD4']]
MAE_within_subjects_lm<- c(MAE_within_subjects_lm,mean_absolute_error(y_pred_lm,y_true))
MAE_within_subjects_lmm<- c(MAE_within_subjects_lmm,mean_absolute_error(y_pred_lmm,y_true))
MSE_within_subjects_lm<-c(MSE_within_subjects_lm,mean_squared_error(y_pred_lm,y_true))
MSE_within_subjects_lmm<-c(MSE_within_subjects_lmm,mean_squared_error(y_pred_lmm,y_true))
}
}
Reply
1. Alex says:
  February 6, 2022 at 8:33 pm
  Hi Cassie,
  That part holds out all the participants who are not in the current fold to create the test set, trains on everyone else, and then predicts on the test set.
  Best,
  Alex
  Reply

The Problem and Dataset

Independence and Evaluation Metrics

The Experiment

Why Doesn’t the Linear Mixed Model do Better

Related Posts

2 Replies to “Linear Mixed Models: Making Predictions and Evaluating Accuracy”

Leave a Reply Cancel reply