Linear Mixed Models: Making Predictions and Evaluating Accuracy

In this post we show how to predict future measurement values in a longitudinal setting using linear mixed models (LMMs). We describe how to do it in R, and how to evaluate the accuracy, which requires somewhat careful handling.

The Problem and Dataset

In a previous post we went over the more methodological side of LMMs. However, we did not show how to do prediction of future values or evaluate those predictions. We will work with the aids dataset from the joineR package. This is a dataset of CD4 (white blood cell) count measurements along with death or censoring times as well as some covariates/treatment indicators. We will ignore the death/censoring times and focus on CD4 count measurement times (technically we should do a joint model, as the death/event times tell us something about what the unseen CD4 counts are close to those times, and thus the CD4 dynamics, but we will ignore death/censoring times for simplicity).

We start by loading packages and the dataset.

library(caret)#we will use this for cross validation
library(scorer)#we will use this to calculate mean absolute and squared error


The columns give the following information:

  • id: the id of the patient
  • time: the death or censoring time
  • death: whether the patient died while observed
  • CD4: the CD4 count
  • obstime: how many months from the first observation this is
  • drug: the treatment type
  • gender: male or female
  • prevOi: whether there was a previous infection
  • AZT: AZT intolerance or failure

Independence and Evaluation Metrics

We will want to construct confidence intervals for our accuracy metrics. To do so, we need to think about independence. We should first note that our within subjects measurements are not iid. We can assume that patients are all realizations of the same stochastic process and independent of each other though. However, CD4 measurements within a patient may depend on each other. Thus, we should average our accuracy metrics within subject, and use within-subject averages to construct confidence intervals and other relevant summary statistics. To start with, we perform k-fold cross validation and create k folds, as well as set up data structures to hold our within-subject mean absolute errors (MAEs) and mean squared errors (MSEs) for each patient, for both linear regression (LM) and a linear mixed model (LMM).

#we create cross validation folds using caret's createFolds
num_folds <- 10
#data structures to hold results
MAE_within_subjects_lm <- c()
MAE_within_subjects_lmm <- c()
MSE_within_subjects_lm <- c()
MSE_within_subjects_lmm <- c()

Next, for each fold, we train our model and make predictions. We will use all covariates along with a random intercept for the mixed model. We then save the our within-subjects MAEs and MSEs in the relevant data structures.

The Experiment

for(i in 1:num_folds){
  train_ids<- folds[[i]]
  #generate train/test
  #fit models
  aids_lm <- lm(formula=CD4~obstime+drug+gender+prevOI+AZT,data=train)
  aids_lmm <-lmer(formula=CD4~obstime+drug+gender+prevOI+AZT+(1|id),data=train)
  test_participants <- unique(test[['id']])
  for(participant in test_participants){
    prediction_subset <- subset(test,id==participant)
    y_pred_lm <- predict(aids_lm,newdata=prediction_subset)
    y_pred_lmm <- predict(aids_lmm,newdata=prediction_subset,
    y_true <- prediction_subset[['CD4']]
    MAE_within_subjects_lm<- c(MAE_within_subjects_lm,mean_absolute_error(y_pred_lm,y_true))
    MAE_within_subjects_lmm<- c(MAE_within_subjects_lmm,mean_absolute_error(y_pred_lmm,y_true))

Using these, we can make boxplots. We remove the outliers to improve clarity.

boxplot(MAE_within_subjects_lm,MAE_within_subjects_lm,main='MAE',names=c('Linear Model','Linear Mixed Model'),outline=FALSE)
boxplot(MSE_within_subjects_lm,MSE_within_subjects_lmm,main='MSE',names=c('Linear Model','Linear Mixed Model'),outline=FALSE)

Let’s look at 95\% confidence intervals for the mean.

LM(3.250611, 3.681545)(16.78123, 21.41477)
LMM(3.197386, 3.642805)(16.71764, 21.56956)

The linear mixed model performs better than the linear model on these two metrics, but just barely, and even without showing the two-sample hypothesis test we can tell that the difference is not significant. Why might this be?

Why Doesn’t the Linear Mixed Model do Better

Here are a few obvious reasons: we will focus on the first three

  • The covariates explain most of the patient specific variation
  • There is a lot of noise
  • something is wrong with our model specification for patient specific variation

We would like to test the first one. Let \nu be the variance of the random intercept. Then

H_1:\nu\neq 0

Note that 0 is on the boundary of the parameter space, which makes testing problematic. However, we can still look at the variance even if we don’t make a formal test. Let’s fit the mixed model on the whole dataset and investigate this.

aids_mixed_all <-lmer(formula=CD4~obstime+drug+gender+prevOI+AZT+(1|id),data=aids)

I’ll extract the relevant part of this summary:

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 15.253   3.905   
 Residual              3.842   1.960   
Number of obs: 1405, groups:  id, 467

This variance is pretty large and seems unlikely to be due to chance. Now for the second issue (lots of noise), I don’t know of a formal hypothesis test, but we can look at both the residual standard error and the R^2 for the linear regression model with fixed effects only. That said, there are a lot of reasons why R^2 will be low or residual standard error will be high, many of which relate to various types of model misspecification rather than noise.

aids_lm_all <- lm(formula=CD4~obstime+drug+gender+prevOI+AZT,data=aids)

lm(formula = CD4 ~ obstime + drug + gender + prevOI + AZT, data = aids)

     Min       1Q   Median       3Q      Max 
-10.5414  -3.0200  -0.8382   2.5663  16.0378 

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 10.53580    0.45676  23.066  < 2e-16 ***
obstime     -0.08396    0.02525  -3.325 0.000905 ***
drugddI      0.50938    0.23679   2.151 0.031628 *  
gendermale  -0.68624    0.43070  -1.593 0.111314    
prevOIAIDS  -4.43438    0.29602 -14.980  < 2e-16 ***
AZTfailure  -0.15939    0.30194  -0.528 0.597656    
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.433 on 1399 degrees of freedom
 Multiple R-squared:  0.2036,    Adjusted R-squared:  0.2007 

We see that the R^2 is low. To evaluate the residual standard error, let’s look at the box plot for the CD4 values. Note that there is a problem here: CD4 values are likely not independent from each other, so this box plot will probably underestimate the quantiles distance from the median. However, it’s still somewhat interesting.

boxplot(aids[['CD4']],main='CD4 value box plot')

We can see that the lower quantile is close to 4 and the upper quantile is close to 10, so our residual standard error of 4.33 is quite large, even if the true quantiles are probably more spread out.

In conclusion, in this post, we showed how to do prediction with k-fold cross validation using linear mixed models. We constructed box plots and made confidence intervals for the mean for both within-subjects MAE and within-subjects MSE. We also investigated why the linear mixed model didn’t do much better than the fixed effects only linear model.

Leave a Reply

Your email address will not be published. Required fields are marked *