In this post we show how to predict future measurement values in a longitudinal setting using linear mixed models (LMMs). We describe how to do it in R, and how to evaluate the accuracy, which requires somewhat careful handling.

## The Problem and Dataset

In a previous post we went over the more methodological side of LMMs. However, we did not show how to do prediction of future values or evaluate those predictions. We will work with the aids dataset from the joineR package. This is a dataset of CD4 (white blood cell) count measurements along with death or censoring times as well as some covariates/treatment indicators. We will ignore the death/censoring times and focus on CD4 count measurement times (technically we should do a joint model, as the death/event times tell us something about what the unseen CD4 counts are close to those times, and thus the CD4 dynamics, but we will ignore death/censoring times for simplicity).

We start by loading packages and the dataset.

```
library(joineR)
library(lme4)
library(splines)
library(caret)#we will use this for cross validation
library(scorer)#we will use this to calculate mean absolute and squared error
data(aids)
head(aids)
```

The columns give the following information:

- id: the id of the patient
- time: the death or censoring time
- death: whether the patient died while observed
- CD4: the CD4 count
- obstime: how many months from the first observation this is
- drug: the treatment type
- gender: male or female
- prevOi: whether there was a previous infection
- AZT: AZT intolerance or failure

## Independence and Evaluation Metrics

We will want to construct confidence intervals for our accuracy metrics. To do so, we need to think about independence. We should first note that our within subjects measurements are not iid. We can assume that patients are all realizations of the same stochastic process and independent of each other though. However, CD4 measurements within a patient may depend on each other. Thus, we should average our accuracy metrics within subject, and use within-subject averages to construct confidence intervals and other relevant summary statistics. To start with, we perform k-fold cross validation and create k folds, as well as set up data structures to hold our within-subject mean absolute errors (MAEs) and mean squared errors (MSEs) for each patient, for both linear regression (LM) and a linear mixed model (LMM).

```
all_participants<-unique(aids[['id']])
#we create cross validation folds using caret's createFolds
num_folds <- 10
folds<-createFolds(all_participants,k=num_folds,list=TRUE,returnTrain=TRUE)
#data structures to hold results
MAE_within_subjects_lm <- c()
MAE_within_subjects_lmm <- c()
MSE_within_subjects_lm <- c()
MSE_within_subjects_lmm <- c()
```

Next, for each fold, we train our model and make predictions. We will use all covariates along with a random intercept for the mixed model. We then save the our within-subjects MAEs and MSEs in the relevant data structures.

## The Experiment

```
for(i in 1:num_folds){
train_ids<- folds[[i]]
test_ids<-all_participants[!(all_participants%in%folds[[i]])]
#generate train/test
train<-subset(aids,id%in%train_ids)
test<-subset(aids,id%in%test_ids)
#fit models
aids_lm <- lm(formula=CD4~obstime+drug+gender+prevOI+AZT,data=train)
aids_lmm <-lmer(formula=CD4~obstime+drug+gender+prevOI+AZT+(1|id),data=train)
test_participants <- unique(test[['id']])
for(participant in test_participants){
prediction_subset <- subset(test,id==participant)
#predict
y_pred_lm <- predict(aids_lm,newdata=prediction_subset)
y_pred_lmm <- predict(aids_lmm,newdata=prediction_subset,allow.new.levels=TRUE)
y_true <- prediction_subset[['CD4']]
MAE_within_subjects_lm<- c(MAE_within_subjects_lm,mean_absolute_error(y_pred_lm,y_true))
MAE_within_subjects_lmm<- c(MAE_within_subjects_lmm,mean_absolute_error(y_pred_lmm,y_true))
MSE_within_subjects_lm<-c(MSE_within_subjects_lm,mean_squared_error(y_pred_lm,y_true))
MSE_within_subjects_lmm<-c(MSE_within_subjects_lmm,mean_squared_error(y_pred_lmm,y_true))
}
}
```

Using these, we can make boxplots. We remove the outliers to improve clarity.

```
boxplot(MAE_within_subjects_lm,MAE_within_subjects_lm,main='MAE',names=c('Linear Model','Linear Mixed Model'),outline=FALSE)
boxplot(MSE_within_subjects_lm,MSE_within_subjects_lmm,main='MSE',names=c('Linear Model','Linear Mixed Model'),outline=FALSE)
```

Let’s look at confidence intervals for the mean.

```
t.test(MAE_within_subjects_lm)
t.test(MAE_within_subjects_lmm)
t.test(MSE_within_subjects_lm)
t.test(MSE_within_subjects_lmm)
```

MAE | MSE | |

LM | (3.250611, 3.681545) | (16.78123, 21.41477) |

LMM | (3.197386, 3.642805) | (16.71764, 21.56956) |

The linear mixed model performs better than the linear model on these two metrics, but just barely, and even without showing the two-sample hypothesis test we can tell that the difference is not significant. Why might this be?

## Why Doesn’t the Linear Mixed Model do Better

Here are a few obvious reasons: we will focus on the first three

- The covariates explain most of the patient specific variation
- There is a lot of noise
- something is wrong with our model specification for patient specific variation

We would like to test the first one. Let be the variance of the random intercept. Then

Note that is on the boundary of the parameter space, which makes testing problematic. However, we can still look at the variance even if we don’t make a formal test. Let’s fit the mixed model on the whole dataset and investigate this.

```
aids_mixed_all <-lmer(formula=CD4~obstime+drug+gender+prevOI+AZT+(1|id),data=aids)
summary(aids_mixed_all)
```

I’ll extract the relevant part of this summary:

```
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 15.253 3.905
Residual 3.842 1.960
Number of obs: 1405, groups: id, 467
```

This variance is pretty large and seems unlikely to be due to chance. Now for the second issue (lots of noise), I don’t know of a formal hypothesis test, but we can look at both the residual standard error and the for the linear regression model with fixed effects only. That said, there are a lot of reasons why will be low or residual standard error will be high, many of which relate to various types of model misspecification rather than noise.

```
aids_lm_all <- lm(formula=CD4~obstime+drug+gender+prevOI+AZT,data=aids)
summary(aids_lm_all)
Call:
lm(formula = CD4 ~ obstime + drug + gender + prevOI + AZT, data = aids)
Residuals:
Min 1Q Median 3Q Max
-10.5414 -3.0200 -0.8382 2.5663 16.0378
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.53580 0.45676 23.066 < 2e-16 ***
obstime -0.08396 0.02525 -3.325 0.000905 ***
drugddI 0.50938 0.23679 2.151 0.031628 *
gendermale -0.68624 0.43070 -1.593 0.111314
prevOIAIDS -4.43438 0.29602 -14.980 < 2e-16 ***
AZTfailure -0.15939 0.30194 -0.528 0.597656
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.433 on 1399 degrees of freedom
Multiple R-squared: 0.2036, Adjusted R-squared: 0.2007
```

We see that the is low. To evaluate the residual standard error, let’s look at the box plot for the CD4 values. Note that there is a problem here: CD4 values are likely not independent from each other, so this box plot will probably underestimate the quantiles distance from the median. However, it’s still somewhat interesting.

```
boxplot(aids[['CD4']],main='CD4 value box plot')
```

We can see that the lower quantile is close to and the upper quantile is close to , so our residual standard error of is quite large, even if the true quantiles are probably more spread out.

In conclusion, in this post, we showed how to do prediction with k-fold cross validation using linear mixed models. We constructed box plots and made confidence intervals for the mean for both within-subjects MAE and within-subjects MSE. We also investigated why the linear mixed model didn’t do much better than the fixed effects only linear model.