# Linear Regression Plots: Fitted vs Residuals

In this post we describe the fitted vs residuals plot, which lets us detect several types of violations of the linear regression assumptions. You may also be interested in Q-Q plots, scale-location plots, or the residuals vs leverage plot.

Here, one plots the fitted values $\hat{y}$ on the x-axis, and the residuals $y - \hat{y}$ on the y-axis. In this post we’ll describe what we can learn from a residuals vs fitted plot, and then make the plot for several R datasets and analyze them. The fitted vs residuals plot is mainly useful for investigating:

1. Whether linearity holds. This is indicated by the mean residual value for every fitted value region being close to $0$. In R this is indicated by the red line being close to the dashed line.
2. Whether homoskedasticity holds. The spread of residuals should be approximately the same across the x-axis.
3. Whether there are outliers. This is indicated by some ‘extreme’ residuals that are far from the rest.
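
The three checks above can be sketched on simulated data. This is only an illustration under assumptions that are not in the original analysis: the data, the seed, and the names `x`, `y`, `fit`, `fv`, and `res` are made up, and the plot is rebuilt by hand from `fitted()` and `resid()` instead of calling `plot(lm(...))`. The mean is linear, the noise spread grows with `x`, and one point is shifted to act as an outlier. The red `lowess()` curve plays a similar role to the red line R draws.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
n <- 100
x <- seq(1, 10, length.out = n)
# Linear mean, but the noise standard deviation grows with x (heteroskedastic).
y <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)
y[50] <- y[50] + 40               # one artificial outlier

fit <- lm(y ~ x)
fv  <- fitted(fit)                # fitted values (x-axis)
res <- resid(fit)                 # residuals (y-axis)

plot(fv, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)                # dashed reference line at zero
lines(lowess(fv, res), col = "red")   # smoothed mean of the residuals
```

In a plot like this one would expect the red curve to stay near the dashed line (linearity holds), the vertical spread to widen from left to right (heteroskedasticity), and one point to sit far above the rest (the outlier).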

To illustrate how violations of linearity (1) affect this plot, we create an extreme synthetic example in R.

    x <- 1:20
    y <- x^2
    plot(lm(y ~ x))

So a quadratic relationship between $x$ and $y$ leads to an approximately quadratic relationship between fitted values and residuals. Why is this? Firstly, the fitted model is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, which gives us that $x = (\hat{y} - \hat{\beta}_0) / \hat{\beta}_1$. We then have

$$y - \hat{y} = x^2 - \hat{y} = \left( \frac{\hat{y} - \hat{\beta}_0}{\hat{\beta}_1} \right)^2 - \hat{y} \tag{1}$$

which is itself a 2nd order polynomial function of $\hat{y}$. More generally, if the relationship between $x$ and $y$ is non-linear, the residuals will be a non-linear function of the fitted values. This idea generalizes to higher dimensions (function of covariates instead of single $x$).
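
The claim that the residuals are a second-order polynomial in the fitted values can be checked numerically. The sketch below reuses the synthetic example; the names `fit`, `fv`, `res`, and `check` are illustrative choices, not from the original post. It regresses the residuals on the fitted values and their square. If the relationship is exactly quadratic, this second regression should leave nothing unexplained beyond floating-point noise.

```r
x <- 1:20
y <- x^2
fit <- lm(y ~ x)
coef(fit)                  # intercept -77 and slope 21 for this example

fv  <- fitted(fit)         # fitted values
res <- resid(fit)          # residuals, equal to x^2 - fv

# Regress the residuals on a quadratic in the fitted values.
check <- lm(res ~ fv + I(fv^2))
max(abs(resid(check)))     # only floating-point noise remains
```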

## The Cars Dataset

We now make the same plot for the cars dataset from R. We regress distance (`dist`) on speed (`speed`).

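    plot(lm(dist ~ speed, data = cars))

The same call works for a multiple regression, here on the BostonHousing data from the mlbench package:

    library(mlbench)
    data(BostonHousing)
    plot(lm(medv ~ crim + rm + tax + lstat, data = BostonHousing))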
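
Calling `plot()` on an `lm` object steps through several diagnostic plots, not just the one discussed here. As a small sketch, the `which` argument of `plot.lm` restricts the output to the residuals vs fitted panel. The object name `fit_cars` is only an illustrative choice.

```r
fit_cars <- lm(dist ~ speed, data = cars)

# which = 1 selects only the residuals vs fitted plot
plot(fit_cars, which = 1)

# The plotted quantities can also be inspected directly
head(data.frame(fitted = fitted(fit_cars), residual = resid(fit_cars)))
```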