In this post we describe centering features in linear regression: you should do it because it changes the interpretation of the intercept in a very helpful way.
In linear regression, one has pairs of feature vectors and responses $(x_i, y_i)$, with $x_i \in \mathbb{R}^p$ and $y_i \in \mathbb{R}$, and one relates them via the model

$$y_i = \beta_0 + \beta^\top x_i + \varepsilon_i. \qquad (1)$$
Often one is told to center each feature to have mean $0$. You should do this, as it changes the interpretation of the intercept $\beta_0$. The standard interpretation of the intercept is: it’s the expected value of the response, holding all covariates fixed at $0$. However, this isn’t super useful, as the covariates being $0$ may rarely happen, or may not have any special interpretation.
However, if we center each covariate at its mean, the interpretation changes to: the expected value of the response, holding all covariates fixed at their average values.
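To see why, here is a short sketch in the notation of (1), writing $\bar{x}$ for the vector of sample means of the features:

$$y_i = \beta_0 + \beta^\top (x_i - \bar{x}) + \varepsilon_i \quad\Longrightarrow\quad \mathbb{E}[y_i \mid x_i = \bar{x}] = \beta_0,$$

so the intercept of the centered regression is the expected response at the average covariate values, while the slope vector $\beta$ is unchanged.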
An Example: The Boston Housing Dataset
Let’s try this on the Boston housing dataset in R. We first load the data:
library(mlbench)
data(BostonHousing)
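Before running any regressions, it can help to look at the sample means of the covariates we are about to use, since those are exactly the values the centered intercept will refer to. A quick optional check (my own addition, not part of the fits below):

colMeans(BostonHousing[, c("crim", "rm", "tax", "lstat")])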
The regression without mean centering would be as follows:
summary(lm(log(medv) ~ crim + rm + tax + lstat , data = BostonHousing))
Call:
lm(formula = log(medv) ~ crim + rm + tax + lstat, data = BostonHousing)
Residuals:
Min 1Q Median 3Q Max
-0.72730 -0.13031 -0.01628 0.11215 0.92987
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.646e+00 1.256e-01 21.056 < 2e-16 ***
crim -8.432e-03 1.406e-03 -5.998 3.82e-09 ***
rm 1.428e-01 1.738e-02 8.219 1.77e-15 ***
tax -2.562e-04 7.599e-05 -3.372 0.000804 ***
lstat -2.954e-02 1.987e-03 -14.867 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2158 on 501 degrees of freedom
Multiple R-squared: 0.7236, Adjusted R-squared: 0.7214
F-statistic: 327.9 on 4 and 501 DF, p-value: < 2.2e-16
The intercept is 2.65. What does this mean? It is the expected log-price of a house in a neighborhood with no crime, an average of zero rooms per dwelling (???), no property tax, and a proportion of lower-status residents (not a nice phrasing, but I got it from the documentation) of 0. This is clearly not a realistic scenario, and it isn’t very useful for us.
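If you want to convince yourself that the intercept really is the prediction at all-zero covariates, you can store the fit and predict at zeros; the result should reproduce the 2.65 above. A small sketch (the object name fit0 is mine, not from the output above):

fit0 <- lm(log(medv) ~ crim + rm + tax + lstat, data = BostonHousing)
predict(fit0, newdata = data.frame(crim = 0, rm = 0, tax = 0, lstat = 0))
# should equal coef(fit0)["(Intercept)"]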
What if we instead mean-center our covariates? Then we have
summary(lm(log(medv) ~ scale(crim, scale = FALSE) + scale(rm, scale = FALSE) + scale(tax, scale = FALSE) + scale(lstat, scale = FALSE), data = BostonHousing))
Call:
lm(formula = log(medv) ~ scale(crim, scale = FALSE) + scale(rm,
scale = FALSE) + scale(tax, scale = FALSE) + scale(lstat,
scale = FALSE), data = BostonHousing)
Residuals:
Min 1Q Median 3Q Max
-0.72730 -0.13031 -0.01628 0.11215 0.92987
Coefficients:
                              Estimate Std. Error  t value Pr(>|t|)    
(Intercept)                  3.035e+00  9.592e-03  316.366  < 2e-16 ***
scale(crim, scale = FALSE)  -8.432e-03  1.406e-03   -5.998 3.82e-09 ***
scale(rm, scale = FALSE)     1.428e-01  1.738e-02    8.219 1.77e-15 ***
scale(tax, scale = FALSE)   -2.562e-04  7.599e-05   -3.372 0.000804 ***
scale(lstat, scale = FALSE) -2.954e-02  1.987e-03  -14.867  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2158 on 501 degrees of freedom
Multiple R-squared: 0.7236, Adjusted R-squared: 0.7214
F-statistic: 327.9 on 4 and 501 DF, p-value: < 2.2e-16
The expected log-price, with crime, average number of rooms, tax rate, and proportion of lower-status residents all fixed at their sample means, is 3.035. This makes a lot more sense. As you may notice, the coefficients on the covariates don’t change, so their interpretation remains the same.
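A related fact worth checking: when every covariate is mean-centered, the OLS intercept is simply the sample mean of the response (here, of log(medv)), because least squares forces the fitted hyperplane through the point of sample means. A quick sketch to verify this (fit_centered is my own name for the model above):

fit_centered <- lm(log(medv) ~ scale(crim, scale = FALSE) + scale(rm, scale = FALSE) + scale(tax, scale = FALSE) + scale(lstat, scale = FALSE), data = BostonHousing)
coef(fit_centered)["(Intercept)"]
mean(log(BostonHousing$medv))  # should agree with the intercept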
In conclusion, when you want your intercept to have a nice interpretation, you should center your covariates.