Solving Full Rank Linear Least Squares Without Matrix Inversion in Python and Numpy

In this post we describe how to solve the full rank least squares problem without inverting a matrix, as inverting a matrix is subject to numerical stability issues. We first describe the least squares problem and the normal equations, then describe the naive solution involving matrix inversion and describe its problems. We then describe two other methods: the Cholesky decomposition and the QR decomposition using householder matrices. The first is also unstable, while the second is far more stable. We show examples in python, using numpy and scipy.

The Least Squares Problem

Let A\in \mathbb{R}^{m\times n}, x\in \mathbb{R}^n,b\in \mathbb{R}^m, and let A^TA be full rank. The least squares problem is to solve

(1)   \begin{align*}\min_x\Vert Ax-b\Vert_2\end{align*}

the main application of this is in linear regression, although in that setting it’s generally written as

(2)   \begin{align*}\arg\min_x\Vert X\beta-y\Vert_2\end{align*}

we will use the former notation, which is common in the numerical linear algebra literature.

Normal Equations

In order to solve this, we can expand the terms in the norm and take derivatives, giving us the normal equations:

(3)   \begin{align*}A^T Ax&=A^Tb\end{align*}

the simplest way to solve this is to compute (A^T A)^{-1} and then compute

(4)   \begin{align*}x=(A^T A)^{-1}A^Tb\end{align*}

however let 0\leq \sigma_1\leq \cdots \sigma_n be the singular values of A^TA. If \frac{\sigma_1}{\sigma_n} is close to 0, then the computation of (A^TA)^{-1} will have large errors. Let’s try an extreme case where the matrix A is almost singular, and the number of rows is equal to the number of columns.

import numpy as np
from scipy import linalg

m = 1000
n = 1000
Cov = 0.999999*np.ones((n,n))
A = np.random.multivariate_normal(np.zeros(n),Cov,m)
b = np.random.normal(0,1,m)
gram =,A)


this has extremely high condition number, suggesting that many procedures, in particular matrix inversion, will be numerically unstable. Let’s try solving the least squares problem and computing the residual norm \Vert r\Vert_2=\Vert A^TA\hat{x}-A^Tb\Vert_2 for our estimate \hat{x}.

x_normal_equations =,,b))
r_norm = np.linalg.norm(,x_normal_equations),b))


this is a huge error! Clearly we can’t use this method as an all purpose solution.

Cholesky Decomposition

One alternative to inverting the matrix is to do a Cholesky decomposition AA^T=LL^T, where L is lower triangular. We can then solve

(5)   \begin{align*}LL^Tx&=A^Tb\end{align*}

we can then solve the triangular system Ly=A^Tb for y and L^Tx=y for x. Let’s see how this does.

gram = np.matrix(gram)
L = np.linalg.cholesky(gram)
y = linalg.solve_triangular(L,,lower=True)
x_cholesky = linalg.solve_triangular(L.T,y)
r_norm_cholesky = np.linalg.norm(,x_cholesky),b))


this is the same error. Clearly we can’t use this either.

Householder QR

Householder QR finds a sequence of householder matrices H_1,\cdots,H_k such that H_k\cdots H_1A=R, where R is upper triangular, and H_1,\cdotsH_k=Q, where Q is an orthogonal matrix. Thus A=QR is a QR decomposition. This is more stable than a QR decomposition using classical gram schmidt.

We can plug this into the normal equations via

(6)   \begin{align*}A^TAx&=A^Tb\\R^TQ^TQRx&=R^TQ^Tb\\R^TRx&=R^TQ^Tb\\Rx&=Q^Tb\end{align*}

we can then solve the triangular system Rx&=Q^Tb for x.

r_norm_householder = np.linalg.norm(,x_householder),b))


This is far more stable!


In this post we describe three methods for solving the normal equations for full rank A^TA. The first two are particularly vulnerable to numerical errors, while the third is more robust.

Leave a Reply

Your email address will not be published. Required fields are marked *