Solutions to Stanford’s CS 231n Assignments 1 Inline Problems: KNN

These are solutions to the intuition questions from Stanford’s Convolutional Networks for Visual Recognition (Stanford CS 231n) assignment 1 inline problems for KNN.

Inline Question #1:

Notice the structured patterns in the distance matrix, where some rows or columns are visible brighter. (Note that with the default color scheme black indicates low distances while white indicates high distances.)

What in the data is the cause behind the distinctly bright rows?
What causes the columns?

Your Answer: fill this in.

Either this is an observation from a class not in the training data set, or is at least very different from all/most of the training data, probably in terms of background color.
This training data point doesn’t have any similar points in the test set.

Inline Question 2

We can also other distance metrics such as L1 distance.
The performance of a Nearest Neighbor classifier that uses L1 distance will not change if (Select all that apply.):

The data is preprocessed by subtracting the mean.
The data is preprocessed by subtracting the mean and dividing by the standard deviation.
The coordinate axes for the data are rotated.
None of the above.

Your Answer:
1., 2.

Your explanation:

Consider
(1) $\begin{align*}\Vert (x^{(i)}-\bar{x})-(x^{(j)}-\bar{x})\Vert_1&=\Vert x^{(i)}-x^{(j)}\Vert_1\end{align*}$

Then distances are preserved under subtraction of the mean.
Assume
(2) $\begin{align*}\Vert x^{(i)}-x^{(j)}\Vert_1<\Vert x^{(i)}-x^{(k)}\Vert_1\end{align*}$

then
(3) $\begin{align*}\Vert (x^{(i)}-\bar{x})/\sigma-(x^{(j)}-\bar{x})/\sigma\Vert_1&=\frac{1}{\sigma}\Vert x^{(i)}-x^{(j)}\Vert_1\\&<\frac{1}{\sigma} \Vert x^{(i)}-x^{(k)}\Vert_1\\&=\Vert (x^{(i)}-\bar{x})/\sigma-(x^{(k)}-\bar{x})/\sigma\Vert_1\end{align*}$

and thus ordering of distances is preserved under subtraction of mean and dividing by standard deviation.
Does not hold. Consider three points $x=(0,1),y=(1,0),z=(2,1)$ . Then
(4) $\begin{align*}\Vert y-x\Vert_1&=\Vert y-z\Vert_1\\&=2\end{align*}$

so that $x$ and $z$ both have the same distance from $y$ . Now consider the $45$ degree rotation matrix
(5) $\begin{align*}A&=\left(\begin{array}{cc}\frac{\sqrt{2}}{2}&-\frac{\sqrt{2}}{2}\\\frac{\sqrt{2}}{2}&\frac{\sqrt{2}}{2}\end{array}\right)\end{align*}$

Then
(6) $\begin{align*}x'&=Ax\\&=\left(\begin{array}{c}-\frac{\sqrt{2}}{2}\\\frac{\sqrt{2}}{2}\end{array}\right)\\y'&=Ay\\&=\left(\begin{array}{c}\frac{\sqrt{2}}{2}\\\frac{\sqrt{2}}{2}\end{array}\right)\\z'&=Az\\&=\left(\begin{array}{c}\frac{3\sqrt{2}}{2}\\\sqrt{2}\end{array}\right)\end{align*}$

so that
(7) $\begin{align*}\Vert x'-y'\Vert_1&=2\sqrt{2}\\\Vert y'-z'\Vert_1&=\frac{5}{2}\sqrt{2}\end{align*}$

and thus $\Vert x'-y'\Vert_1<\Vert y'-z'\Vert_1$ . l1 distance ordering is then not preserved under rotation.

Inline Question 3

Which of the following statements about $k$ -Nearest Neighbor ( $k$ -NN) are true in a classification setting, and for all $k$ ? Select all that apply.

The training error of a 1-NN will always be better than that of 5-NN.
The test error of a 1-NN will always be better than that of a 5-NN.
The decision boundary of the k-NN classifier is linear.
The time needed to classify a test example with the k-NN classifier grows with the size of the training set.
None of the above.

Your Answer:
1., 4.

Your explanation:

This is true. If you use the training data set as the test set, then with one nearest neighbor, if given a point x, the nearest neighbor will be the exact same point and thus the error will be $0$ . For 5-NN, $0$ is a lower bound.
False. Consider a 1d example. You have $x_{\textrm{train}}=(-5,-4,-3,-2,-1,3)$ and $y_{\textrm{train}}=(0,0,0,0,0,1)$ . Now consider a new point with $x=2$ and $y=0$ . Then this will have test error $100%$ for 1-nn and $0%$ for $5-nn$ .
No. Consider two classes, one is in the shape of a moon and the other surrounds the moon. Then the decision boundary will have approximately the shape of a moon.
This is true. At test, KNN needs to make a full pass through the entire data set and sort points by distance. The time needed thus grows with the size of the data.

Inline Question #1:

Inline Question 2

Inline Question 3

Related Posts

Leave a Reply Cancel reply