18.S096 – Problem Set 1

18.S096 – Problem Set 1



Auxiliary lemma

We start by proving that, if T is an invertible linear transformation defined over a vector space \mathbb{V} and \mathbb{S} is a subspace of \mathbb{V}, \mathrm{dim}\,T(\mathbb{S}) = \mathrm{dim}\,\mathbb{S}.

Lemma proof

Let’s \{v_1, v_2, \hdots, v_p\} be a basis of \mathbb{S}. If \{T(v_1), T(v_2), \hdots, T(v_p)\} is a LI set then we have proved that \mathrm{dim}\,T(\mathbb{S}) = \mathrm{dim}\,\mathbb{S}. So let’s assume we have \alpha_i such that \sum_{i=1}^p \alpha_i\,T(v_i) = 0. By linearity we will have T\left(\sum_{i=1}^p \alpha_i\,v_i\right) = 0. If we apply the inverse linear transformation T^{-1}, we get \sum_{i=1}^p \alpha_i\,v_i = T^{-1}(0). As the LHS is non-zero because \{v_1, v_2, \hdots, v_p\} is a basis of \mathbb{S}, we have a contradiction.


Main proof

We know that \{V e_1, V e_2, \hdots, V e_n\}, where \{e_1, e_2, \hdots, e_n\} is the canonical basic of \mathbb{R}^n, is also a basis of \mathbb{R}^n because of the previous lemma. If we assume that A is a m \times n matrix, let’s analyze its effect over every element of that basis:

\displaystyle A(V e_i) = \left(U \Sigma V^{T}\right) (V e_i) = U \Sigma \left(V^{T} V\right) e_i = U \left(\Sigma e_i\right).

As \{\Sigma e_1, \hdots, \Sigma e_n\} spans the range of \Sigma, if only the first p diagonal elements of \Sigma are nonzero then \{\Sigma e_1, \hdots, \Sigma e_p\} should be a basis of the range of \Sigma (they are linearly independent and they span the range). By the previously proved lemma, U won’t change that dimension, so we know that \{U \Sigma e_1, \hdots, U \Sigma e_p\} is a basis of the range of A. That proves the desired result.



That follows from the singular values being the square roots of the eigenvalues of A^T A and the uniqueness of the eigenvalues.


Let’s assume U = V. Then we have

\displaystyle A^T = \left(U \Sigma U^T\right)^T = \left(U^T\right)^T \Sigma^T U^T = U \Sigma U^T = A.

(\Sigma^T = \Sigma because it’s a square diagonal matrix.)

So we can only have U = V if A is symmetric.


This item asks for the other direction of the proof. Let’s show that symmetric matrices can be diagonalized by an orthogonal matrix inductively.

The one by one case is trivial, so let’s assume that all symmetric matrices of size n \times n are diagonalizable by an orthogonal matrix and try to diagonalize a (n + 1) \times (n + 1) one, A.

We know (by solving the characteristic polynomial) that A has at least one complex eigenvalue \lambda and unit eigenvector v. We can start by proving it’s real:

\displaystyle v^\dagger (A v) = v^\dagger (\lambda v)

\displaystyle v^\dagger A v = \lambda v^\dagger v

\displaystyle \left(v^\dagger A v\right)^\dagger = \left(\lambda v^\dagger v\right)^\dagger

\displaystyle  v^\dagger A^\dagger \left(v^\dagger\right)^\dagger = \bar{\lambda} v^\dagger \left(v\right)^\dagger

\displaystyle  v^\dagger A v = \bar{\lambda} v^\dagger v.

By using the previous equation and the fact that v^\dagger v is positive:

\displaystyle \lambda v^\dagger v = \bar{\lambda} v^\dagger v

\displaystyle \lambda = \bar{\lambda}.

Now we want to prove the image of the orthogonal complement of the space generated by v is orthogonal to v. So let’s assume we have a vector u such that u^T v = 0. Then we have

\displaystyle \left(A u\right)^T v = u^T A^T v = u^T (A v) = u^T (\lambda v) = \lambda u^T v = 0.

If we take an arbitrary orthonormal basis of v^\perp, \{u_1, \hdots, u_n\}, we can extend it by adding v to get a basis of \mathbb{R}^{n+1}. Expressing A in that basis:

\displaystyle A = \left(\begin{array}{c|c|c|c} u_1 & \hdots & u_n & v \end{array}\right) \left(\begin{array}{cccc}b_{1 1} & \hdots & b_{1 n} & 0 \\ \vdots & \ddots & \vdots & \vdots \\ b_{n 1} & \hdots & b_{n n} & 0 \\ 0 & \hdots & 0 & \lambda \end{array} \right) \left(\begin{array}{c} u_1 \\ \hline \vdots \\ \hline u_n \\ \hline v \end{array}\right),

where the zeroes are due to the previously proved orthogonality and the fact that v is an eigenvector.

We know that B must be symmetric, because if A = U A' U^T, we have U^T A U = U^T U A' U^T U = A' and (A')^T = \left(U^T A U\right)^T = U^T A U = A'.

Using the inductive assumption, B is diagonalizable and we have B = W D W^T, where W is an orthogonal n \times n matrix. W can be extended to an ortogonal (n + 1) \times (n + 1) matrix by adding a 1 in the diagonal and padding with zeroes.

\displaystyle \left(\begin{array}{c|c} B & 0 \\ \hline 0 & \lambda \end{array}\right) = \left(\begin{array}{c|c} W & 0 \\ \hline 0 & 1 \end{array}\right) \left(\begin{array}{c|c} D & 0 \\ \hline 0 & \lambda \end{array}\right) \left(\begin{array}{c|c} W^T & 0 \\ \hline 0 & 1 \end{array}\right).

By inserting that in the previous expression of A and using the fact that the product of orthogonal matrices is orthogonal, we finish the diagonalization.


Let A = U \Sigma V^T be a SVD of A with the singular values in order of decreasing magnitude. Then we have

\displaystyle A^T A = \left(U \Sigma V^T\right)^T U \Sigma V^T = V \Sigma^T U^T U \Sigma V^T = V (\Sigma^T \Sigma) V^T


\displaystyle A A^T = U \Sigma V^T \left(U \Sigma V^T\right)^T = U \Sigma V^T V \Sigma^T U^T  = U (\Sigma \Sigma^T) U^T.

As only the diagonal entries of \Sigma are nonzero, the eigenvalues are going to be identical in both cases and equal to the squares of the singular values.


We previously proved that A is going to be diagonalizable by an orthogonal matrix. Let’s assume the entry D_{i i} is negative and that u_i is the i-th row of U. Then we have

u_i^T A u_i = u_i^T U D U^T u_i = e_i^T D e_i = D_{i i} < 0,

in contradiction with the fact that A is positive semi-definite.



If A is a diagonal matrix, only A_{i i} can be different from zero. Then we have

\displaystyle (A x)_i = \sum_{j = 0}^n A_{i j} x_j = \left\{ \begin{array}{ll} A_{i i} x_i & \mbox{if } i \le  n \\ 0 & \mbox{otherwise} \end{array} \right..

So if we want to choose the values of x_i to make A x as close as possible to w, we just have to do:

\displaystyle x_i = \left\{ \begin{array}{ll} A_{i i}^{-1} w_i & \mbox{if } A_{i i} \ne 0 \\ \mbox{arbitrary} & \mbox{otherwise} \end{array} \right.


We will prove by contradiction that x must be V y, where y is the solution of the “diagonal A” problem with A = \Sigma and w = U^T w. So let’s assume ||A x - w|| < ||A (V y) - w||. By expanding A we get

\displaystyle || U \Sigma V^T x - w || < || U \Sigma V^T V y - w ||

\displaystyle || U \Sigma V^T x - w || < || U \Sigma y - w ||.

As orthogonal transformations preserve norms,

\displaystyle \left\Vert U^T(U \Sigma V^T x - w) \right\Vert < \left\Vert U^T (U \Sigma y - w) \right\Vert

\displaystyle \left\Vert \Sigma (V^T x) - U^T w \right\Vert < \left\Vert \Sigma y - U^T w \right\Vert.

But if this is true, y wouldn’t be a solution to the original problem so we have a contradiction.


We just have to do some name changes to adapt the problem:

\displaystyle \vec{x} = \left(\begin{array}{c} a \\ b \end{array}\right)

\displaystyle \vec{w} = \vec{y}

\displaystyle \vec{v_1} = \vec{x}

\displaystyle \vec{v_2} = \left(\begin{array}{c} 1 \\ 1 \\ \vdots \\ 1 \end{array}\right)

As minimizing L is the same as minimizing L^2, expressing the norm explicitly we get

\displaystyle L^2 = \sum_{i = 1}^n (a \cdot x_i + b \cdot 1 - y_i)^2,

exactly the objective function that was asked for.


It’s just a question of using another vector:

\displaystyle \vec{x} = \left(\begin{array}{c} a \\ b \\ c \end{array}\right)

\displaystyle \vec{w} = \vec{y}

\displaystyle \vec{v_1} = \left(\begin{array}{c} x_1^2 \\ x_2^2 \\ \vdots \\ x_n^2 \end{array}\right)

\displaystyle \vec{v_2} = \vec{x}

\displaystyle \vec{v_3} = \left(\begin{array}{c} 1 \\ 1 \\ \vdots \\ 1 \end{array}\right)