
David C. Lay, "Linear Algebra and Its Applications", Addison-Wesley.

5.3 Theorem 5

The Diagonalization Theorem: An n×n matrix A is diagonalizable if and only if A has n linearly independent eigenvectors.

First, prove that

An n×n matrix A is diagonalizable $\implies$ A has n linearly independent eigenvectors.

$$
A = PDP^{-1}, \quad
D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}, \quad
P = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}
$$

$$
PD = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}
\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}
= \begin{bmatrix} \lambda_1 v_1 & \lambda_2 v_2 & \cdots & \lambda_n v_n \end{bmatrix}
$$

$$
\begin{aligned}
A &= A \times I = A(PP^{-1}) = (AP)P^{-1} \\
(AP)P^{-1} &= PDP^{-1} \\
(AP)P^{-1}P &= PDP^{-1}P \\
AP(P^{-1}P) &= PD(P^{-1}P) \\
AP \times I &= PD \times I \\
AP &= PD
\end{aligned}
$$

$$
AP = A \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}
= \begin{bmatrix} Av_1 & Av_2 & \cdots & Av_n \end{bmatrix}
$$

$$
\begin{bmatrix} Av_1 & Av_2 & \cdots & Av_n \end{bmatrix}
= \begin{bmatrix} \lambda_1 v_1 & \lambda_2 v_2 & \cdots & \lambda_n v_n \end{bmatrix}
\implies Av_i = \lambda_i v_i
$$

$v_i$ is the $i$th eigenvector of A and $\lambda_i$ is the corresponding eigenvalue.

Because P is invertible, $\{v_1, v_2, \dots, v_n\}$ is linearly independent.

Then, prove that

A has n linearly independent eigenvectors $\implies$ the n×n matrix A is diagonalizable.

Let $A = PBP^{-1}$.

Then A is similar to B.

Let $B = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}$

where $\lambda_i$ is the $i$th eigenvalue of A.

$Av_i = \lambda_i v_i$, where $v_i$ is the corresponding eigenvector.

P is some matrix, but such a P may not exist, in which case $A = PBP^{-1}$ would be invalid.

So we need to construct such a P.

Let $P = \begin{bmatrix} p_1 & p_2 & \cdots & p_n \end{bmatrix}$, where $p_i$ is the $i$th column vector.

$$
PB = \begin{bmatrix} p_1 & p_2 & \cdots & p_n \end{bmatrix}
\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}
= \begin{bmatrix} \lambda_1 p_1 & \lambda_2 p_2 & \cdots & \lambda_n p_n \end{bmatrix}
$$

Construct the same equation as before:

$$
\begin{aligned}
A &= A \times I = A(PP^{-1}) = (AP)P^{-1} \\
(AP)P^{-1} &= PBP^{-1} \\
(AP)P^{-1}P &= PBP^{-1}P \\
AP(P^{-1}P) &= PB(P^{-1}P) \\
AP \times I &= PB \times I \\
AP &= PB
\end{aligned}
$$

Let $A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}$, where $a_i$ is the $i$th row vector.

$$
AP = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}
\begin{bmatrix} p_1 & p_2 & \cdots & p_n \end{bmatrix}
= \begin{bmatrix}
a_1 p_1 & a_1 p_2 & \cdots & a_1 p_n \\
a_2 p_1 & a_2 p_2 & \cdots & a_2 p_n \\
\vdots & \vdots & \ddots & \vdots \\
a_n p_1 & a_n p_2 & \cdots & a_n p_n
\end{bmatrix}
$$

$$
AP = PB \implies
\begin{bmatrix}
a_1 p_1 & a_1 p_2 & \cdots & a_1 p_n \\
a_2 p_1 & a_2 p_2 & \cdots & a_2 p_n \\
\vdots & \vdots & \ddots & \vdots \\
a_n p_1 & a_n p_2 & \cdots & a_n p_n
\end{bmatrix}
= \begin{bmatrix} \lambda_1 p_1 & \lambda_2 p_2 & \cdots & \lambda_n p_n \end{bmatrix}
\implies
\begin{bmatrix} a_1 p_i \\ a_2 p_i \\ \vdots \\ a_n p_i \end{bmatrix} = \lambda_i p_i
$$

Let $a_i = \begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{bmatrix}$.

Let $p_i = \begin{bmatrix} p_{1i} \\ p_{2i} \\ \vdots \\ p_{ni} \end{bmatrix}$.

$$
a_j p_i = \begin{bmatrix} a_{j1} & a_{j2} & \cdots & a_{jn} \end{bmatrix}
\begin{bmatrix} p_{1i} \\ p_{2i} \\ \vdots \\ p_{ni} \end{bmatrix}
$$

By the definition of matrix multiplication,

$$
\begin{bmatrix} a_1 p_i \\ a_2 p_i \\ \vdots \\ a_n p_i \end{bmatrix}
= \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\begin{bmatrix} p_{1i} \\ p_{2i} \\ \vdots \\ p_{ni} \end{bmatrix}
= \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} p_i
= A p_i
$$

$$
A p_i = \lambda_i p_i
$$

Thus, each $p_i$ is an eigenvector of A.

Because the n eigenvectors of A are linearly independent, P is invertible.

So $A = PBP^{-1}$ is indeed valid.
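
As a quick numerical sanity check of the theorem, the sketch below (a minimal example using numpy; the 2×2 matrix is an arbitrary choice, not from the textbook) builds P from the eigenvectors and D from the eigenvalues, then confirms both $A = PDP^{-1}$ and the column-wise identity $Av_i = \lambda_i v_i$ used in the proof.

```python
# Numerical check of the Diagonalization Theorem (sketch; arbitrary example matrix).
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Columns of P are eigenvectors of A; D holds the eigenvalues on its diagonal.
eigvals, P = np.linalg.eig(A)
D = np.diag(eigvals)

# P is invertible because its columns (the eigenvectors) are linearly
# independent, so A = P D P^{-1} should hold.
assert np.allclose(A, P @ D @ np.linalg.inv(P))

# The column-by-column identity A v_i = lambda_i v_i from the proof:
for i in range(A.shape[0]):
    assert np.allclose(A @ P[:, i], eigvals[i] * P[:, i])
```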

6.1 Theorem 3

Let A be an m×n matrix. The orthogonal complement of the row space of A is the null space of A, and the orthogonal complement of the column space of A is the null space of $A^T$:

$$
(\mathrm{Row}\,A)^{\perp} = \mathrm{Nul}\,A \quad \text{and} \quad (\mathrm{Col}\,A)^{\perp} = \mathrm{Nul}\,A^{T}
$$

Let $A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}$, where $a_i$ is the $i$th row vector, and let $x \in \mathrm{Nul}\,A$.

By the definition of Null Space,

$$
Ax = 0 \implies
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix} x
= \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\implies a_i x = 0
$$

Because $a_i$ is a row vector and x is a column vector, $a_i x$ is the inner product of a row of A with an arbitrary vector of the null space; it follows that every linear combination of the rows of A is also perpendicular to the null space.

Thus $\mathrm{Row}\,A = (\mathrm{Nul}\,A)^{\perp}$, namely $(\mathrm{Row}\,A)^{\perp} = \mathrm{Nul}\,A$.

Applying the same argument to $A^T$, whose rows are the columns of A, gives $(\mathrm{Col}\,A)^{\perp} = \mathrm{Nul}\,A^{T}$.
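
The result is easy to check numerically. A minimal sketch, assuming scipy is available and using an arbitrary rank-deficient matrix as the example:

```python
# Numerically illustrate (Row A)^perp = Nul A and (Col A)^perp = Nul A^T.
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # 2 x row 1, so Nul A is nontrivial
              [0.0, 1.0, 1.0]])

N = null_space(A)        # columns form an orthonormal basis of Nul A
assert np.allclose(A @ N, 0.0)     # every row of A is perpendicular to Nul A

M = null_space(A.T)      # basis of Nul A^T
assert np.allclose(A.T @ M, 0.0)   # every column of A is perpendicular to Nul A^T
```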

6.2 Theorem 7

Let U be an m×n matrix with orthonormal columns, and let x and y be in $\mathbb{R}^n$. Then

a. $\|Ux\| = \|x\|$

$$
\|Ux\| = \sqrt{Ux \cdot Ux} = \sqrt{(Ux)^T (Ux)} = \sqrt{x^T U^T U x} = \sqrt{x^T (U^T U) x} = \sqrt{x^T I x} = \sqrt{x^T x} = \|x\|
$$

since $\|x\| = \sqrt{x \cdot x} = \sqrt{x^T x}$.

b. $(Ux) \cdot (Uy) = x \cdot y$

$$
(Ux) \cdot (Uy) = (Ux)^T (Uy) = x^T U^T U y = x^T (U^T U) y = x^T I y = x^T y = x \cdot y
$$

since $x \cdot y = x^T y$.

c. $(Ux) \cdot (Uy) = 0 \iff x \cdot y = 0$

This is just a special case of b.
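
All three parts can be verified numerically. The sketch below builds a matrix U with orthonormal columns by taking the Q factor of a random matrix (an arbitrary construction for illustration, assuming numpy):

```python
# Numerical check of Theorem 7: U with orthonormal columns preserves
# lengths and dot products.
import numpy as np

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((5, 3)))   # 5x3, orthonormal columns
x = rng.standard_normal(3)
y = rng.standard_normal(3)

assert np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x))   # part (a)
assert np.isclose((U @ x) @ (U @ y), x @ y)                   # part (b)
```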

6.3 Theorem 8

The Orthogonal Decomposition Theorem: Let W be a subspace of $\mathbb{R}^n$. Then each y in $\mathbb{R}^n$ can be written uniquely in the form

$$
y = \hat{y} + z \tag{1}
$$

where $\hat{y}$ is in W and z is in $W^{\perp}$. In fact, if $\{u_1, \dots, u_p\}$ is any orthogonal basis of W, then

$$
\hat{y} = \frac{y \cdot u_1}{u_1 \cdot u_1}\, u_1 + \cdots + \frac{y \cdot u_p}{u_p \cdot u_p}\, u_p \tag{2}
$$

and $z = y - \hat{y}$.

The textbook provides the proof. Therefore, we only show how to derive formula (2) here.

Since $z = y - \hat{y}$ is in $W^{\perp}$ and $\hat{y}$ is in W,

$$
(y - \hat{y}) \cdot \hat{y} = 0 \implies y \cdot \hat{y} - \hat{y} \cdot \hat{y} = 0 \implies y \cdot \hat{y} = \hat{y} \cdot \hat{y}
$$

Let $\hat{y} = c_1 u_1 + \cdots + c_p u_p$.

Namely,

$$
\hat{y} = \begin{bmatrix} u_1 & u_2 & \cdots & u_p \end{bmatrix}
\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{bmatrix}
$$

$$
y \cdot \hat{y} = y^T \hat{y} = y^T \begin{bmatrix} u_1 & u_2 & \cdots & u_p \end{bmatrix}
\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{bmatrix}
$$

$$
\hat{y} \cdot \hat{y} = (c_1 u_1 + \cdots + c_p u_p) \cdot (c_1 u_1 + \cdots + c_p u_p)
= c_1^2 (u_1 \cdot u_1) + \cdots + c_p^2 (u_p \cdot u_p)
= \begin{bmatrix} c_1 (u_1 \cdot u_1) & \cdots & c_p (u_p \cdot u_p) \end{bmatrix}
\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{bmatrix}
$$

(the cross terms vanish because the basis $\{u_1, \dots, u_p\}$ is orthogonal).

Apply $y \cdot \hat{y} = \hat{y} \cdot \hat{y}$:

$$
y^T \begin{bmatrix} u_1 & u_2 & \cdots & u_p \end{bmatrix}
\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{bmatrix}
= \begin{bmatrix} c_1 (u_1 \cdot u_1) & \cdots & c_p (u_p \cdot u_p) \end{bmatrix}
\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{bmatrix}
$$

$$
y^T \begin{bmatrix} u_1 & u_2 & \cdots & u_p \end{bmatrix}
= \begin{bmatrix} c_1 (u_1 \cdot u_1) & \cdots & c_p (u_p \cdot u_p) \end{bmatrix}
$$

By the definition of matrix multiplication,

$$
c_i (u_i \cdot u_i) = y^T u_i = y \cdot u_i \implies c_i = \frac{y \cdot u_i}{u_i \cdot u_i}
$$

for $i = 1, \dots, p$, which recovers formula (2).
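
Formula (2) is straightforward to exercise numerically. A minimal sketch with an arbitrary orthogonal basis $\{u_1, u_2\}$ of a plane in $\mathbb{R}^3$, checking that $z = y - \hat{y}$ is indeed in $W^{\perp}$:

```python
# Numerical check of formula (2): project y onto W = span{u1, u2}.
import numpy as np

u1 = np.array([1.0, 1.0, 0.0])
u2 = np.array([1.0, -1.0, 2.0])     # u1 . u2 = 0, so {u1, u2} is orthogonal
y = np.array([3.0, 1.0, 4.0])

y_hat = (y @ u1) / (u1 @ u1) * u1 + (y @ u2) / (u2 @ u2) * u2
z = y - y_hat

# z should lie in W^perp, i.e., be orthogonal to the basis of W:
assert np.isclose(z @ u1, 0.0)
assert np.isclose(z @ u2, 0.0)
```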

6.5 Theorem 15

Given an m×n matrix A with linearly independent columns, let $A = QR$ be a QR factorization of A as in Theorem 12. Then, for each b in $\mathbb{R}^m$, the equation $Ax = b$ has a unique least-squares solution, given by

$$
\hat{x} = R^{-1} Q^T b \tag{6}
$$

The textbook provides the proof. We only show how to derive formula (6) here.

Let $\hat{b}$ denote the orthogonal projection of b onto $\mathrm{Col}\,A$, so that $A\hat{x} = \hat{b}$:

$$
\begin{aligned}
A\hat{x} &= \hat{b} \\
(QR)\hat{x} &= \hat{b} \\
(QR)^{-1}(QR)\hat{x} &= (QR)^{-1}\hat{b} \\
I\hat{x} &= R^{-1}Q^{-1}\hat{b} \\
\hat{x} &= R^{-1}Q^T\hat{b}
\end{aligned}
$$

Finally, $Q^T \hat{b} = Q^T b$ because $b - \hat{b}$ is orthogonal to $\mathrm{Col}\,A = \mathrm{Col}\,Q$, which gives formula (6): $\hat{x} = R^{-1} Q^T b$.

Because Q has orthonormal columns,

$$
Q^T Q = I = Q^{-1} Q \implies Q^T = Q^{-1}
$$

(Strictly, when m > n the matrix Q is not square and has no two-sided inverse; $Q^T$ is a left inverse of Q, $Q^T Q = I$, which is all the derivation above needs.)
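
As a final check, the sketch below (arbitrary random A and b, assuming numpy) computes $\hat{x} = R^{-1} Q^T b$ from the reduced QR factorization and compares it against numpy's built-in least-squares solver:

```python
# Numerical check of Theorem 15: least squares via QR factorization.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))          # 6x3; columns independent w.p. 1
b = rng.standard_normal(6)

Q, R = np.linalg.qr(A)                   # reduced QR: Q is 6x3, R is 3x3
x_hat = np.linalg.solve(R, Q.T @ b)      # x_hat = R^{-1} Q^T b, formula (6)

# Compare with the built-in least-squares solver:
assert np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0])
```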