# Jordan normal form

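The Jordan normal form theorem states that every square matrix over an algebraically closed field (such as $\mathbb{C}$) is similar to a matrix in Jordan normal form, that is, a block diagonal matrix of the form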

$J = \begin{bmatrix}J_1 & \; & \; \\ \; & \ddots & \; \\\; & \; & J_p \\ \end{bmatrix}$

where each $J_i$ is a square matrix of the form

$J_i = \begin{bmatrix}\lambda_i & 1 & \; & \; \\ \; & \lambda_i \; & \ddots & \; \\ \; & \; & \ddots & 1 \\ \; & \; & \; & \lambda_i \\ \end{bmatrix}$.
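As a small illustration (the eigenvalues $2$ and $3$ and the block sizes are arbitrary choices, not taken from any particular problem), a $4 \times 4$ matrix in Jordan normal form with three blocks is

$J = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}$

with $J_1 = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}$, $J_2 = \begin{bmatrix} 2 \end{bmatrix}$ and $J_3 = \begin{bmatrix} 3 \end{bmatrix}$. Note that the same eigenvalue may appear in more than one block.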

This form is constructed via generalized eigenvectors. Each block $J_i$ corresponds to an invariant subspace, and the generalized eigenvectors of a matrix can be organized into chains, each of which spans its own invariant subspace.

We let $A$ be any linear transformation from $V$ to $V$, where $V$ is a finite-dimensional vector space over an algebraically closed field, so that eigenvalues exist.

It is well known that the eigenvectors for an eigenvalue $\lambda$ are found by computing $Ker(A - \lambda I)$. Observe first that if $v \in Ker(A - \lambda I)$, then also $Av \in Ker(A - \lambda I)$, since $A$ commutes with $A - \lambda I$. The same argument applies to $Ker(A - \lambda I)^t$, the kernel of the $t$-th power of $A - \lambda I$, for any natural number $t$. This gives us a way to identify larger invariant subspaces.

Let $W_i = Ker(A - \lambda I)^i$. Clearly $W_i \subset W_{i+1}$, and since $V$ is finite-dimensional the dimensions cannot grow forever, so there is a smallest $t$ at which $W_t = W_{t+1}$. From then on, all the $W_i$ are equal. Indeed, if $v \in W_{t+2}$, then $(A - \lambda I)v \in W_{t+1} = W_t$, so $v \in W_{t+1}$; repeating this argument gives $W_i = W_t$ for all $i \geq t$.
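The stabilization of the $W_i$ can be observed numerically. The following is a minimal sketch using only the Python standard library; the helper names `rank`, `matmul` and `kernel_dims` and the sample matrix are assumptions made for illustration. It computes $\dim W_i$ as $n$ minus the rank of $(A - \lambda I)^i$, using exact rational arithmetic.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def kernel_dims(A, lam):
    """Return [dim Ker (A - lam I)^i for i = 0, 1, ..., n]."""
    n = len(A)
    B = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    P = [[int(i == j) for j in range(n)] for i in range(n)]  # (A - lam I)^0
    dims = []
    for _ in range(n + 1):
        dims.append(n - rank(P))  # rank-nullity: dim Ker = n - rank
        P = matmul(P, B)
    return dims

# Illustrative matrix: eigenvalue 2 with blocks of size 2 and 1, eigenvalue 3 once.
A = [[2, 1, 0, 0],
     [0, 2, 0, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 3]]

print(kernel_dims(A, 2))  # [0, 2, 3, 3, 3]
```

For this matrix and $\lambda = 2$ the dimensions grow until $i = 2$ and then stay constant, so $t = 2$.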

Next, we observe that $Ker(A - \lambda I)^t \cap Im(A - \lambda I)^t = \{0\}$. Suppose not. Then there is some nonzero $w = (A - \lambda I)^t u$ with $(A - \lambda I)^t w = 0$, so $u \in W_{2t}$ but $u \notin W_t$, contradicting $W_{2t} = W_t$.

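By the rank-nullity theorem, $\dim Ker(A - \lambda I)^t + \dim Im(A - \lambda I)^t = \dim V$, so together with the trivial intersection above we get $V = Ker(A - \lambda I)^t \oplus Im(A - \lambda I)^t$. The image $Im(A - \lambda I)^t$ is again invariant under $A$, and $\lambda$ is not an eigenvalue of $A$ restricted to it, so we can repeat the same procedure there with another eigenvalue. The full decomposition then follows by induction on the number of distinct eigenvalues.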
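A small worked example may help here (the matrix is an illustrative choice). Take

$A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}$

with standard basis vectors $e_1, \ldots, e_4$ and $\lambda = 2$. Then $(A - 2I)e_1 = 0$, $(A - 2I)e_2 = e_1$, $(A - 2I)e_3 = 0$ and $(A - 2I)e_4 = e_4$. For the first power we get $Ker(A - 2I) = span(e_1, e_3)$ and $Im(A - 2I) = span(e_1, e_4)$, which share the nonzero vector $e_1$, so the first power is not enough. For the second power we get

$Ker(A - 2I)^2 = span(e_1, e_2, e_3)$ and $Im(A - 2I)^2 = span(e_4)$,

which intersect only in $0$ and have dimensions $3 + 1 = 4$. On $span(e_4)$ the map $A$ acts as multiplication by $3$, so the eigenvalue $2$ no longer occurs there.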

The result is that if $A$ has distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$, there are positive integers $a_1, a_2, \ldots, a_k$ such that the domain of $A$ is

$Ker(A - \lambda_1 I)^{a_1} \oplus Ker(A - \lambda_2 I)^{a_2} \oplus \cdots \oplus Ker(A - \lambda_k I)^{a_k}$.

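Now does $Ker(A - \lambda I)^t$ necessarily correspond to an irreducible invariant subspace? No: the algebraic multiplicity of $\lambda$ is the dimension of this subspace, while the geometric multiplicity is the dimension of $Ker(A - \lambda I)$, and whenever the latter exceeds one the subspace splits further into smaller invariant subspaces.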
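To make the distinction concrete (the two matrices are illustrative choices), compare

$B = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$ and $C = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}$.

Both have the single eigenvalue $2$ with algebraic multiplicity $2$, and in both cases the kernel of the square of the matrix minus $2I$ is the whole plane. For $B$, $Ker(B - 2I)$ is the whole plane as well, so the geometric multiplicity is $2$ and the plane splits into two invariant lines, giving two $1 \times 1$ blocks. For $C$, $Ker(C - 2I)$ is spanned by $(1, 0)^T$ alone, so the geometric multiplicity is $1$ and the plane forms a single chain, giving one $2 \times 2$ block.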

Now, as the second part of the proof, we show the following, to be applied to each component of the direct sum decomposition from the first part. If $T$ is nilpotent, meaning that $T^n = 0$ for some $n$, then there are vectors $v_1, \ldots, v_k$ and positive integers $a_1, \ldots, a_k$ such that $\{v_1, Tv_1, \ldots, T^{a_1-1}v_1, v_2, \ldots, v_k, Tv_k, \ldots, T^{a_k-1}v_k\}$ is a basis (in particular, linearly independent) for the domain of $T$, with $T^{a_i}v_i = 0$ for each $i$, $\sum_i a_i$ equal to the dimension of that domain, and $\max(a_1, \ldots, a_k) = n$. (Here $n$ is taken to be the smallest integer with $T^n = 0$.)

Since $T$ is nilpotent, its only eigenvalue is $0$, and the corresponding eigenvector space is $Ker\,T$. The idea is to start from $Ker\,T$ and take preimages with respect to $T$ successively until nothing remains, which happens after at most $n-1$ iterations because $T^n = 0$.

In particular, take $u_1, \ldots, u_k$ to be a basis of $Ker\,T$, chosen so that for every $j$ the $u_i$ lying in $Im\,T^j$ span $Ker\,T \cap Im\,T^j$ (start with a basis of $Ker\,T \cap Im\,T^{n-1}$ and extend it step by step through $Ker\,T \cap Im\,T^{n-2}, \ldots, Ker\,T$). For each $u_i$, let $j_i$ be the largest $j$ with $u_i \in Im\,T^j$ and pick $v_i$ with $T^{j_i}v_i = u_i$. The chain $v_i, Tv_i, \ldots, T^{j_i}v_i = u_i$ has length $a_i = j_i + 1$, and together the chains form a set of vectors of the format specified.

This set is exhaustive. The map $T^m$ sends $Ker\,T^{m+1}$ onto $Ker\,T \cap Im\,T^m$ with kernel $Ker\,T^m$, so $\dim Ker\,T^{m+1} - \dim Ker\,T^m = \dim(Ker\,T \cap Im\,T^m)$, which is the number of chains of length greater than $m$. Summing over $m = 0, \ldots, n-1$ shows that the total number of chain vectors is $\dim Ker\,T^n$, the dimension of the whole space.

It remains to show that these vectors are linearly independent. Call the vectors reached after $m$ preimage steps from some $u_i$ (that is, the vectors $T^{j_i - m}v_i$) the vectors of level $m$. Using the eigenvector space (level $0$) as the base case, take as inductive hypothesis that the vectors of level less than $m$ are linearly independent. We show that the set remains linearly independent after adding the vectors of level $m$. Suppose a nontrivial linear combination of the level $m$ vectors equals a linear combination of the rest. Apply $T$ exactly $m$ times. The side with the lower levels vanishes, since a vector of level $l$ is killed by $T^{l+1}$. The other side becomes a nontrivial linear combination of the corresponding $u_i$, which cannot vanish because the $u_i$ form a basis of $Ker\,T$. This contradiction concludes our construction.
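The chain lengths can also be read off without building the chains explicitly. The sketch below does not carry out the preimage construction itself; it uses the standard counting fact that, for nilpotent $T$, the number of chains of length greater than $m$ equals $\dim Ker\,T^{m+1} - \dim Ker\,T^m$. The helper names `rank`, `matmul` and `chain_lengths` and the sample matrix are assumptions made for illustration, and only the Python standard library is used.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def chain_lengths(T):
    """Chain lengths a_1 >= a_2 >= ... for a nilpotent matrix T."""
    n = len(T)
    dims = [0]                 # dims[m] = dim Ker T^m
    P = [row[:] for row in T]  # current power of T, starting at T^1
    while dims[-1] < n:
        dims.append(n - rank(P))
        if dims[-1] == dims[-2]:
            raise ValueError("T is not nilpotent")
        P = matmul(P, T)
    dims.append(n)             # kernels stay equal to the whole space afterwards
    lengths = []
    for m in range(1, len(dims) - 1):
        # chains of length >= m, minus chains of length >= m + 1
        exact = (dims[m] - dims[m - 1]) - (dims[m + 1] - dims[m])
        lengths += [m] * exact
    return sorted(lengths, reverse=True)

# Illustrative nilpotent matrix: T e2 = e1, T e3 = e2, T e5 = e4, others to 0.
T = [[0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 0, 0]]

print(chain_lengths(T))  # [3, 2]
```

Here the lengths sum to $5$, the dimension of the space, and the largest length $3$ is the smallest $n$ with $T^n = 0$, in line with the statement being proved.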

Essentially, we have chains of vectors, each of which terminates when no further element is found by our preimage operation. Applying this to $T = A - \lambda I$, we see that for an element $u$ in a chain, $Au = \lambda u + v$, where $v = (A - \lambda I)u$ is the previous element of the chain, with $v = 0$ signifying that $u$ is an ordinary (non-generalized) eigenvector at the front of the chain. Ordering each chain from its eigenvector onward, the ones along the superdiagonal of $J_i$ correspond to the coefficient $1$ of $v$ above.
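The relation $Au = \lambda u + v$ can be checked on a small example. The sketch below uses a hand-picked $2 \times 2$ matrix with the single eigenvalue $2$; the matrix, the vectors `w` and `u`, and the helper names `apply` and `matmul` are illustrative assumptions. It verifies that placing the chain as columns of $P$, eigenvector first, gives $AP = PJ$.

```python
def apply(M, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[3, 1],
     [-1, 1]]   # characteristic polynomial (x - 2)^2
lam = 2
w = [1, -1]     # eigenvector: (A - 2I) w = 0
u = [1, 0]      # generalized eigenvector: (A - 2I) u = w

# Front of the chain: A w = lam * w
front_ok = apply(A, w) == [lam * wi for wi in w]
# Next element: A u = lam * u + w, the "1" on the superdiagonal
next_ok = apply(A, u) == [lam * ui + wi for ui, wi in zip(u, w)]

# Columns of P are the chain, eigenvector first.
P = [[w[0], u[0]],
     [w[1], u[1]]]
J = [[2, 1],
     [0, 2]]
similar_ok = matmul(A, P) == matmul(P, J)  # equivalent to P^{-1} A P = J

print(front_ok, next_ok, similar_ok)
```

Since $P$ is invertible (its determinant is $1$), $AP = PJ$ is the same as $P^{-1}AP = J$.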