## Cayley-Hamilton theorem and Nakayama’s lemma

The Cayley-Hamilton theorem states that every square matrix over a commutative ring satisfies its own characteristic equation. That is, for an $n \times n$ matrix $A$, with $I_n$ the $n \times n$ identity matrix, the characteristic polynomial of $A$

$p(\lambda) = \det (\lambda I_n - A)$

is such that $p(A) = 0$. In a post a while ago, I mentioned that for any matrix $A$, $A(\mathrm{adj}(A)) = (\det A) I_n$, a fact that is not hard to see from the cofactor (minor) expansion of the determinant, which is in fact much of what motivates the adjugate in the first place. This identity can be used to prove the Cayley-Hamilton theorem.

So we have

$(\lambda I_n - A)\mathrm{adj}(\lambda I_n - A) = p(\lambda)I_n$,

where $p$ is the characteristic polynomial of $A$. The adjugate in the above is a polynomial in $\lambda$ of degree $n-1$ whose coefficients are matrices (themselves polynomials in $A$), so we can write it in the form $\displaystyle\sum_{i=0}^{n-1}\lambda^i B_i$.

We have

$\displaystyle{\begin{aligned}p(\lambda)I_{n} &= (\lambda I_n - A)\sum_{i=0}^{n-1}\lambda^i B_i \\ &= \sum_{i=0}^{n-1}\lambda^{i+1}B_{i}-\sum _{i=0}^{n-1}\lambda^{i}AB_{i} \\ &= \lambda^{n}B_{n-1}+\sum _{i=1}^{n-1}\lambda^{i}(B_{i-1}-AB_{i})-AB_{0}.\end{aligned}}$

Writing $p(\lambda) = \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0$, equating coefficients gives us

$B_{n-1} = I_n, \qquad B_{i-1} - AB_i = c_i I_n \;\, (1 \leq i \leq n-1), \qquad -AB_0 = c_0I_n$.

With this, we have

$A^n + c_{n-1}A^{n-1} + \cdots + c_1A + c_0I_n = A^nB_{n-1} + \displaystyle\sum_{i=1}^{n-1} (A^iB_{i-1} - A^{i+1}B_i) - AB_0 = 0$,

with the right-hand side telescoping to $0$.
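As a quick sanity check, here is a short Python sketch (standard library only) that extracts the $c_i$ via the Faddeev-LeVerrier recurrence, which mirrors the $B_i$ relations above, and verifies $p(A) = 0$ on a sample integer matrix:

```python
from fractions import Fraction

def mmul(A, B):
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def madd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def ident(n, s=1):
    # s times the identity matrix, with exact rational entries
    return [[Fraction(s) if i == j else Fraction(0) for j in range(n)]
            for i in range(n)]

def charpoly(A):
    # Faddeev-LeVerrier: returns [c_0, ..., c_{n-1}] of the monic
    # characteristic polynomial lambda^n + c_{n-1} lambda^{n-1} + ... + c_0
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    M, c = ident(n), [Fraction(0)] * n
    for k in range(1, n + 1):
        AM = mmul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k   # -tr(A M_k)/k
        M = madd(AM, ident(n, c[n - k]))                  # M_{k+1} = A M_k + c I
    return c, A

def poly_at(c, A):
    # Horner evaluation of the monic polynomial at the matrix A
    n = len(A)
    P = ident(n)
    for k in range(n - 1, -1, -1):
        P = madd(mmul(P, A), ident(n, c[k]))
    return P

c, A = charpoly([[2, 1, 0], [0, 3, 1], [1, 0, 1]])
assert all(x == 0 for row in poly_at(c, A) for x in row)  # p(A) = 0
```

The `Fraction` arithmetic keeps everything exact, so the final assertion is an identity rather than a numerical approximation.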

There is a generalized version of this for modules over a commutative ring, which goes as follows.

Cayley-Hamilton theorem (for modules)  Let $A$ be a commutative ring with unity, $M$ a finitely generated $A$-module, $I$ an ideal of $A$, and $\phi$ an endomorphism of $M$ with $\phi M \subset IM$. Then $\phi$ satisfies an equation of the form $\phi^n + c_{n-1}\phi^{n-1} + \cdots + c_1\phi + c_0 = 0$ with the $c_i \in I$.

Proof: It’s mostly the same. Let $\{m_1, \ldots, m_n\} \subset M$ be a generating set. Then for every $i$, $\phi(m_i) \in IM$, so $\phi(m_i) = \displaystyle\sum_{j=1}^n a_{ij}m_j$ with the $a_{ij}$s in $I$. Running the determinant argument above on the matrix $(a_{ij})$, each coefficient of the characteristic polynomial is a sum of products of the $a_{ij}$s, so by the closure properties of ideals the coefficients stay in $I$.     ▢

From this follows easily a statement of Nakayama’s lemma, ubiquitous in commutative algebra.

Nakayama’s lemma  Let $I$ be an ideal in $R$, and $M$ a finitely-generated module over $R$. If $IM = M$, then there exists an $r \in R$ with $r \equiv 1 \pmod{I}$, such that $rM = 0$.

Proof: Since $IM = M$, the Cayley-Hamilton theorem above applies to $\phi = I_M$, the identity map on $M$, with $p$ the resulting polynomial. Then

$p(I_M) = (1 + c_{n-1} + c_{n-2} + \cdots + c_0)I_M = 0$.

Set $r = 1 + c_{n-1} + c_{n-2} + \cdots + c_0$. Since the $c_i$ are coefficients residing in $I$, we get $r \equiv 1 \pmod{I}$, and $rI_M$ being the zero map on $M$ gives $rM = 0$.     ▢
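For a concrete (if tiny) instance of the lemma, take $R = \mathbb{Z}$, $M = \mathbb{Z}/3$, and $I = 2\mathbb{Z}$ (my own example, not from any particular text); a few lines of Python confirm the conclusion with $r = 3$:

```python
# R = Z, M = Z/3 (generated by 1), I = 2Z
M = set(range(3))
IM = {(a * m) % 3 for a in (2,) for m in M}   # I*M, using the generator 2 of I
assert IM == M                                 # IM = M, so Nakayama applies
r = 3                                          # r = 1 + 2 is in 1 + I, i.e. r = 1 (mod 2)
assert r % 2 == 1
assert all((r * m) % 3 == 0 for m in M)        # rM = 0
```

Here $2$ is a unit mod $3$, which is exactly why $IM = M$, and multiplication by $3$ annihilates $\mathbb{Z}/3$.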

## Jordan normal form

The Jordan normal form theorem states that every square matrix over an algebraically closed field (take $\mathbb{C}$, say) is similar to a Jordan normal form matrix, one of the form

$J = \begin{bmatrix}J_1 & \; & \; \\ \; & \ddots & \; \\\; & \; & J_p \\ \end{bmatrix}$

where each of the $J_i$ is square of the form

$J_i = \begin{bmatrix}\lambda_i & 1 & \; & \; \\ \; & \lambda_i & \ddots & \; \\ \; & \; & \ddots & 1 \\ \; & \; & \; & \lambda_i \\ \end{bmatrix}$.

This is constructed via generalized eigenvectors. One can observe that each block corresponds to an invariant subspace, and the generalized eigenvectors of a matrix form a set of chains, each of which spans its own invariant subspace.

We let $A$ be any linear transformation from $V$ to $V$, where $V$ is a finite-dimensional vector space.

It is common knowledge that $Ker(A - \lambda I)$ is the mechanism used to solve for eigenvectors. Let us first observe that $v \in Ker(A - \lambda I)$ is such that also $Av \in Ker(A - \lambda I)$, since $A$ commutes with $A - \lambda I$, and that this extends to $Ker(A - \lambda I)^t$ for any natural number $t$. This gives us a way to identify larger invariant subspaces.

Let $W_i = Ker(A - \lambda I)^i$. $W_i \subset W_{i+1}$ is obvious, and since $V$ is finite-dimensional there is a smallest $t$ at which $W_t = W_{t+1}$. Afterwards, the $W_i$ must all be equal: if some $v \in W_{t+2} \setminus W_{t+1}$ existed, then $(A - \lambda I)v$ would lie in $W_{t+1} \setminus W_t$, contradicting $W_t = W_{t+1}$.

Next, we observe that $Ker(A - \lambda I)^t \cap Im(A - \lambda I)^t = \{0\}$. Suppose not. Then some nonzero $v = (A - \lambda I)^t w$ satisfies $(A - \lambda I)^t v = 0$, so $w \in W_{2t}$ but not in $W_t$, contradicting the stabilization above.

The rank-nullity theorem then says that after pulling out $Ker(A - \lambda I)^t$ for some eigenvalue $\lambda$, the remainder is $Im(A - \lambda I)^t$: the dimensions add up to $\dim V$, and we just saw the intersection is trivial, so $V = Ker(A - \lambda I)^t \oplus Im(A - \lambda I)^t$. We can run the same algorithm on $Im(A - \lambda I)^t$, which is invariant under $A$, for another eigenvalue. So this is resolved by induction.

The result is that if $A$ has distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$, there are $a_1, a_2, \ldots, a_k$ such that the domain of $A$ is

$Ker(A - \lambda_1 I)^{a_1} \oplus Ker(A - \lambda_2 I)^{a_2} \oplus \cdots \oplus Ker(A - \lambda_k I)^{a_k}$.

Now does $Ker(A - \lambda I)^t$ necessarily correspond to an irreducible invariant subspace? No, as there is a difference between algebraic and geometric multiplicity.

Now we will show, as the second part of the proof, to be invoked on the components of the direct sum decomposition from the preceding first part, that if $T$ is nilpotent, meaning that $T^n = 0$ for some $n$, then there are vectors $v_1, \ldots, v_k$ and natural numbers $a_1, \ldots, a_k$ such that $\{v_1, Tv_1, \ldots, T^{a_1-1}v_1, v_2, \ldots, v_k, Tv_k, \ldots, T^{a_k-1}v_k\}$ is a basis (linearly independent by definition) for the domain of $T$, with $\sum_i a_i = \dim V$ and $\max(a_1, \ldots, a_k) = n$. (Note that here $n$ is the smallest exponent with $T^n = 0$.)

The relevant eigenvalue of a nilpotent $T$ is $0$, with eigenvector space $\ker T$. Take its preimage under $T$. Do this successively until nothing new remains, which happens by the $(n-1)$th iteration. In particular, take $u_1, \ldots, u_k$ to be a basis of the eigenvector space; for each accumulated vector that has a nonempty preimage, we pick a preimage element with the kernel component projected out. This accumulates a set of vectors of the format specified, and under the nilpotence assumption the process terminates and exhausts the vector space.

It remains to show that these vectors are linearly independent. Taking our basis of the eigenvector space as the base case, assume as the inductive hypothesis that the vectors accumulated prior to the $k$th iteration are linearly independent; we show the set remains so after adding the ones obtained from taking preimages. First, the added vectors are linearly independent among themselves: if a nontrivial linear combination of them gave zero, applying $T$ would violate the inductive hypothesis. Second, a nontrivial linear combination of the newly added vectors cannot equal a linear combination of the rest: assume otherwise, and apply $T$ just enough times ($k$ times) for one side to vanish. The other side then becomes a nontrivial linear combination with respect to our designated basis of the eigenvector space, which cannot vanish. This concludes our construction.

Essentially we have chains of vectors, each terminating when the preimage operation yields nothing new. Applying this with $T = A - \lambda I$, we see that for an element $u$ in a chain, $Au = \lambda u + v$, where $v$ is the previous element of the chain (with $v = 0$ signifying that $u$ is an ordinary, non-generalized eigenvector at the front of the chain). The ones along the superdiagonal of a Jordan block correspond to the coefficient $1$ of $v$ above.
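The chain structure can also be read off computationally: for a nilpotent $T$, the number of chains (Jordan blocks) of length at least $s$ is $\mathrm{rank}\,T^{s-1} - \mathrm{rank}\,T^s$. A sketch, assuming the matrix is given as a list of rows with rational entries:

```python
from fractions import Fraction

def rank(M):
    # row rank over Q via exact Gaussian elimination
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def mmul(A, B):
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def block_sizes(T):
    # for nilpotent T: #(blocks of size >= s) = rank(T^{s-1}) - rank(T^s)
    n = len(T)
    ranks, P = [n], T
    while ranks[-1] > 0:
        ranks.append(rank(P))
        P = mmul(P, T)
    ge = [ranks[s - 1] - ranks[s] for s in range(1, len(ranks))] + [0]
    return sorted(s for s in range(1, len(ge)) for _ in range(ge[s - 1] - ge[s]))

# J_3(0) + J_1(0): one chain of length 3, one of length 1
T = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
assert block_sizes(T) == [1, 3]
```

The rank differences count chains because each application of $T$ shortens every chain by exactly one vector.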

## Hilbert basis theorem

I remember learning this theorem in early 2015, but I could not remember its proof at all. Today, I relearned it. It employs a beautiful induction argument to transfer Noetherianness (in the form of finite generation) from $R$ to $R[x]$ via leading coefficients.

Hilbert basis theorem  If $R$ is a Noetherian ring, then so is $R[x]$.

Proof: Take some ideal $J$ in $R[x]$. For each degree $n$, the leading coefficients of the degree-$n$ elements of $J$ (together with $0$) form an ideal $I_n$ of $R$, and $I_n \subset I_{n+1}$ (multiply by $x$), so these form an ascending chain, which by Noetherianness must become constant eventually, say at degree $k$. Take finite sets $A_n \subset J$ of degree-$n$ polynomials for $m \leq n \leq k$, where $m$ is the smallest degree of a nonzero element of $J$, such that the leading coefficients of $A_n$ generate $I_n$. With this we can, for any $p \in J$, construct a finite combination within $A = \displaystyle\cup_{n=m}^k A_n$ that matches $p$ leading coefficient wise (multiplying by powers of $x$ as needed), and thereby subtraction reduces $p$ to a lower degree. Such naturally lends itself to induction on degree, with $m$ as the base case: any element of $J$ of degree lower than $m$ is the zero polynomial. Now assume, as the inductive hypothesis, that $A$ acts as a finite generating set for all elements of $J$ of degree at most $n$, and take $p \in J$ of degree $n+1$. If $n+1 \leq k$, we can cancel out the leading coefficient using $A_{n+1}$ and then apply the inductive hypothesis to the difference. If $n+1 > k$, then since $I_{n+1} = I_k$, we can generate with $A_k$ a degree-$k$ polynomial with the same leading coefficient (and thereby a degree-$(n+1)$ one, multiplying by $x^{n+1-k}$), and again apply the inductive hypothesis to the difference. Thus $J$ is generated by the finite set $A$.     ▢
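To make the degree-reduction step concrete, here is a toy membership test hard-coded to the ideal $J = (2, x) \subset \mathbb{Z}[x]$ (a hypothetical example of mine, where $I_0 = (2)$ and $I_n = \mathbb{Z}$ for $n \geq 1$); it repeatedly cancels the leading term exactly as in the proof:

```python
def trim(p):
    # drop trailing zero coefficients (p is stored low-to-high degree)
    while p and p[-1] == 0:
        p.pop()
    return p

def in_ideal(p):
    # membership in J = (2, x) in Z[x] by repeated leading-term cancellation
    p = trim(list(p))
    while p:
        d, c = len(p) - 1, p[-1]
        if d >= 1:
            p[-1] -= c          # subtract c*x^(d-1) * (x): kills the degree-d term
        elif c % 2 == 0:
            p[0] -= c           # subtract (c/2) * 2
        else:
            return False        # odd constant: leading coefficient not in I_0 = (2)
        p = trim(p)
    return True

assert in_ideal([2, 7, 3])      # 3x^2 + 7x + 2 = (x+2)(3x+1) lies in (2, x)
assert not in_ideal([3, 1])     # x + 3 has odd constant term
```

Each loop iteration strictly lowers the degree, which is the engine of the induction above.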

## Galois theory

I’ve been quite exhausted lately with work and other annoying life things. So sadly, I haven’t been able to think about math much, let alone write about it. However, this morning on the public transit I was able to indulge a bit by reviewing in my head some essentials behind Galois theory, in particular how its fundamental theorem is proved.

The first part of it states that there is an inclusion-reversing correspondence between the subgroups of the full Galois group and the intermediate fixed fields, and moreover that the degree of the field extension is equal to the index of the corresponding subgroup. This can be proved readily using the primitive element theorem, which I will state and prove.

Primitive element theorem: Let $F$ be a field of characteristic $0$ (more generally, the argument below needs the extensions involved to be separable and $F$ infinite). Then $F(\alpha)(\beta)$, the field from adjoining algebraic elements $\alpha, \beta$ to $F$, can be represented as $F(\gamma)$ for some single element $\gamma$. This extends inductively to the statement that any finite extension can be obtained by adjoining a single primitive element.

Proof: Let $\gamma = \alpha + c\beta$ for some $c \in F$. We will show that there is such a $c$ for which $\beta$ is contained in $F(\gamma)$. Let $f, g$ be the minimal polynomials of $\alpha$ and $\beta$ respectively, and let $h(x) = f(\gamma - cx)$, which has $\beta$ as a root. The minimal polynomial of $\beta$ over $F(\gamma)$ must divide both $h$ and $g$. Suppose it has degree at least $2$. Then $h$ and $g$ share some root $\beta' \neq \beta$, which induces $\alpha' = \gamma - c\beta'$, another root of $f$. From $\gamma = \alpha + c\beta = \alpha' + c\beta'$ we get $c = (\alpha - \alpha')/(\beta' - \beta)$, and since $f$ and $g$ have finitely many roots, there are only finitely many $c$ for which this can happen. Any other choice of $c$ makes the minimal polynomial of $\beta$ over $F(\gamma)$ linear, so $\beta \in F(\gamma)$, and then $\alpha = \gamma - c\beta \in F(\gamma)$ too.     ▢
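For a numerical illustration (floating point, so only approximate), take $F = \mathbb{Q}$, $\alpha = \sqrt{2}$, $\beta = \sqrt{3}$, $c = 1$: then $\gamma = \sqrt{2} + \sqrt{3}$ has minimal polynomial $x^4 - 10x^2 + 1$, and $\alpha$ is recovered inside $\mathbb{Q}(\gamma)$ as $(\gamma^3 - 9\gamma)/2$:

```python
import math

a, b = math.sqrt(2), math.sqrt(3)
g = a + b                                   # gamma = alpha + c*beta with c = 1
assert abs(g**4 - 10*g**2 + 1) < 1e-9       # minimal polynomial of gamma over Q
assert abs((g**3 - 9*g)/2 - a) < 1e-9       # sqrt(2) = (g^3 - 9g)/2 lies in Q(g)
```

Since $\alpha \in \mathbb{Q}(\gamma)$, also $\beta = \gamma - \alpha \in \mathbb{Q}(\gamma)$, so $\gamma$ is indeed primitive for $\mathbb{Q}(\sqrt{2}, \sqrt{3})$.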

The degree of a field extension is the degree of the minimal polynomial of its primitive element. That primitive element can be mapped by an automorphism to any one of the roots of the minimal polynomial, thereby determining the same number of automorphisms, and correspondingly of cosets.

The second major part of this fundamental theorem states that normality on the subgroup side is equivalent to normality of the corresponding field extension. To see this, remember that if a field extension is normal, a map that preserves multiplication and addition cannot take an element of the extended field outside it, as that would imply that its minimal polynomial has a root outside the extended field, violating normality. Thus in a normal extension, any $g$ in the full Galois group maps the extended field (which is the fixed field of the subgroup $H$ we’re trying to prove normal) into itself. Hence for all $h \in H$, $g^{-1}hg$ also fixes the extended field, meaning it lies in $H$.

## An observation on conjugate subgroups

Let $H$ and $H'$ be conjugate subgroups of $G$, that is, $g^{-1}Hg = H'$ for some $g \in G$. Equivalently, $HgH' = gH'$, which means there is some element of $G/H'$ whose stabilizer under the action of $H$ on $G/H'$ is all of $H$, the whole acting group. Suppose $H$ is a $p$-group whose index in $G$ is not divisible by $p$. Then such a fully stabilized coset must exist, by the following lemma.

If $H$ is a $p$-group acting on a finite set $\Omega$, then $|\Omega| \equiv |\Omega_0| \pmod{p}$, where $\Omega_0$ is the subset of $\Omega$ of elements fully stabilized by $H$.

Its proof rests on the orbit-stabilizer theorem: every orbit outside $\Omega_0$ has order a positive power of $p$, so those orbits vanish modulo $p$.
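One can see the lemma in action with the cyclic $p$-group $C_p$ acting on length-$p$ strings by rotation: the fully stabilized strings are the constant ones, and the congruence recovers Fermat's little theorem $a^p \equiv a \pmod p$. A small Python check:

```python
from itertools import product

# C_5 (a p-group for p = 5) acts on strings of length 5 over a 3-letter
# alphabet by rotation; a string is fixed by the whole group iff it is
# fixed by the generating rotation, i.e. iff it is constant
p, a = 5, 3
omega = list(product(range(a), repeat=p))
fixed = [s for s in omega if s[1:] + s[:1] == s]
assert len(fixed) == a                       # |Omega_0| = 3 constant strings
assert (len(omega) - len(fixed)) % p == 0    # |Omega| = |Omega_0| (mod p)
```

Here $|\Omega| = 3^5 = 243$ and $|\Omega_0| = 3$, so the congruence says $3^5 \equiv 3 \pmod 5$.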

This is the natural origin of the second Sylow theorem.

## Math sunday

I had a chill day thinking about math today without any pressure whatsoever. First I figured out, calculating inductively, that the order of $GL_n(\mathbb{F}_p)$ is $(p^n - 1)(p^n - p)(p^n - p^2)\cdots (p^n - p^{n-1})$. You count the $k$-tuples of linearly independent column vectors: having chosen $k$ of them, their span contains $p^k$ vectors that cannot be appended if linear independence is to be preserved, leaving $p^n - p^k$ choices for the next. A Sylow $p$-subgroup of $GL_n(\mathbb{F}_p)$ is the group of upper triangular matrices with ones on the diagonal, which has order $p^{n(n-1)/2}$, exactly the power of $p$ in the product above.
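A brute-force count (feasible only for tiny $n$ and $p$; here $n = 2$, $p = 3$) agrees with the formula:

```python
from itertools import product
from math import prod

def order_gl(n, p):
    # brute force: count n x n matrices over F_p with nonzero determinant
    def nonsingular(M):
        # Gaussian elimination mod p (p prime); returns True iff invertible
        M = [row[:] for row in M]
        for c in range(n):
            piv = next((r for r in range(c, n) if M[r][c] % p), None)
            if piv is None:
                return False
            if piv != c:
                M[c], M[piv] = M[piv], M[c]
            inv = pow(M[c][c], p - 2, p)        # inverse mod p by Fermat
            for r in range(c + 1, n):
                f = M[r][c] * inv % p
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[c])]
        return True
    return sum(nonsingular([list(e[i*n:(i+1)*n]) for i in range(n)])
               for e in product(range(p), repeat=n * n))

n, p = 2, 3
assert order_gl(n, p) == prod(p**n - p**i for i in range(n))   # 48
```

And indeed $48 = 2^4 \cdot 3$, with $3 = p^{n(n-1)/2}$ the order of the unipotent upper triangular subgroup.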

I also find the proof of the first Sylow theorem, and the inspiration behind it, much easier to understand now. I had always remembered that the Sylow $p$-subgroup we are looking for can be realized as the stabilizer subgroup of some set of $p^k$ elements of the group, where $p^k$ divides the order of the group. By the pigeonhole principle, there can be no more than $p^k$ elements in that stabilizer. The part of the proof that kept boggling my mind was the reverse inequality via orbits. It turns out that it can be viewed in a way that makes its logic feel much more natural than before, which, like many a proof not understood, seemed to spring out of the blue.

We wish to show that, letting $p^r$ be the largest power of $p$ dividing $n$, the number of times the order of some orbit is divisible by $p$ is no more than $r-k$. To do that it suffices to show that the sum of the orders of the orbits, $\binom{n}{p^k}$, is divisible by $p$ no more than that many times. To show that is very mechanical. Write $n = p^k m$ and expand as $m\displaystyle\prod_{j = 1}^{p^k-1} \frac{p^k m - j}{p^k - j}$, then divide each factor of the product, on both the numerator and denominator, by the power of $p$ dividing $j$ (which, since $j < p^k$, is the exact power of $p$ dividing both $p^k m - j$ and $p^k - j$). With this, the product contributes no factor of $p$, which means the number of times $p$ divides the sum of the orders of the orbits is the number of times it divides $m$, which is $r-k$.
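The claimed valuation $\nu_p\binom{n}{p^k} = r - k$ is easy to spot-check in Python:

```python
from math import comb

def vp(n, p):
    # exponent of the largest power of p dividing n
    e = 0
    while n % p == 0:
        n //= p
        e += 1
    return e

# nu_p(C(n, p^k)) = r - k, where p^r is the largest power of p dividing n
for n, p, k in [(12, 2, 2), (24, 2, 2), (18, 3, 1), (54, 3, 3)]:
    assert vp(comb(n, p**k), p) == vp(n, p) - k
```

For instance $\binom{24}{4} = 10626 = 2 \cdot 5313$, and indeed $\nu_2(24) - 2 = 3 - 2 = 1$.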

Following this, Brian Bi told me about a problem he was stuck on, starred in Artin (which means it was considered by the author to be difficult). To my great surprise, I managed to solve it in under half an hour. The problem is:

Let $H$ be a proper subgroup of a finite group $G$. Prove that the conjugate subgroups of $H$ don’t cover $G$.

For this, I remembered the relation $|G| = |N(H)| \cdot |Cl(H)|$, where $|Cl(H)|$ denotes the number of conjugates of $H$; this is a special case of the orbit-stabilizer theorem, as conjugation is a group action after all. Since $H \subseteq N(H)$, we have $|Cl(H)| \leq [G:H]$, and since the conjugates all share the identity, their union has at most $[G:H](|H| - 1) + 1 = |G| - [G:H] + 1 < |G|$ elements, the last inequality because $[G:H] \geq 2$ for a proper subgroup.
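A small sanity check on $G = S_3$ with $H$ generated by a transposition: the union of the conjugates of $H$ contains only $4$ of the $6$ elements:

```python
from itertools import permutations

G = list(permutations(range(3)))             # S_3 as tuples a with a[i] = image of i
def compose(a, b):
    return tuple(a[b[i]] for i in range(3))  # (a o b)(i) = a(b(i))
def inverse(a):
    inv = [0] * 3
    for i, ai in enumerate(a):
        inv[ai] = i
    return tuple(inv)

H = {(0, 1, 2), (1, 0, 2)}                   # subgroup generated by the transposition (0 1)
union = set()
for g in G:
    union |= {compose(compose(inverse(g), h), g) for h in H}   # g^-1 H g
assert len(union) < len(G)                   # conjugates never cover G
```

Here the union is the identity plus the three transpositions, matching the bound $[G:H](|H|-1)+1 = 3 \cdot 1 + 1 = 4$.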

I remember Jonah Sinick’s once saying that finite group theory is one of the most g-loaded parts of math. I’m not sure what his rationale is for that exactly. I’ll say that I have a taste for finite group theory though I can’t say I’m a freak at it, unlike Aschbacher, but I guess I’m not bad at it either. Sure, it requires some form of pattern recognition and abstraction visualization that is not so loaded on the prior knowledge front. Brian Bi keeps telling me about how hard finite group theory is, relative to the continuous version of group theory, the Lie groups, which I know next to nothing about at present.

Oleg Olegovich, who told me today that he had proved “some generalization of something to semi-simple groups,” but needs a bit more to earn the label of Permanent Head Damage, suggested upon my asking him what he considers good mathematics that I look into Arnold’s classic on classical mechanics, which was the first thing to come to mind on his response of “stuff that is geometric and springs out of classical mechanics.” I found a PDF of it online and browsed through it but did not find it that much to my taste, perhaps because I’ve been a bit immersed lately in the number-theoretic and abstract-algebraic side of math that does not intersect with physics, though before I had an inclination towards more physicsy math. I thought of possibly learning PDEs and some physics as a byproduct, but I’m also worried about lack of focus. Maybe eventually I can do that casually without having to try too hard, as I have done lately for number theory. At present, at least, I lack the right combination of brainpower and interest for that.

## Composition series

My friend, after some time in industry, is back in school, currently taking graduate algebra. I was looking at his homework today, and in particular I thought about and worked out one of the problems, which is to prove the uniqueness part of the Jordan-Hölder theorem. Formally, if $G$ is a finite group and

$1 = N_0 \trianglelefteq N_1 \trianglelefteq \cdots \trianglelefteq N_r = G$ and $1 = N_0' \trianglelefteq N_1' \trianglelefteq \cdots \trianglelefteq N_s' = G$

are composition series of $G$, then $r = s$ and there exists $\sigma \in S_r$ and isomorphisms $N_{i+1} / N_i \cong N_{\sigma(i)+1}' / N_{\sigma(i)}'$.

Suppose WLOG that $s \geq r$, and take as a base case $s = 2$. Then clearly $s = r$, and if $N_1 \neq N_1'$, then $N_1 \cap N_1' = 1$ (the intersection is normal in the simple group $N_1$). $N_1 N_1' = G$ must hold, as $N_1N_1'$ is normal in $G$ and properly contains $N_1$, while $G/N_1$ is simple. Now, remember there is a theorem which states that if $H, K$ are normal subgroups of $G = HK$ with $H \cap K = 1$, then $G \cong H \times K$. (This follows from $(hkh^{-1})k^{-1} = h(kh^{-1}k^{-1})$, which shows every commutator to be the identity.) Thus $G/N_1 \cong N_1'$ and $G/N_1' \cong N_1$, so the two series have the same factors, swapped.

For the inductive step, take $H = N_{r-1} \cap N_{s-1}'$. By the second isomorphism theorem, $N_{r-1} / H \cong N_{r-1}N_{s-1}' / N_{s-1}' = G / N_{s-1}'$, with $N_{r-1}N_{s-1}' = G$ as in the base case. Take any composition series for $H$ and extend it via $N_{r-1}$ to another composition series for $G$; applying the inductive hypothesis to $N_{r-1}$ shows that $r = s$. One can do the same with $N_{s-1}'$. With both our composition series linked to two intermediary ones that differ only between $G$ and the common $H$, with those two factors swapped in between, our induction proof completes.
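For a toy illustration of the uniqueness statement, composition series of the cyclic group $\mathbb{Z}/12$ correspond to maximal divisor chains $1 \mid d_1 \mid \cdots \mid 12$ with prime steps, and every chain yields the same multiset of factors $\{2, 2, 3\}$:

```python
def prime_divisors(n):
    ps, d = [], 2
    while d * d <= n:
        if n % d == 0:
            ps.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        ps.append(n)
    return ps

def chains(n):
    # all maximal chains 1 | ... | n of divisors with prime quotients;
    # these are exactly the composition series of the cyclic group Z/n
    if n == 1:
        return [[1]]
    return [c + [n] for q in prime_divisors(n) for c in chains(n // q)]

factor_multisets = {tuple(sorted(b // a for a, b in zip(c, c[1:])))
                    for c in chains(12)}
assert len(chains(12)) == 3            # three distinct composition series
assert factor_multisets == {(2, 2, 3)} # Jordan-Holder: same factors each time
```

The three series pass through $\{2, 4\}$, $\{2, 6\}$, and $\{3, 6\}$ respectively, yet all have composition factors $C_2, C_2, C_3$ up to reordering.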

## Automorphisms of quaternion group

I learned this morning from Brian Bi that the automorphism group of the quaternion group is in fact $S_4$. Why? The quaternion group is generated by any two of $i,j,k$, all of which have order $4$, and $\pm i, \pm j, \pm k$ correspond to the six faces of a cube. Remember that the orientation-preserving symmetries of the cube form $S_4$, with the objects permuted being the four space diagonals. Now what do the space diagonals correspond to? Triplet bases $(i,j,k), (-i,j,-k), (j,i,-k), (-j,i,k)$, which correspond to four different corners of the cube, no two of which are joined by a space diagonal. We send our generators $i,j$ to two of $\pm i, \pm j, \pm k$ with distinct underlying letters; there are $6\cdot 4 = 24$ choices. There are by the same logic $24$ triplets $(x,y,z)$ of such quaternions with $xy = z$. We define an equivalence relation with $(x,y,z) \sim (-x,-y,z)$ and $(x,y,z) \sim (y,z,x) \sim (z,x,y)$, which is such that if two triplets are in the same equivalence class, then their images under any automorphism are as well, and no two classes are mapped to the same class. Combined, this shows that every automorphism induces a permutation of the equivalence classes.
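This is easy to confirm by brute force: an automorphism of $Q_8$ fixes $1$ and $-1$ (the unique element of order $2$), so it suffices to check the $6! = 720$ bijections of $\{\pm i, \pm j, \pm k\}$ for multiplicativity; exactly $24$ survive, matching $|S_4|$:

```python
from itertools import permutations

def qmul(x, y):
    # Hamilton product of quaternions stored as (a, b, c, d) = a + bi + cj + dk
    a1, b1, c1, d1 = x
    a2, b2, c2, d2 = y
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

one, i, j, k = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
def neg(x):
    return tuple(-t for t in x)
Q8 = [one, neg(one), i, neg(i), j, neg(j), k, neg(k)]

# every automorphism fixes 1 and -1, so enumerate bijections of the six units
units = Q8[2:]
count = 0
for img in permutations(units):
    f = {one: one, neg(one): neg(one)}
    f.update(zip(units, img))
    if all(f[qmul(x, y)] == qmul(f[x], f[y]) for x in Q8 for y in Q8):
        count += 1
assert count == 24   # |Aut(Q8)| = 24 = |S4|
```

The count alone only shows $|\mathrm{Aut}(Q_8)| = 24$; the identification with $S_4$ comes from the action on the four bases described above.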