Hahn-Banach theorem

I’m pleased to say that I find the derivation of the Hahn-Banach theorem pretty straightforward by now. Let me first state it, for the real case.

Hahn-Banach theorem: Let V be a real vector space and let p: V \to \mathbb{R} be sublinear. If f : U \to \mathbb{R} is a linear functional on a subspace U \subset V with f(x) \leq p(x) for x \in U, then there exists a linear extension g of f to all of V, that is, g(x) = f(x) for x \in U, such that g(x) \leq p(x) for all x \in V.

To show this, start by taking any x_0 \in V \setminus U. We wish to assign some value \alpha = g(x_0) that keeps p as the dominating function on the vector space U + \mathbb{R}x_0, where g(y + \lambda x_0) = f(y) + \lambda\alpha. For this to happen, applying the linearity of f and the domination constraint, we can derive

\frac{f(y) - p(y - \lambda x_0)}{\lambda} \leq \alpha \leq \frac{p(y+\lambda x_0) - f(y)}{\lambda}, \quad y \in U, \lambda > 0.

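By the positive homogeneity of p (replace y by \lambda y), this reduces to the case \lambda = 1, so such an \alpha exists as long as

\sup_{y \in U} \left[f(y) - p(y-x_0)\right] \leq \inf_{y \in U} \left[p(y+x_0) - f(y)\right].

This follows from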

f(y_1) + f(y_2) = f(y_1 + y_2) \leq p(y_1 + y_2) \leq p(y_1 - x_0) +p(y_2 + x_0), \quad y_1, y_2 \in U.
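To make the one-step extension concrete, here is a minimal numerical sketch on a toy instance of my own choosing (not from the argument above): V = \mathbb{R}^2, U the x-axis, p the \ell^1 norm, f(y, 0) = y and x_0 = (0, 1). It computes the two bounds between which \alpha = g(x_0) must lie, namely the sup of f(y) - p(y - x_0) and the inf of p(y + x_0) - f(y), over an integer sample of U.

```python
# Toy instance (an assumption for illustration, not from the text):
# V = R^2, U = the x-axis, p = the l1 norm, f(y, 0) = y, x0 = (0, 1).

def p(v):
    """Sublinear function on R^2: the l1 norm."""
    return abs(v[0]) + abs(v[1])

def f(y):
    """Linear functional on U, identifying (y, 0) with y; f <= p on U."""
    return y

ys = range(-50, 51)  # integer sample of U, so the arithmetic is exact

# Admissible values of alpha = g(x0) lie between these two bounds.
lower = max(f(y) - p((y, -1)) for y in ys)   # sup of f(y) - p(y - x0)
upper = min(p((y, 1)) - f(y) for y in ys)    # inf of p(y + x0) - f(y)

def g(v, alpha=0):
    """Extension g(y + lam * x0) = f(y) + lam * alpha on all of R^2."""
    return f(v[0]) + v[1] * alpha

print(lower, upper)  # the interval of admissible alpha on this sample
```

On this sample the bounds come out to -1 and 1, so any \alpha in [-1, 1] gives an extension dominated by p. This matches the fact that a + \alpha b \leq |a| + |b| exactly when |\alpha| \leq 1.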

Now consider the set of all pairs (h, W), where W is a subspace containing U and h is a linear functional on W that extends f and is dominated by p. We introduce a partial order wherein (h, W) \leq (h', W') iff W \subset W' and h(x) = h'(x) for x \in W. Zorn’s lemma applies, since the union of a chain is an upper bound for it. Any maximal element is necessarily of the form (g, V): if its domain were not the entire vector space, the construction above would yield a strictly larger element.

 

Riesz-Thorin interpolation theorem

I had, a while ago, the great pleasure of going through the proof of the Riesz-Thorin interpolation theorem. I believe I understand the general strategy of the proof, though for sure, I glossed over some details. It is my hope that in writing this, I can fill in the holes for myself at the more microscopic level.

Let us begin with a statement of the theorem.

Riesz-Thorin Interpolation Theorem. Suppose that (X,\mathcal{M}, \mu) and (Y, \mathcal{N}, \nu) are measure spaces and p_0, p_1, q_0, q_1 \in [1, \infty]. If q_0 = q_1 = \infty, suppose also that \nu is semifinite. For 0 < t < 1, define p_t and q_t by

\frac{1}{p_t} = \frac{1-t}{p_0} + \frac{t}{p_1}, \qquad  \frac{1}{q_t} = \frac{1-t}{q_0} + \frac{t}{q_1}.

If T is a linear map from L^{p_0}(\mu) + L^{p_1}(\mu) into L^{q_0}(\nu) + L^{q_1}(\nu) such that \left\|Tf\right\|_{q_0} \leq M_0 \left\|f\right\|_{p_0} for f \in L^{p_0}(\mu) and \left\|Tf\right\|_{q_1} \leq M_1 \left\|f\right\|_{p_1} for f \in L^{p_1}(\mu), then \left\|Tf\right\|_{q_t} \leq M_0^{1-t}M_1^t \left\|f\right\|_{p_t} for f \in L^{p_t}(\mu), 0 < t < 1.
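As a small illustration of how the interpolated exponents are computed, here is a sketch with a hypothetical helper `interpolate_exponent` (my own name, not from any library). It uses exact fractions and encodes an infinite exponent as `math.inf`, with the convention 1/\infty = 0.

```python
from fractions import Fraction
import math

def interpolate_exponent(p0, p1, t):
    """Return p_t with 1/p_t = (1-t)/p0 + t/p1; math.inf encodes infinity."""
    def inv(p):
        # Convention: 1/infinity = 0.
        return Fraction(0) if p == math.inf else 1 / Fraction(p)
    r = (1 - t) * inv(p0) + t * inv(p1)
    return math.inf if r == 0 else 1 / r

# Example endpoints (p0, q0) = (1, inf) and (p1, q1) = (2, 2), at t = 1/2.
t = Fraction(1, 2)
p_t = interpolate_exponent(1, 2, t)
q_t = interpolate_exponent(math.inf, 2, t)
print(p_t, q_t)
```

With these endpoints, which are the ones used for the Fourier transform in the Hausdorff-Young inequality, t = 1/2 gives p_t = 4/3 and q_t = 4. The two are conjugate exponents, since 3/4 + 1/4 = 1.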

We begin by noticing that in the special case where p = p_0 = p_1,

\left\|Tf\right\|_{q_t} \leq \left\|Tf\right\|_{q_0}^{1-t} \left\|Tf\right\|_{q_1}^t \leq M_0^{1-t}M_1^t \left\|f\right\|_p,

wherein the first inequality is a consequence of Hölder’s inequality. Thus we may assume that p_0 \neq p_1 and in particular that p_t < \infty.

Observe that the space of all simple functions on X that vanish outside sets of finite measure is dense in L^p(\mu) for p < \infty, and analogously for Y. To show this, take any f \in L^p(\mu) and a sequence of simple functions f_n with |f_n| \leq |f| that converges to f almost everywhere. Each f_n lies in L^p(\mu), so it is nonzero only on a set of finite measure, and \left\|f_n - f\right\|_p \to 0 by the dominated convergence theorem. Denote the respective spaces of such simple functions by \Sigma_X and \Sigma_Y.

To show that \left\|Tf\right\|_{q_t} \leq M_0^{1-t}M_1^t \left\|f\right\|_{p_t} for all f \in \Sigma_X, we use the fact that

\left\|Tf\right\|_{q_t} = \sup \left\{\left|\int (Tf)g d\nu \right| : g \in \Sigma_Y, \left\|g\right\|_{q_t'} = 1\right\},

where q_t' is the conjugate exponent to q_t. We can rescale f such that \left\|f\right\|_{p_t} = 1.

From this it suffices to show that across all f \in \Sigma_X, g \in \Sigma_Y with \left\|f\right\|_{p_t} = 1 and \left\|g\right\|_{q_t'} = 1, |\int (Tf)g d\nu| \leq M_0^{1-t}M_1^t.

For this, we use the three lines lemma, the inequality of which has the same value on its RHS.

Three Lines Lemma. Let \phi be a bounded continuous function on the strip 0 \leq \mathrm{Re} z \leq 1 that is holomorphic on the interior of the strip. If |\phi(z)| \leq M_0 for \mathrm{Re} z = 0 and |\phi(z)| \leq M_1 for \mathrm{Re} z = 1, then |\phi(z)| \leq M_0^{1-t} M_1^t for \mathrm{Re} z = t, 0 < t < 1.

This is proven via application of the maximum modulus principle to \phi_{\epsilon}(z) = \phi(z)M_0^{z-1} M_1^{-z} \exp(\epsilon z(z-1)) for \epsilon > 0. The factor \exp(\epsilon z(z-1)) serves to ensure that |\phi_{\epsilon}(z)| \to 0 as |\mathrm{Im} z| \to \infty for any \epsilon > 0.

We want to construct a family f_z, depending holomorphically on z, such that f_t = f, where t \in (0, 1) is the value corresponding to the interpolated p_t. To do this, we can express for convenience f = \sum_1^m |c_j|e^{i\theta_j} \chi_{E_j} and g = \sum_1^n |d_k|e^{i\psi_k} \chi_{F_k}, where the c_j’s and d_k’s are nonzero and the E_j’s and F_k’s are disjoint in X and Y respectively, and raise each |c_j| to the power \alpha(z) / \alpha(t) for some function \alpha with \alpha(t) > 0. With this, we have

f_z = \displaystyle\sum_1^m |c_j|^{\alpha(z)/\alpha(t)}e^{i\theta_j}\chi_{E_j}.

Needless to say, we can do similarly for g, with \beta(t) < 1,

g_z = \displaystyle\sum_1^n |d_k|^{(1-\beta(z))/(1-\beta(t))}e^{i\psi_k}\chi_{F_k}.

Together these turn the LHS of the inequality we desire to prove to a complex function that is

\phi(z) = \int (Tf_z)g_z d\nu.

To use the three lines lemma, we must satisfy

|\phi(is)| \leq \left\|Tf_{is}\right\|_{q_0}\left\|g_{is}\right\|_{q_0'} \leq M_0 \left\|f_{is}\right\|_{p_0}\left\|g_{is}\right\|_{q_0'} \leq M_0 \left\|f\right\|_{p_t}\left\|g\right\|_{q_t'} = M_0.

It is not hard to arrange that \left\|f_{is}\right\|_{p_0} = 1 = \left\|g_{is}\right\|_{q_0'}. A sufficient condition is that |f_{is}| = |f|^{p_t/p_0} and |g_{is}| = |g|^{q_t'/q_0'}, which, taking \alpha(t) = 1/p_t and 1 - \beta(t) = 1/q_t', amounts to \mathrm{Re} \alpha(is) = 1 / p_0 and \mathrm{Re} (1-\beta(is)) = 1 / q_0'. Similarly, on the other edge of the strip we need \mathrm{Re} \alpha(1+is) = 1 / p_1 and \mathrm{Re} (1-\beta(1+is)) = 1 / q_1'. From this, we can solve that

\alpha(z) = (1-z)p_0^{-1} + zp_1^{-1}, \qquad \beta(z) = (1-z)q_0^{-1} + zq_1^{-1}.

With these functions inducing a \phi(z) that satisfies the hypothesis of the three lines lemma, our interpolation theorem is shown for such simple functions, from which we extend our result to all f \in L^{p_t}(\mu).

To extend this to all of L^p, it suffices that Tf_n \to Tf a.e. for some sequence of measurable simple functions f_n with |f_n| \leq |f| and f_n \to f pointwise. Why? With this, we can invoke Fatou’s lemma (and also that \left\|f_n\right\|_p \to \left\|f\right\|_p by the dominated convergence theorem) to obtain the desired result, which is

\left\|Tf\right\|_q \leq \liminf \left\|Tf_n\right\|_q \leq \liminf M_0^{1-t} M_1^t\left\|f_n\right\|_p \leq M_0^{1-t} M_1^t \left\|f\right\|_p.

Recall that convergence in measure yields a subsequence that converges a.e. So it is enough to show that \displaystyle\lim_{n \to \infty} \nu(\{|Tf_n - Tf| > \epsilon\}) = 0 for all \epsilon > 0. This can be done by upper bounding with something that goes to zero. By Chebyshev’s inequality, for any finite exponent q we have

\nu(\{|Tf_n - Tf| > \epsilon\}) \leq \frac{\left\|Tf_n - Tf\right\|_q^q}{\epsilon^q}.

However, recall that in our hypotheses we have constant upper bounds on T from the p_0 and p_1 norms, valid when f is in L^{p_0} or L^{p_1} respectively, which we can make use of. So apply Chebyshev with q = q_0 (let’s use this one, assuming q_0 < \infty; q_1 works the same way) and bound \left\|Tf_n - Tf\right\|_{q_0} by M_0 \left\|f_n - f\right\|_{p_0}, which goes to zero by the dominated convergence theorem when f \in L^{p_0} and p_0 < \infty.

Hilbert basis theorem

I remember learning this theorem early 2015, but I could not remember its proof at all. Today, I relearned it. It employed a beautiful induction argument to transfer the Noetherianness (in the form of finite generation) from R to R[x] via the leading coefficient.

Hilbert Basis Theorem. If R is a Noetherian ring, then so is R[x].

Proof: Take some ideal J in R[x]. For each degree n, let I_n \subset R be the set of leading coefficients of the degree n polynomials in J, together with 0. Each I_n is an ideal of R, and I_n \subset I_{n+1} since J is closed under multiplication by x, so we get an ascending chain that has to become constant eventually, say at k. Let m be the smallest degree of a non-zero polynomial in J, and for m \leq n \leq k take a finite set A_n \subset J of degree n polynomials whose leading coefficients generate I_n. With this we can, for any polynomial p \in J, construct a finite combination of elements of A = \displaystyle\cup_{n=m}^k A_n that agrees with p in its leading coefficient, so that subtracting it leaves a polynomial in J of lower degree. This naturally lends itself to induction on the degree, with m as the base case: for a polynomial of degree m, the difference has degree lower than m and lies in J, so it is the zero polynomial. Now assume, as the inductive hypothesis, that A generates all polynomials in J of degree at most n, and let p \in J have degree n+1. If n+1 \leq k, we can cancel the leading coefficient of p using A_{n+1} and then use the inductive hypothesis on the difference. If n+1 > k, then I_{n+1} = I_n, so by the inductive hypothesis A generates a degree n polynomial in J with the same leading coefficient as p (and thereby a degree n+1 one, multiplying by x), and we apply the inductive hypothesis again, this time to the difference.

Galois theory

I’ve been quite exhausted lately with work and other annoying life things. So sadly, I haven’t been able to think about math much, let alone write about it. However, this morning on the public transit I was able to indulge a bit by reviewing in my head some essentials behind Galois theory, in particular how its fundamental theorem is proved.

The first part of it states that there is an inclusion-reversing correspondence between the intermediate fields and the subgroups of the full Galois group, and moreover that the degree of the field extension is equal to the index of the corresponding subgroup. This can be proved using the primitive element theorem, which I will state and prove.

Primitive element theorem: Let F be an infinite field and let \alpha, \beta be algebraic and separable over F. Then F(\alpha)(\beta), the field obtained by adjoining \alpha and \beta to F, can be represented as F(\gamma) for some single element \gamma. This extends inductively: any finite separable extension can be obtained by adjoining a single primitive element.

Proof: Let \gamma = \alpha + c\beta for some c \in F. We will show that c can be chosen such that \beta is contained in F(\gamma), and then so is \alpha = \gamma - c\beta. Let f, g be the minimal polynomials over F of \alpha and \beta respectively, and let h(x) = f(\gamma - cx), which has \beta as a root. The minimal polynomial of \beta over F(\gamma) must divide both h and g. Suppose it has degree at least 2. Then, by separability, it has some root \beta' \neq \beta, which is a root of g and induces \alpha' = \gamma - c\beta' that is a root of f. With \gamma = \alpha + c\beta = \alpha' + c\beta', we get c = (\alpha' - \alpha)/(\beta - \beta'). Since f and g have finitely many roots, there is only a finite number of c such that \beta is not in F(\gamma), and F, being infinite, contains a c outside this set. QED.
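As a concrete check (a standard example, not taken from the text above), take F = \mathbb{Q}, \alpha = \sqrt{2}, \beta = \sqrt{3} and c = 1, so that \gamma = \sqrt{2} + \sqrt{3}. The computation below recovers both generators from \gamma:

```latex
\gamma^2 = 5 + 2\sqrt{6}, \qquad
\gamma^3 = (\sqrt{2} + \sqrt{3})(5 + 2\sqrt{6}) = 11\sqrt{2} + 9\sqrt{3},

\sqrt{2} = \frac{\gamma^3 - 9\gamma}{2}, \qquad
\sqrt{3} = \frac{11\gamma - \gamma^3}{2}.
```

Hence \mathbb{Q}(\sqrt{2})(\sqrt{3}) = \mathbb{Q}(\sqrt{2} + \sqrt{3}).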

The degree of a field extension corresponds to the degree of the minimal polynomial of its primitive element. That primitive element can be in an automorphism mapped to any one of the roots of the minimal polynomial, thereby determining the same number of cosets.

The second major part of this fundamental theorem states that normality subgroup-wise is equivalent to normality of the extension field-wise: a subgroup H is normal exactly when its fixed field is a normal extension of the base field. To see one direction, remember that if a field extension is normal, a map that preserves multiplication and addition (and fixes the base field) cannot take an element of the extended field outside it, as that would imply that its minimal polynomial has a root outside the extended field, violating normality. Thus any g in the full Galois group maps the extended field (which is fixed by the subgroup H we’re trying to prove is normal) into itself. So for all h \in H, g^{-1}hg also fixes the extended field, meaning it’s in H.

 

Convergence in measure

Let f, f_n (n \in \mathbb{N}) : X \to \mathbb{R} be measurable functions on measure space (X, \Sigma, \mu). f_n converges to f globally in measure if for every \epsilon > 0,

\displaystyle\lim_{n \to \infty} \mu(\{x \in X : |f_n(x) - f(x)| \geq \epsilon\}) = 0.

To see that this means the existence of a subsequence with pointwise convergence almost everywhere, choose n_k increasing such that \mu(\{x \in X : |f_{n_k}(x) - f(x)| \geq \frac{1}{k}\}) < 2^{-k}. (We invoke the definition of limit here.) Since these measures have a finite sum, the Borel-Cantelli lemma says that almost every x lies in only finitely many of these sets. For such x we have |f_{n_k}(x) - f(x)| < \frac{1}{k} for all large k, so f_{n_k}(x) \to f(x).
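To see why passing to a subsequence is really needed, here is a sketch of the standard "typewriter" sequence on [0, 1) (my own illustration, not from the text above). Its n-th function is the indicator of an interval of length 2^{-k} for n = 2^k + j, so it converges to 0 in measure, yet at every point it equals 1 infinitely often. The names `typewriter` and `f` are hypothetical helpers.

```python
from fractions import Fraction

def typewriter(n):
    """Interval [j/2^k, (j+1)/2^k) for n = 2^k + j with 0 <= j < 2^k."""
    k = n.bit_length() - 1
    j = n - 2**k
    return Fraction(j, 2**k), Fraction(j + 1, 2**k)

def f(n, x):
    """Indicator function of the n-th typewriter interval, evaluated at x."""
    a, b = typewriter(n)
    return 1 if a <= x < b else 0

x = Fraction(1, 3)
# Each dyadic level k contributes exactly one n with f(n, x) = 1.
hits = sum(f(n, x) for n in range(1, 2**10))
# Along the subsequence n = 2^k the intervals [0, 2^-k) shrink to 0.
tail = [f(2**k, x) for k in range(2, 10)]
print(hits, tail)
```

At x = 1/3 the full sequence hits 1 once per dyadic level, so it does not converge there, while the subsequence n = 2^k is eventually 0.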

This naturally extends to every subsequence’s having a further subsequence with pointwise convergence almost everywhere, since a subsequence of a sequence converging in measure also converges in measure to the same limit. For the converse, assume \mu(X) < \infty (on an infinite measure space the converse can fail) and suppose, for contradiction, that f_n does not converge to f in measure. Then there are \epsilon, \delta > 0 and infinitely many n such that |f_n(x) - f(x)| \geq \epsilon holds on a set of measure at least \delta. These n form a subsequence, and no further subsequence of it can converge to f almost everywhere: on a finite measure space, almost everywhere convergence implies convergence in measure, which would force the measures of these sets to go to zero. From this, we have a subsequence without an almost everywhere convergent subsequence.

 

An observation on conjugate subgroups

Let H and H' be conjugate subgroups of a finite group G, that is, for some g \in G, g^{-1}Hg = H'. Equivalently (for subgroups of the same order), HgH' = gH', which means that under the action of H on G/H' by left multiplication, the coset gH' has stabilizer H, the whole acting group. Conversely, suppose H is a p-group and the index of H' in G is not divisible by p. Then such a fully stabilized coset must exist by the following lemma.

If H is a p-group that acts on a finite set \Omega, then |\Omega| \equiv |\Omega_0| \pmod{p}, where \Omega_0 is the subset of \Omega of elements fully stabilized by H.

Its proof rests on the orbit-stabilizer theorem: every orbit has size dividing |H|, so each orbit with more than one element has size a multiple of p and vanishes mod p.
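A quick way to see the lemma in action is a minimal sketch on an example of my own choosing: the cyclic group of prime order p acting by rotation on words of length p over an alphabet of a letters. The fixed points are the constant words, so the lemma gives a^p \equiv a \pmod{p}. The function name `rotation_counts` is a hypothetical helper.

```python
from itertools import product

def rotation_counts(a, p):
    """Z/p acts on length-p words over an a-letter alphabet by cyclic shift.

    Returns (number of words, number of words fixed by every shift).
    """
    words = list(product(range(a), repeat=p))
    fixed = [w for w in words if all(w[i:] + w[:i] == w for i in range(p))]
    return len(words), len(fixed)

total, fixed = rotation_counts(3, 5)
print(total, fixed, (total - fixed) % 5)
```

Here there are 243 words and 3 fixed ones, and the difference 240 is a multiple of 5, as the lemma predicts.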

This is the natural origin of the second Sylow theorem.

Math sunday

I had a chill day thinking about math today without any pressure whatsoever. First I figured out, calculating inductively, that the order of GL_n(\mathbb{F}_p) is (p^n - 1)(p^n - p)(p^n - p^2)\cdots (p^n - p^{n-1}). You count the k-tuples of linearly independent column vectors, and from there p^k, the size of the span of the k chosen vectors, is the number of vectors that cannot be appended if linear independence is to be preserved. A Sylow p-group of that is the group of upper triangular matrices with ones on the diagonal, which has the order p^{n(n-1)/2} that we want.
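The formula is easy to sanity-check by brute force for small n and p. The sketch below (helper names are my own) counts the matrices over \mathbb{F}_p with nonzero determinant and compares the count with the product formula. It needs Python 3.8 or later for the modular inverse `pow(x, -1, p)`.

```python
from itertools import product

def gl_order_formula(n, p):
    """(p^n - 1)(p^n - p)...(p^n - p^(n-1))."""
    order = 1
    for k in range(n):
        order *= p**n - p**k
    return order

def det_mod_p(rows, p):
    """Determinant mod a prime p, by Gaussian elimination."""
    m = [list(r) for r in rows]
    n = len(m)
    det = 1
    for c in range(n):
        # Find a row at or below the diagonal with a nonzero entry in column c.
        pivot = next((r for r in range(c, n) if m[r][c] % p), None)
        if pivot is None:
            return 0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det = det * m[c][c] % p
        inv = pow(m[c][c], -1, p)
        for r in range(c + 1, n):
            factor = m[r][c] * inv % p
            for j in range(c, n):
                m[r][j] = (m[r][j] - factor * m[c][j]) % p
    return det % p

def gl_order_brute(n, p):
    """Count the n x n matrices over F_p with nonzero determinant."""
    vectors = list(product(range(p), repeat=n))
    return sum(1 for rows in product(vectors, repeat=n) if det_mod_p(rows, p))

print(gl_order_brute(2, 3), gl_order_formula(2, 3))
```

For n = 2 and p = 3 both counts are 48 = (9 - 1)(9 - 3).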

I also find the proof of the first Sylow theorem much easier to understand now, the inspiration of it. I had always remembered that the Sylow p-group we are looking for can be the stabilizer subgroup of some set of p^k elements of the group where p^k divides the order of the group. By the pigeonhole principle, there can be no more than p^k elements in it. The part to prove that kept boggling my mind was the reverse inequality via orbits. It turns out that that can be viewed in a way that makes its logic feel much more natural than it did before, when, like many a proof not understood, it seemed to spring out of the blue.

We wish to show that, letting n be the order of the group and p^r the largest power of p dividing n, the number of times that the order of some orbit is divided by p is no more than r-k. To do that it suffices to show that the sum of the orders of the orbits, \binom{n}{p^k}, is divided by p no more than that many times. To show that is very mechanical. Writing n = p^k m, write the binomial coefficient out as m\displaystyle\prod_{j = 1}^{p^k-1} \frac{p^k m - j}{p^k - j} and divide the numerator and denominator of each factor by the largest power of p dividing j. With this, neither the numerator nor the denominator of the product is a multiple of p, which means the number of times p divides the sum of the orders of the orbits is the number of times it divides m, which is r-k.
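The divisibility claim can be checked numerically. This is a minimal sketch with a hypothetical helper `vp` for the p-adic valuation, comparing the number of times p divides \binom{p^k m}{p^k} with the number of times it divides m. It uses `math.comb`, which needs Python 3.8 or later.

```python
from math import comb

def vp(x, p):
    """Number of times the prime p divides the positive integer x."""
    count = 0
    while x % p == 0:
        x //= p
        count += 1
    return count

def valuations(p, k, m):
    """Return (v_p of binom(p^k * m, p^k), v_p of m)."""
    n = p**k * m
    return vp(comb(n, p**k), p), vp(m, p)

for p, k, m in [(2, 2, 12), (3, 1, 18), (5, 2, 50), (7, 1, 6)]:
    print(p, k, m, valuations(p, k, m))
```

In each case the two valuations agree, for example both are 2 for p = 2, k = 2, m = 12, where \binom{48}{4} = 194580 = 4 \cdot 48645.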

Following this, Brian Bi told me about this problem, starred in Artin, which means it was considered by the author to be difficult, that he was stuck on. To my great surprise, I managed to solve it under half an hour. The problem is:

Let H be a proper subgroup of a finite group G. Prove that the conjugate subgroups of H don’t cover G.

For this, I remembered the relation |G| = |N(H)||Cl(H)|, where |Cl(H)| denotes the number of conjugate subgroups of H, which is a special case of the orbit-stabilizer theorem, as conjugation is a group action after all. With this, given that |N(H)| \geq |H| and that conjugate subgroups share the identity, the union of them has at most |Cl(H)|(|H| - 1) + 1 \leq [G:H](|H| - 1) + 1 = |G| - [G:H] + 1 elements, which is less than |G| since [G:H] \geq 2.
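Here is a small brute-force illustration on an example of my own choosing: G = S_4 and H a dihedral subgroup of order 8, generated by a 4-cycle and a reflection. Permutations are stored as tuples, and all helper names are hypothetical.

```python
from itertools import permutations

def compose(a, b):
    """(a o b)(i) = a[b[i]] for permutations stored as tuples."""
    return tuple(a[i] for i in b)

def inverse(a):
    """Inverse permutation of a."""
    inv = [0] * len(a)
    for i, ai in enumerate(a):
        inv[ai] = i
    return tuple(inv)

def generate(gens, n):
    """Subgroup of S_n generated by gens (closure under composition)."""
    identity = tuple(range(n))
    group = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return group

def union_of_conjugates(G, H):
    """All elements g h g^{-1} with g in G and h in H."""
    return {compose(compose(g, h), inverse(g)) for g in G for h in H}

G = set(permutations(range(4)))
H = generate([(1, 2, 3, 0), (0, 3, 2, 1)], 4)   # rotation and reflection
covered = union_of_conjugates(G, H)
print(len(G), len(H), len(covered))
```

The conjugates of H cover 16 of the 24 elements of S_4, namely everything except the eight 3-cycles, which is below the bound |G| - [G:H] + 1 = 22.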

I remember Jonah Sinick’s once saying that finite group theory is one of the most g-loaded parts of math. I’m not sure what his rationale is for that exactly. I’ll say that I have a taste for finite group theory though I can’t say I’m a freak at it, unlike Aschbacher, but I guess I’m not bad at it either. Sure, it requires some form of pattern recognition and abstraction visualization that is not so loaded on the prior knowledge front. Brian Bi keeps telling me about how hard finite group theory is, relative to the continuous version of group theory, the Lie groups, which I know next to nothing about at present.

Oleg Olegovich, who told me today that he had proved “some generalization of something to semi-simple groups,” but needs a bit more to earn the label of Permanent Head Damage, suggested upon my asking him what he considers as good mathematics that I look into Arnold’s classic on classical mechanics, which was first to come to mind on his response of “stuff that is geometric and springs out of classical mechanics.” I found a PDF of it online and browsed through it but did not feel it was that tasteful, perhaps because I’ve been a bit immersed lately in the number theoretic and abstract algebraic side of math that intersects not with physics, though I had before an inclination towards more physicsy math. I thought of possibly learning PDEs and some physics as a byproduct of it, but I’m also worried about lack of focus. Maybe eventually I can do that casually without having to try too hard as I have done lately for number theory. At least, I have not the right combination of brainpower and interest sufficient for that in my current state of mind.

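Speaking of partial differential equations, it comes to mind that this field has quite a few outstanding scholars of Zhejiang descent, the most typical arguably being Gu Chaohao. I recall that the son of a University of Washington professor who works on noncommutative algebraic geometry, also of Zhejiang descent, once mentioned that when they went back to China, Gu Chaohao, who was from Fudan like his father, had passed away, and then said half-jokingly: “It is said that Gu Chaohao was elected an academician because he had once been an underground Party member.” I remember seeing that Yang Chen-Ning had a very high opinion of Gu Chaohao, largely because Gu, spurred by Yang’s visits to Fudan in the 1970s, solved a series of mathematical problems related to Yang-Mills theory. Besides him, there are Lin Fanghua and Chen Guiqiang, both very well-known professors in this kind of mathematics, and both from Zhejiang as well. We all know that Zhejiang people are the Jews of China; just yesterday Brian Bi was saying “there are four times more Zhejiangnese than Jews.” Unfortunately I am not from Zhejiang, so my hopes of becoming a mathematician are probably slim. :(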