Variants of the Schwarz lemma

Take a holomorphic self-map f of the unit disk \mathbb{D}. If f(0) = 0, then g(z) = f(z) / z has a removable singularity at 0. On |z| = r < 1 we have |g(z)| \leq 1 / r, so by the maximum principle |g(z)| \leq 1 / r on |z| \leq r, and letting r \to 1 gives |f(z)| \leq |z| everywhere. In particular, if |f(z)| = |z| at some z \neq 0, then |g| attains its maximum in the interior, so g is constant by the maximum principle and f(z) = \lambda z, where |\lambda| = 1. With the singularity removed, g(0) = f'(0), so |f'(0)| \leq 1, and again by the maximum principle, |f'(0)| = 1 means g is a constant of modulus 1. Moreover, if f is not an automorphism (here, a rotation), we cannot have |f(z)| = |z| at any z \neq 0, and in that case, |f'(0)| < 1.
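
As a sanity check, here is a minimal numerical sketch of the two inequalities. The map f below (z times a Blaschke factor with a hypothetical parameter a) is my own choice of example, not something from the argument above; any holomorphic self-map fixing 0 would do.

```python
import cmath
import math

def f(z, a=0.5 + 0.2j):
    """A self-map of the unit disk fixing 0: z times a Blaschke factor (hypothetical example)."""
    return z * (z - a) / (1 - a.conjugate() * z)

def max_ratio(samples=200, radii=(0.1, 0.5, 0.9, 0.99)):
    """Largest sampled value of |f(z)| / |z| on a few circles |z| = r."""
    worst = 0.0
    for r in radii:
        for k in range(samples):
            z = r * cmath.exp(2j * math.pi * k / samples)
            worst = max(worst, abs(f(z)) / abs(z))
    return worst

# g(0) = f'(0) is the limit of f(z)/z as z -> 0; for this f it equals -a.
derivative_at_zero = f(1e-8) / 1e-8

print(max_ratio())              # stays below 1, as the lemma predicts
print(abs(derivative_at_zero))  # close to |a|, which is below 1
```

Since this f is not a rotation, both printed numbers should be strictly less than 1.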

Cauchy’s integral formula in complex analysis

I took a graduate course in complex analysis a while ago as an undergraduate. However, I did not actually understand it well at all, a testament to which is that much of the knowledge vanished very quickly. It pleases me, though, that now, following some intellectual maturation, certain theorems seem to stick more permanently once relearned, with the main ideas behind the proofs clear rather than mind-disorienting, the latter of which I experienced too much in my early days. Shall I say that before I must have been on drugs or something, because frankly the way in which I approached certain things was quite weird, and in retrospect, I was in many ways an animal-like creature trapped within the confines of an addled consciousness, oblivious and uninhibited. Almost certainly never again will I experience anything like that. Now, I can only mentally rationalize the conscious experience of a mentally inferior creature, but such cannot be experienced for real. It is almost like how an evangelical cannot imagine what it is like not to believe in God, and even goes as far as to hold the pagan in contempt. Exaltation and exhilaration were concomitant with the leap of consciousness, until not long after it established its normalcy.

Now, the last of non-mathematical writing in this post will be on the following excerpt from Grothendieck’s Récoltes et Semailles:

In those critical years I learned how to be alone. [But even] this formulation doesn’t really capture my meaning. I didn’t, in any literal sense learn to be alone, for the simple reason that this knowledge had never been unlearned during my childhood. It is a basic capacity in all of us from the day of our birth. However these three years of work in isolation [1945–1948], when I was thrown onto my own resources, following guidelines which I myself had spontaneously invented, instilled in me a strong degree of confidence, unassuming yet enduring, in my ability to do mathematics, which owes nothing to any consensus or to the fashions which pass as law….By this I mean to say: to reach out in my own way to the things I wished to learn, rather than relying on the notions of the consensus, overt or tacit, coming from a more or less extended clan of which I found myself a member, or which for any other reason laid claim to be taken as an authority. This silent consensus had informed me, both at the lycée and at the university, that one shouldn’t bother worrying about what was really meant when using a term like “volume,” which was “obviously self-evident,” “generally known,” “unproblematic,” etc….It is in this gesture of “going beyond,” to be something in oneself rather than the pawn of a consensus, the refusal to stay within a rigid circle that others have drawn around one—it is in this solitary act that one finds true creativity. All others things follow as a matter of course.

Since then I’ve had the chance, in the world of mathematics that bid me welcome, to meet quite a number of people, both among my “elders” and among young people in my general age group, who were much more brilliant, much more “gifted” than I was. I admired the facility with which they picked up, as if at play, new ideas, juggling them as if familiar with them from the cradle—while for myself I felt clumsy, even oafish, wandering painfully up an arduous track, like a dumb ox faced with an amorphous mountain of things that I had to learn (so I was assured), things I felt incapable of understanding the essentials or following through to the end. Indeed, there was little about me that identified the kind of bright student who wins at prestigious competitions or assimilates, almost by sleight of hand, the most forbidding subjects.

In fact, most of these comrades who I gauged to be more brilliant than I have gone on to become distinguished mathematicians. Still, from the perspective of thirty or thirty-five years, I can state that their imprint upon the mathematics of our time has not been very profound. They’ve all done things, often beautiful things, in a context that was already set out before them, which they had no inclination to disturb. Without being aware of it, they’ve remained prisoners of those invisible and despotic circles which delimit the universe of a certain milieu in a given era. To have broken these bounds they would have had to rediscover in themselves that capability which was their birthright, as it was mine: the capacity to be alone.

Grothendieck was first known to me the dimwit in a later stage of high school. At that time, I was still culturally under the idiotic and shallow social constraints of an American high school, though already visibly different, unable to detach too much from it both intellectually and psychologically. There is quite an element of what I now in recollection with benefit of hindsight can characterize as a harbinger of unusual aesthetic discernment, one exercised and already vaguely sensed back then though lacking in reinforcement in social support and confidence, and most of all, in ability. For at that time, I was still much of a species in mental bondage, more often than not driven by awe as opposed to reason. In particular, I awed and despaired at many a contemporary of very fine range of myself who on the surface appeared to me so much more endowed and quick to grasp and compute, in an environment where judgment of an individual’s capability is dominated so much more so by scores and metrics, as opposed to substance, not that I had any of the latter either.

Vaguely, I recall seeing the above passage once in high school, articulated with a verbal richness of a height that would have overwhelmed and intimidated me at the time. It could not be understood by me how Grothendieck, this guy considered by many as the greatest mathematician of the 20th century, could have actually felt dumb. Though I felt very dumb myself, I never fully lost confidence, sensing a spirit in me that saw quite differently from others, that was far less inclined to lose himself in “those invisible and despotic circles” than most around me. Now, for the first time, I can at least subjectively feel identification with Grothendieck, and perhaps I am still misinterpreting his message to some extent, though I surely feel far less at sea with respect to that now than before.

Later I had the fortune to know personally one who gave a name to this implicit phenomenon, aesthetic discernment. It has been met with ridicule as a self-congratulatory achievement of one of lesser formal achievement, a concoction of a failure in self-denial. Yet on the other hand, I have witnessed that most people are so carried away in today’s excessively artificial, institutionally credentialist society that they lose sight of what is fundamentally meaningful, and sadly, those unperturbed by this ill are few and growing fewer. Finally, I have reflected on the question of what good is knowledge if too few can rightly perceive it. Science is always there and much of it of value remains unknown to any who has inhabited this planet, and I will conclude at that.

So, one of the theorems in that class was of course Cauchy’s integral formula, one of the most central tools in complex analysis. Formally,

Let D be a bounded domain with piecewise smooth boundary. If f(z) is analytic on D, and f(z) extends smoothly to the boundary of D, then

f(z) = \frac{1}{2\pi i}\int_{\partial D} \frac{f(w)}{w-z}dw,\qquad z \in D. \ \ \ \ (1)

This theorem was actually somewhat elusive to me. I would learn it, find it deceptively obvious, and then eventually forget it, having to repeat this cycle. I now ask how one would conceive of this theorem. On that, we first observe that by continuity, the average of f on a circle centered at z goes to f(z) as the radius goes to zero. Parametrizing the circle by w = z + \epsilon e^{i\theta}, we have dw = i\epsilon e^{i\theta}d\theta, which cancels against the w - z = \epsilon e^{i\theta} in the denominator, so that the integral in (1) is exactly the average \frac{1}{2\pi}\int_0^{2\pi} f(z + \epsilon e^{i\theta})d\theta. From this, we have the result if D is a sufficiently small disk centered at z. Even here, Cauchy’s integral theorem is implicit, the one which states that the integral of a holomorphic function along a closed curve bounding a region on which it is holomorphic is zero: it is what makes the integral independent of \epsilon. By the same principle, we can extend to any bounded domain with piecewise smooth boundary, since f(w)/(w - z) is holomorphic in D with the small disk removed.
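
The formula is easy to test numerically. The following is a rough sketch, assuming D is a disk and approximating the contour integral by an equally spaced Riemann sum; the function name cauchy_integral and the test point are my own illustrative choices.

```python
import cmath
import math

def cauchy_integral(f, z, center=0.0, radius=1.0, n=2000):
    """Approximate (1/(2 pi i)) * integral of f(w)/(w - z) dw over |w - center| = radius."""
    total = 0.0 + 0.0j
    for k in range(n):
        theta = 2 * math.pi * k / n
        w = center + radius * cmath.exp(1j * theta)
        # dw = i * radius * e^{i theta} d theta, with d theta = 2 pi / n
        dw = 1j * radius * cmath.exp(1j * theta) * (2 * math.pi / n)
        total += f(w) / (w - z) * dw
    return total / (2j * math.pi)

z0 = 0.3 + 0.2j
approx = cauchy_integral(cmath.exp, z0)
print(approx, cmath.exp(z0))  # the two values should agree closely
```

The boundary values of f alone recover the interior value, which is the whole content of (1).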

Cauchy’s integral formula is powerful when the integrand is bounded. We have already seen this in Montel’s theorem. In another even simpler case, Riemann’s theorem on removable singularities, we can use an upper bound M on the function near the point to get |a_n| \leq M / r^n for the coefficients of the Laurent series about the point, and letting r \to 0 shows that a_n = 0 for n < 0.

This integral formula extends to all derivatives by differentiating. Inductively, with uniform convergence of the integrand, one can show that

f^{(m)}(z) = \frac{m!}{2\pi i}\int_{\partial D} \frac{f(w)}{(w-z)^{m+1}}dw, \qquad z \in D, m \geq 0.

An application of this for a bounded entire function would be to contour integrate along an arbitrarily large circle to derive an n!M / R^n upper bound (which goes to 0 as R \to \infty) on the derivatives. This gives us Liouville’s theorem, which states that bounded entire functions are constant, by Taylor series.
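
Here is a small sketch of the derivative formula on a circle centered at z, together with the n!M / R^n estimate used for Liouville. The helper names and the choice f = exp are assumptions made purely for illustration.

```python
import cmath
import math

def cauchy_derivative(f, z, m, radius=1.0, n=4000):
    """Approximate f^(m)(z) via m!/(2 pi i) * integral of f(w)/(w - z)^(m+1) dw on |w - z| = radius."""
    total = 0.0 + 0.0j
    for k in range(n):
        theta = 2 * math.pi * k / n
        offset = radius * cmath.exp(1j * theta)  # this is w - z
        dw = 1j * offset * (2 * math.pi / n)
        total += f(z + offset) / offset ** (m + 1) * dw
    return math.factorial(m) * total / (2j * math.pi)

def cauchy_bound(m, bound, radius):
    """The estimate m! * M / R^m on the m-th derivative."""
    return math.factorial(m) * bound / radius ** m

# Every derivative of exp is exp, so each entry should match exp(z0).
z0 = 0.1 - 0.4j
derivs = [cauchy_derivative(cmath.exp, z0, m) for m in range(5)]
print(derivs[3], cmath.exp(z0))

# For a function bounded by M = 1, the bound on f' shrinks as R grows.
print([cauchy_bound(1, 1.0, R) for R in (1.0, 1e3, 1e6)])
```

The second print shows the mechanism behind Liouville: with M fixed, the bound on each derivative of order at least 1 tends to 0 as R grows.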


Weierstrass products

A long time ago, when I was a clueless kid about to finish 10th grade of high school, I first learned about Euler’s determination of \zeta(2) = \frac{\pi^2}{6}. The technique he used was of course factorization of \sin z / z via its infinitely many roots to

\displaystyle\prod_{n=1}^{\infty} \left(1 - \frac{z}{n\pi}\right)\left(1 + \frac{z}{n\pi}\right) = \displaystyle\prod_{n=1}^{\infty} \left(1 - \frac{z^2}{n^2\pi^2}\right).

Equating the coefficient of z^2 in this product, -\displaystyle\sum_{n=1}^{\infty}\frac{1}{n^2\pi^2}, with the coefficient of z^2 in the well-known Maclaurin series of \sin z / z, -1/6, gives that \zeta(2) = \frac{\pi^2}{6}.
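
Both the product and the resulting sum can be checked numerically. This is only a sketch with truncated product and sum (the truncation point and the sample value of z are arbitrary choices of mine), so agreement is up to an error of order 1/N.

```python
import math

def sinc_product(z, terms):
    """Partial product of (1 - z^2 / (n^2 pi^2)) for n = 1..terms."""
    result = 1.0
    for n in range(1, terms + 1):
        result *= 1.0 - z * z / (n * n * math.pi * math.pi)
    return result

def zeta2_partial(terms):
    """Partial sum of 1/n^2 for n = 1..terms."""
    return sum(1.0 / (n * n) for n in range(1, terms + 1))

z = 1.3
print(sinc_product(z, 100000), math.sin(z) / z)
print(zeta2_partial(100000), math.pi ** 2 / 6)
```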

This felt to me, who knew almost no math, so spectacular at that time. It was also one of great historical significance. The problem was first posed by Pietro Mengoli in 1644, and had baffled the most genius of mathematicians of that day until 1734, when Euler finally stunned the mathematical community with his simple yet ingenious solution. This was done when Euler was in St. Petersburg. On that, I shall note that from this, we can easily see how Russia had a rich mathematical and scientific tradition that began quite early on, which must have deeply influenced the preeminence in science of Tsarist Russia and later the Soviet Union despite their being in practical terms quite backward compared to the advanced countries of Western Europe, like UK and France, which of course was instrumental towards the rapid catching up in industry and technology of the Soviet Union later on.

I had learned of this result more or less concurrently with learning on my own (independent of the silly American public school system) what constituted a rigorous proof. I remember back then I was still not accustomed to the cold, precise, and austere rigor expected in mathematics and had much difficulty restraining myself in that regard, often content with intuitive solutions. From this, one can guess that I was not quite aware of how Euler’s solution was in fact not a rigorous one by modern standards, despite its having been noted from the book from which I read this. However, now I am aware that what Euler constructed was in fact a Weierstrass product, and in this article, I will explain how one can construct those in a way that guarantees uniform convergence on compact sets.

Given a finite number of points on the complex plane, one can easily construct a meromorphic function with zeros or poles there for any combination of (finite) multiplicities. For a countably infinite number of points, one can try the same, but how can one know that the result, being of a series nature, doesn’t blow up? There is quite some technical machinery to ensure this.

We begin with the restricted case of simple poles and arbitrary residues. This is a special case of what is now known as Mittag-Leffler’s theorem.

Theorem 1.1 (Mittag-Leffler) Let z_1,z_2,\ldots \to \infty be a sequence of distinct complex numbers satisfying 0 < |z_1| \leq |z_2| \leq \ldots. Let m_1, m_2,\ldots be any sequence of non-zero complex numbers. Then there exists a (not unique) sequence p_1, p_2, \ldots of non-negative integers, depending only on the sequences (z_n) and (m_n), such that the series f (z)

f(z) = \displaystyle\sum_{n=1}^{\infty} \left(\frac{z}{z_n}\right)^{p_n} \frac{m_n}{z - z_n} \ \ \ \ (1.1)

is totally convergent, and hence absolutely and uniformly convergent, in any compact set K \subset \mathbb{C} \setminus \{z_1,z_2,\ldots\}. Thus the function f(z) is meromorphic, with simple poles z_1, z_2, \ldots having respective residues m_1, m_2, \ldots.

Proof: Total convergence, in case forgotten, refers to the Weierstrass M-test. That said, it suffices to establish

\left|\left(\frac{z}{z_n}\right)^{p_n}\frac{m_n}{z-z_n}\right| < M_n,

where \sum_{n=1}^{\infty} M_n < \infty. For total convergence on any compact set, we again use the classic technique of disks centered at the origin whose radii r_n < |z_n| increase monotonically to \infty. This way for |z| \leq r_n, we have

\left|\left(\frac{z}{z_n}\right)^{p_n}\frac{m_n}{z-z_n}\right| \leq \left(\frac{r_n}{|z_n|}\right)^{p_n}\frac{|m_n|}{|z_n|-r_n} < M_n.

With r_n < |z_n| we can for any M_n choose large enough p_n to satisfy this. This makes clear that the \left(\frac{z}{z_n}\right)^{p_n} is our mechanism for constraining the magnitude of the values attained, which we can do to an arbitrary degree.

The rest of the proof is more or less trivial. For any K, pick some N such that the disk of radius r_N contains K. For n \geq N, the nth term is bounded by M_n on K. Each of the finitely many terms with n < N we can bound with \displaystyle\max_{z \in K}\left|\left(\frac{z}{z_n}\right)^{p_n}\frac{m_n}{z-z_n}\right|, which is finite by continuity on a compact set (now you can see why we must omit the poles from our domain).     ▢
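
To make the choice of the p_n concrete, here is a minimal sketch. The specific choices r_n = |z_n| / 2 and M_n = 2^{-n} are my own assumptions (the proof only needs r_n < |z_n| increasing to infinity and a summable M_n), and the poles and residues are an arbitrary example.

```python
def choose_exponents(poles, residues):
    """Pick p_n so that the n-th term is below 2^-n on the disk |z| <= |z_n| / 2.

    Assumptions (not from the text): r_n = |z_n| / 2 and M_n = 2^-n.
    """
    exponents = []
    for n, (zn, mn) in enumerate(zip(poles, residues), start=1):
        rn = abs(zn) / 2
        target = 2.0 ** (-n)
        p = 0
        # (r_n / |z_n|)^p * |m_n| / (|z_n| - r_n) with r_n / |z_n| = 1/2
        while 0.5 ** p * abs(mn) / (abs(zn) - rn) >= target:
            p += 1
        exponents.append(p)
    return exponents

def partial_sum(z, poles, residues, exponents):
    """Partial sum of the series (1.1) over the given poles."""
    return sum((z / zn) ** p * mn / (z - zn)
               for zn, mn, p in zip(poles, residues, exponents))

# Example: simple poles at 1, 2, ..., 40, all with residue 1.
poles = list(range(1, 41))
residues = [1] * 40
exponents = choose_exponents(poles, residues)
print(exponents[:5])
print(partial_sum(0.5j, poles, residues, exponents))
```

These p_n are far from optimal (for this example p_n = 1 already gives a convergent series), which matches the remark in the theorem that the sequence is not unique.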

Lemma 1.1 Let the functions u_n(z) (n = 1, 2,\ldots) be regular in a compact set K \subset \mathbb{C}, and let the series \displaystyle\sum_{n=1}^{\infty} u_n(z) be totally convergent in K. Then the infinite product \displaystyle\prod_{n=1}^{\infty} \exp (u_n(z)) = \exp\left(\displaystyle\sum_{n=1}^{\infty} u_n(z)\right) is uniformly convergent in K.

Proof: Technical exercise left to the reader.     ▢

Now we present a lemma that allows us to take the result of Mittag-Leffler (Theorem 1.1) to meromorphic functions with zeros and poles at arbitrary points, each with its prescribed multiplicity.

Lemma 1.2 Let f(z) be a meromorphic function. Let z_1,z_2,\ldots \neq 0 be the poles of f(z), all simple with respective residues m_1, m_2,\ldots \in \mathbb{Z}. Then the function

\phi(z) = \exp \int_0^z f (t) dt \ \ \ \ (1.2)

is meromorphic. The zeros (resp. poles) of \phi(z) are the points z_n such that m_n > 0 (resp. m_n < 0), and the multiplicity of z_n as a zero (resp. pole) of \phi(z) is m_n (resp. -m_n).

Proof: Taking the exponential of that integral has the function of turning it into a one-valued function. Take two paths \gamma and \gamma' from 0 to z which pass through none of the poles. By the residue theorem,

\int_{\gamma} f(t)dt = \int_{\gamma'} f(t)dt + 2\pi i R,

where R is a (signed) sum of residues of f(t) at the poles enclosed between \gamma and \gamma'. Because the m_is are integers, R must be an integer, from which follows that our exponential is a one-valued function. Being the exponential of an analytic function, it is also analytic, and since the exponential never vanishes, it is non-zero on \mathbb{C} \setminus \{z_1, z_2, \ldots\}. We can remove the pole at z_1 with f_1(z) = f(z) - \frac{m_1}{z - z_1}. By the same argument, \exp \int_0^z f_1(t) dt is analytic and without zeros on \mathbb{C} \setminus \{z_2, \ldots\}. From this, we derive

\begin{aligned} \phi(z) &= \exp \int_{\gamma} f(t)dt \\ &= \exp \int_{\gamma} \left(f_1(t) + \frac{m_1}{t-z_1}\right)dt \\ &= \left(1 - \frac{z}{z_1}\right)^{m_1}\exp \int_0^z f_1(t) dt. \end{aligned}

Thus z_1 is a zero of \phi(z) of multiplicity m_1 if m_1 > 0 and a pole of multiplicity -m_1 if m_1 < 0.

We can continue this process for the remainder of the z_is.      ▢

Theorem 1.2 (Weierstrass) Let F(z) be meromorphic, and regular and \neq 0 at z = 0. Let z_1,z_2, \ldots be the zeros and poles of F(z) with respective multiplicities |m_1|, |m_2|, \ldots, where m_n > 0 if z_n is a zero and m_n < 0 if z_n is a pole of F(z). Then there exist integers p_1, p_2,\ldots \geq 0 and an entire function G(z) such that

F(z) = e^{G(z)}\displaystyle\prod_{n=1}^{\infty}\left(1 - \frac{z}{z_n}\right)^{m_n}\exp\left(m_n\displaystyle\sum_{k=1}^{p_n}\frac{1}{k}\left(\frac{z}{z_n}\right)^k\right), \ \ \ \ (1.3)

where the product converges uniformly in any compact set K \subset \mathbb{C} \setminus \{z_1,z_2,\ldots\}.

Proof: Let f(z) be the function in (1.1) with p_is such that the series is totally convergent, and let \phi(z) be the function in (1.2). By Theorem 1.1 and Lemma 1.2, \phi(z) is meromorphic, with zeros z_n of multiplicities m_n if m_n > 0, and with poles z_n of multiplicities |m_n| if m_n < 0. Thus F(z) and \phi(z) have the same zeros and poles with the same multiplicities, whence F(z)/\phi(z) is entire and \neq 0. Therefore \log (F(z)/\phi(z)) = G(z) is an entire function, and

F(z) = e^{G(z)} \phi(z). \ \ \ \ (1.4)

Uniform convergence along a path of integration from 0 to z (not containing the poles) enables term-by-term integration. Thus, from (1.2), we have

\begin{aligned} \phi(z) &= \exp \int_0^z \displaystyle\sum_{n=1}^{\infty} \left(\frac{t}{z_n}\right)^{p_n} \frac{m_n}{t - z_n}dt \\ &= \displaystyle\prod_{n=1}^{\infty}\exp \int_0^z \left(\frac{m_n}{t - z_n} + \frac{m_n}{z_n}\frac{(t/z_n)^{p_n} -1}{t/z_n - 1}\right)dt \\ &= \displaystyle\prod_{n=1}^{\infty}\exp \int_0^z \left(\frac{m_n}{t - z_n} + \frac{m_n}{z_n}\displaystyle\sum_{k=1}^{p_n}\left(\frac{t}{z_n}\right)^{k-1}\right)dt \\ &= \displaystyle\prod_{n=1}^{\infty}\exp \left(\log\left(1 - \frac{z}{z_n}\right)^{m_n} + m_n\displaystyle\sum_{k=1}^{p_n}\frac{1}{k}\left(\frac{z}{z_n}\right)^k\right) \\ &= \displaystyle\prod_{n=1}^{\infty}\left(1 - \frac{z}{z_n}\right)^{m_n} \exp \left(m_n\displaystyle\sum_{k=1}^{p_n}\frac{1}{k}\left(\frac{z}{z_n}\right)^k\right).\end{aligned}

With this, (1.3) follows from (1.4). Moreover, in a compact set K, we can always bound the length of the path of integration, whence, by Theorem 1.1, the series

\displaystyle\sum_{n=1}^{\infty}\int_0^z \left(\frac{t}{z_n}\right)^{p_n}\frac{m_n}{t - z_n}dt

is totally convergent in K. Finally, invoke Lemma 1.1 to conclude that the exponential of that series, namely the product for \phi(z), is uniformly convergent in K as well, from which follows that the product in (1.3) is too, as desired.     ▢

If at 0, our function has a zero or pole, we can easily multiply by z^{-m}, with m the order there (positive for a zero, negative for a pole), to regularize it. This yields

F(z) = z^me^{G(z)}\displaystyle\prod_{n=1}^{\infty}\left(1 - \frac{z}{z_n}\right)^{m_n}\exp\left(m_n\displaystyle\sum_{k=1}^{p_n}\frac{1}{k}\left(\frac{z}{z_n}\right)^k\right)

as the Weierstrass factorization formula in this case.
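
As an illustration, one can evaluate the factors of (1.3) for \sin(\pi z)/(\pi z), whose zeros are the non-zero integers. The sketch below takes m_n = 1, p_n = 1 and G = 0, which is the standard product for this function (a known fact I am bringing in, not something derived above); the truncation point is arbitrary.

```python
import cmath
import math

def weierstrass_factor(z, zn, mn=1, pn=1):
    """One factor (1 - z/z_n)^{m_n} * exp(m_n * sum_{k=1}^{p_n} (z/z_n)^k / k)."""
    w = z / zn
    correction = sum(w ** k / k for k in range(1, pn + 1))
    return (1 - w) ** mn * cmath.exp(mn * correction)

def partial_product(z, terms):
    """Product of the factors for the zeros n = 1, -1, 2, -2, ..., terms, -terms."""
    result = 1.0 + 0.0j
    for n in range(1, terms + 1):
        result *= weierstrass_factor(z, n) * weierstrass_factor(z, -n)
    return result

z = 0.3 + 0.4j
print(partial_product(z, 20000))
print(cmath.sin(math.pi * z) / (math.pi * z))
```

Pairing the factors for n and -n cancels the exponentials and recovers Euler's product in z^2/n^2, which is why Euler could get away without the convergence factors.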

Overall, we see that we transform Mittag-Leffler (Theorem 1.1) into Weierstrass factorization (Theorem 1.2) through integration and exponentiation. In complex analysis, it comes up quite often that one integrates a term of order -1 to derive a logarithm, which once exponentiated gives a linear polynomial to the power of the residue, useful for generating zeros and poles. Once this is observed, it is strongly hinted that one can go from the former to the latter with some technical manipulations, and one can observe without much difficulty that the statements of Lemma 1.1 and Lemma 1.2 are what is needed for this.


  • Carlo Viola, An Introduction to Special Functions, Springer International Publishing, Switzerland, 2016, pp. 15-24.

Cayley-Hamilton theorem and Nakayama’s lemma

The Cayley-Hamilton theorem states that every square matrix A over a commutative ring satisfies its own characteristic equation. That is, with I_n the n \times n identity matrix, the characteristic polynomial of A

p(\lambda) = \det (\lambda I_n - A)

is such that p(A) = 0. I recalled that in a post a while ago, I mentioned that for any matrix A, A(\mathrm{adj}(A)) = (\det A) I_n, a fact that is not hard to visualize based on calculation of determinants via minors, which is in fact much of what brings the existence of this adjugate to reason in some sense. This can be used to prove the Cayley-Hamilton theorem.

So we have

(\lambda I_n - A)\mathrm{adj}(\lambda I_n - A) = p(\lambda)I_n,

where p is the characteristic polynomial of A. The adjugate in the above is a matrix of polynomials in \lambda of degree at most n-1, which we can represent in the form \displaystyle\sum_{i=0}^{n-1}\lambda^i B_i with matrix coefficients B_i (these turn out to be polynomials in A).

We have

\displaystyle {\begin{aligned}p(\lambda)I_{n} &= (\lambda I_n - A)\displaystyle\sum_{i=0}^{n-1}\lambda^i B_i \\ &= \displaystyle\sum_{i=0}^{n-1}\lambda^{i+1}B_{i}-\sum _{i=0}^{n-1}\lambda^{i}AB_{i} \\ &= \lambda^{n}B_{n-1}+\sum _{i=1}^{n-1}\lambda^{i}(B_{i-1}-AB_{i})-AB_{0}.\end{aligned}}

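Writing p(\lambda) = \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0 and equating coefficients gives us

B_{n-1} = I_n, \qquad B_{i-1} - AB_i = c_i I_n, \quad 1 \leq i \leq n-1, \qquad -AB_0 = c_0I_n.

Multiplying the equation for \lambda^i on the left by A^i and summing, we have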

A^n + c_{n-1}A^{n-1} + \cdots + c_1A + c_0I_n = A^nB_{n-1} + \displaystyle\sum_{i=1}^{n-1} (A^iB_{i-1} - A^{i+1}B_i) - AB_0 = 0,

with the RHS telescoping and annihilating itself to 0.
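
The recurrences for the B_i can be run as an algorithm. The sketch below uses them together with the trace formula c_i = -\mathrm{tr}(AB_i)/(n-i), which is the Faddeev-LeVerrier step and is an addition of mine that does not appear in the argument above; exact rational arithmetic keeps the check free of rounding. The sample matrix is arbitrary.

```python
from fractions import Fraction

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def adjugate_coefficients(A):
    """Return (B, c) with adj(lambda I - A) = sum lambda^i B[i], p(lambda) = lambda^n + sum c[i] lambda^i.

    Uses B_{n-1} = I and B_{i-1} = A B_i + c_i I from the text; the trace formula
    c_i = -tr(A B_i) / (n - i) is the Faddeev-LeVerrier step (an assumption added here).
    """
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    B = [None] * n
    c = [None] * n
    B[n - 1] = identity(n)
    for i in range(n - 1, -1, -1):
        AB = mat_mul(A, B[i])
        c[i] = -sum(AB[j][j] for j in range(n)) / (n - i)
        if i > 0:
            B[i - 1] = [[AB[r][s] + (c[i] if r == s else 0) for s in range(n)] for r in range(n)]
    return B, c

def evaluate_char_poly(A, c):
    """Compute A^n + c_{n-1} A^{n-1} + ... + c_0 I by Horner's rule."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    result = identity(n)
    for i in range(n - 1, -1, -1):
        result = mat_mul(result, A)
        result = [[result[r][s] + (c[i] if r == s else 0) for s in range(n)] for r in range(n)]
    return result

A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
B, c = adjugate_coefficients(A)
print(c)                          # coefficients c_0, c_1, c_2
print(evaluate_char_poly(A, c))   # the zero matrix
```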

There is a generalized version of this for a module over a ring, which goes as follows.

Cayley-Hamilton theorem (for modules) Let A be a commutative ring with unity, M a finitely generated A-module, I an ideal of A, \phi an endomorphism of M with \phi M \subset IM. Then \phi satisfies an equation of the form \phi^n + c_{n-1}\phi^{n-1} + \cdots + c_1\phi + c_0 = 0 with all c_i \in I.

Proof: It’s mostly the same. Let \{m_1, \ldots, m_n\} \subset M be a generating set. Then for every i, \phi(m_i) \in IM, so \phi(m_i) = \displaystyle\sum_{j=1}^n a_{ij}m_j, with the a_{ij}s in I. Running the above argument with the matrix (a_{ij}) in place of A, the coefficients c_i of its characteristic polynomial are, by closure properties of ideals, in I.     ▢

From this follows easily a statement of Nakayama’s lemma, ubiquitous in commutative algebra.

Nakayama’s lemma  Let I be an ideal in R, and M a finitely-generated module over R. If IM = M, then there exists an r \in R with r \equiv 1 \pmod{I}, such that rM = 0.

Proof: With reference to the Cayley-Hamilton theorem, take \phi = I_M, the identity map on M, and define the polynomial p as above. Then

rI_M = p(I_M) = (1 + c_{n-1} + c_{n-2} + \cdots + c_0)I_M = 0

Here r = 1 + c_{n-1} + c_{n-2} + \cdots + c_0, and since the c_is reside in I, we have r \equiv 1 \pmod{I}, while rI_M = 0 being the zero map on M means precisely that rM = 0.     ▢
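
A toy instance may help. Take R = \mathbb{Z}, I = (3) and M = \mathbb{Z}/2, where IM = M because 3 \cdot 1 = 1 in M. The helper below is a hypothetical sketch for cyclic modules \mathbb{Z}/\text{modulus} only, not a general algorithm: the identity map is multiplication by some a \in I, the characteristic polynomial is \lambda - a, and so r = 1 - a.

```python
def nakayama_witness(modulus, ideal_generator):
    """For M = Z/modulus and I = (ideal_generator) with IM = M, return r with r = 1 mod I and rM = 0.

    Hypothetical helper for the cyclic case only: M is generated by 1, the identity map
    is multiplication by some a in I, and p(lambda) = lambda - a gives r = 1 - a.
    """
    for a in range(0, ideal_generator * modulus, ideal_generator):
        if a % modulus == 1 % modulus:  # a * 1 = 1 in M, with a in I
            return 1 - a
    raise ValueError("IM != M for this choice")

r = nakayama_witness(2, 3)
print(r)  # an integer congruent to 1 mod 3 that kills Z/2
```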

Implicit function theorem and its multivariate generalization

The implicit function theorem for a single output variable can be stated as follows:

Single equation implicit function theorem. Let F(\mathbf{x}, y) be a function of class C^1 on some neighborhood of a point (\mathbf{a}, b) \in \mathbb{R}^{n+1}. Suppose that F(\mathbf{a}, b) = 0 and \partial_y F(\mathbf{a}, b) \neq 0. Then there exist positive numbers r_0, r_1 such that the following conclusions are valid.

a. For each \mathbf{x} in the ball |\mathbf{x} - \mathbf{a}| < r_0 there is a unique y such that |y - b| < r_1 and F(\mathbf{x}, y) = 0. We denote this y by f(\mathbf{x}); in particular, f(\mathbf{a}) = b.

b. The function f thus defined for |\mathbf{x} - \mathbf{a}| < r_0 is of class C^1, and its partial derivatives are given by

\partial_j f(\mathbf{x}) = -\frac{\partial_j F(\mathbf{x}, f(\mathbf{x}))}{\partial_y F(\mathbf{x}, f(\mathbf{x}))}.

Proof. For part (a), assume without loss of generality positive \partial_y F(\mathbf{a}, b). By continuity of that partial derivative, we have that in some neighborhood of (\mathbf{a}, b) it is positive and thus for some r_1 > 0, r_0 > 0 there exists f such that |\mathbf{x} - \mathbf{a}| < r_0 implies that there exists a unique y (by intermediate value theorem along with positivity of \partial_y F) such that |y - b| < r_1 with F(\mathbf{x}, y) = 0, which defines some function y = f(\mathbf{x}).

To show that f has partial derivatives, we must first show that it is continuous. To do so, we can let r_1 be our \epsilon and use the same process to arrive at our \delta, which corresponds to r_0.

For part (b), to show that its partial derivatives exist and are equal to what we desire, we perturb \mathbf{x} with an \mathbf{h} that we let WLOG be

\mathbf{h} = (h, 0, \ldots, 0).

Then with k = f(\mathbf{x}+\mathbf{h}) - f(\mathbf{x}), we have F(\mathbf{x} + \mathbf{h}, y+k) = F(\mathbf{x}, y) = 0. From the mean value theorem, we can arrive at

0 = h\partial_1F(\mathbf{x}+t\mathbf{h}, y + tk) + k\partial_y F(\mathbf{x}+t\mathbf{h}, y+tk)

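for some t \in (0,1). Rearranging gives k/h = -\partial_1 F / \partial_y F evaluated at that intermediate point; taking h \to 0 (so that k \to 0 by continuity of f) and repeating the argument in each coordinate direction j gives us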

\partial_j f(\mathbf{x}) = -\frac{\partial_j F(\mathbf{x}, y)}{\partial_y F(\mathbf{x}, y)}.

The above can be generalized to multiple variables, with k implicit functions and k constraints.     ▢
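
The single-equation theorem can be tried out numerically. In this sketch F(x, y) = x^2 + y^2 - 1 near (a, b) = (0.6, 0.8) is my own choice of example, y = f(x) is found by bisection (which mirrors the intermediate value argument in part (a)), and the derivative formula is compared against a finite difference.

```python
def F(x, y):
    return x * x + y * y - 1.0

def dF_dx(x, y):
    return 2.0 * x

def dF_dy(x, y):
    return 2.0 * y

def solve_y(x, b=0.8, r1=0.5, iterations=60):
    """Bisection for the unique root of F(x, .) in (b - r1, b + r1), where F increases in y."""
    lo, hi = b - r1, b + r1
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if F(x, mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x0, h = 0.6, 1e-6
y0 = solve_y(x0)
formula = -dF_dx(x0, y0) / dF_dy(x0, y0)
finite_difference = (solve_y(x0 + h) - solve_y(x0 - h)) / (2 * h)
print(y0, formula, finite_difference)
```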

Implicit function theorem for systems of equations. Let \mathbf{F}(\mathbf{x}, \mathbf{y}) be an \mathbb{R}^k-valued function of class C^1 on some neighborhood of a point (\mathbf{a}, \mathbf{b}) \in \mathbb{R}^{n+k} and let B_{ij} = (\partial F_i / \partial y_j)(\mathbf{a}, \mathbf{b}). Suppose that \mathbf{F}(\mathbf{a}, \mathbf{b}) = \mathbf{0} and \det B \neq 0. Then there exist positive numbers r_0, r_1 such that the following conclusions are valid.

a. For each \mathbf{x} in the ball |\mathbf{x} - \mathbf{a}| < r_0 there is a unique \mathbf{y} such that |\mathbf{y} - \mathbf{b}| < r_1 and \mathbf{F}(\mathbf{x}, \mathbf{y}) = 0. We denote this \mathbf{y} by \mathbf{f}(\mathbf{x}); in particular, \mathbf{f}(\mathbf{a}) = \mathbf{b}.

b. The function \mathbf{f} thus defined for |\mathbf{x} - \mathbf{a}| < r_0 is of class C^1, and its partial derivatives \partial_j \mathbf{f} can be computed by differentiating the equations \mathbf{F}(\mathbf{x}, \mathbf{f}(\mathbf{x})) = \mathbf{0} with respect to x_j and solving the resulting linear system of equations for \partial_j f_1, \ldots, \partial_j f_k.

Proof: For this we will be using Cramer’s rule, which is that one can solve a linear system Ax = y (provided of course that A is non-singular) by taking the matrix obtained from substituting the jth column of A with y and letting x_j be the determinant of that matrix divided by the determinant of A.

From this, we are somewhat hinted that induction is in order. If B is invertible, then one of its (k-1) \times (k-1) submatrices is invertible. Assume WLOG (reordering the equations and variables if necessary) that this applies to the one obtained by deleting the kth row and kth column of B. With this in mind, we can via our inductive hypothesis have

F_1(\mathbf{x}, \mathbf{y}) = F_2(\mathbf{x}, \mathbf{y}) = \cdots = F_{k-1}(\mathbf{x}, \mathbf{y}) = 0

determine y_j = g_j(\mathbf{x}, y_k) for j = 1,2,\ldots,k-1. Here we are making y_k an independent variable and we can totally do that because we are inducting on the number of outputs (and also constraints). Substituting this into the F_k constraint, this reduces to the single variable case, with

G(\mathbf{x}, y_k) = F_k(\mathbf{x}, \mathbf{g}(\mathbf{x}, y_k), y_k) = 0.

It suffices now to show via our \det B \neq 0 hypothesis that \frac{\partial G}{\partial y_k} \neq 0 at (\mathbf{a}, b_k). Routine application of the chain rule gives, at that point,

\frac{\partial G}{\partial y_k} = \displaystyle\sum_{j=1}^{k-1} \frac{\partial F_k}{\partial y_j} \frac{\partial g_j}{\partial y_k} + \frac{\partial F_k}{\partial y_k} = \displaystyle\sum_{j=1}^{k-1} B_{kj} \frac{\partial g_j}{\partial y_k} + B_{kk}. \ \ \ \ (1)

The \frac{\partial g_j}{\partial y_k}s are the solution to the following linear system:

\begin{pmatrix} \frac{\partial F_1}{\partial y_1}  & \dots & \frac{\partial F_1}{\partial y_{k-1}} \\ \; & \ddots \; \\ \frac{\partial F_{k-1}}{\partial y_1} & \dots & \frac{\partial F_{k-1}}{\partial y_{k-1}} \end{pmatrix} \begin{pmatrix} \frac{\partial g_1}{\partial y_k} \\ \vdots \\ \frac{\partial g_{k-1}}{\partial y_k} \end{pmatrix} = \begin{pmatrix} \frac{-\partial F_1}{\partial y_k} \\ \vdots \\ \frac{-\partial F_{k-1}}{\partial y_k} \end{pmatrix} .

Let M^{ij} denote the (k-1) \times (k-1) submatrix of B obtained by deleting the row and column of B_{ij}. We see then that in the replacement for Cramer’s rule, we arrive at what is M^{kj} but with the last column swapped to the left k-j-1 times such that it lands in the jth column and also with a negative sign, which means

\frac{\partial g_j}{\partial y_k}(\mathbf{a}, b_k) = (-1)^{k-j} \frac{\det M^{kj}}{\det M^{kk}}.

Now, we substitute this into (1) to get

\begin{aligned}\frac{\partial G}{\partial y_k}(\mathbf{a}, b_k) &= \displaystyle\sum_{j=1}^{k-1} (-1)^{k-j}B_{kj}\frac{\det M^{kj}}{\det M^{kk}} + B_{kk} \\ &= \frac{\sum_{j=1}^k (-1)^{j+k} B_{kj}\det M^{kj}}{\det M^{kk}} \\ &= \frac{\det B}{\det M^{kk}} \\ &\neq 0. \end{aligned}

Finally, we apply the implicit function theorem for one variable for the y_k that remains.     ▢
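
The determinant identity at the heart of the induction can be checked on a linear model. The sketch below assumes \mathbf{F}(\mathbf{y}) = B\mathbf{y} (so the Jacobian is B everywhere) with an arbitrary 3 \times 3 integer matrix of my choosing, solves for the \partial g_j / \partial y_k by Cramer’s rule exactly as in the proof, and compares equation (1) with \det B / \det M^{kk}.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def cramer_solve(A, y):
    """Solve A x = y by Cramer's rule: x_j = det(A with column j replaced by y) / det(A)."""
    d = det(A)
    solution = []
    for j in range(len(A)):
        replaced = [row[:j] + [y[i]] + row[j + 1:] for i, row in enumerate(A)]
        solution.append(Fraction(det(replaced), d))
    return solution

def dG_dyk(B):
    """Derivative of G in y_k for the linear model F(y) = B y, following equation (1)."""
    k = len(B)
    Mkk = [row[:k - 1] for row in B[:k - 1]]   # delete the last row and column
    rhs = [-B[i][k - 1] for i in range(k - 1)]
    dg = cramer_solve(Mkk, rhs)                # the partial derivatives of g_j in y_k
    return sum(B[k - 1][j] * dg[j] for j in range(k - 1)) + B[k - 1][k - 1]

B = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
Mkk = [row[:2] for row in B[:2]]
print(dG_dyk(B), Fraction(det(B), det(Mkk)))
```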


  • Gerald B. Folland, Advanced Calculus, Prentice Hall, Upper Saddle River, NJ, 2002, pp. 114–116, 420–422.


A nice consequence of Baire category theorem

In a complete metric space X, we call a point x for which \{x\} is open an isolated point. If X is nonempty and countable and there are no isolated points, then \displaystyle\bigcap_{x \in X} (X \setminus \{x\}) = \emptyset is a countable intersection of sets X \setminus \{x\}, each open and dense, which violates the Baire category theorem. From that, we arrive at the proposition that a nonempty complete metric space with no isolated points is uncountable, and equivalently, that a nonempty countable complete metric space has an isolated point.


Urysohn metrization theorem

The Urysohn metrization theorem gives conditions which guarantee that a topological space is metrizable. A topological space (X, \mathcal{T}) is metrizable if there is a metric on X whose induced topology is \mathcal{T} itself. These conditions are that the space is regular and second-countable. Regular means that a closed subset and a point not in it can always be separated by disjoint open sets (with one-point sets assumed closed), and second-countable means there is a countable basis.

Metrization is established by embedding the topological space into a metrizable one (every subspace of a metrizable space is metrizable). Here, we construct a metrization of [0,1]^{\mathbb{N}} and use that for the embedding. We first prove that regular and second-countable implies normal, which is a hypothesis of Urysohn’s lemma. We then use Urysohn’s lemma to construct the embedding.

Lemma Every regular, second-countable space is normal.

Proof: Let B_1, B_2 be the disjoint closed sets we want to separate. We can construct a countable open cover \{U_i\} of B_1 whose closures do not intersect B_2: by regularity, each point of B_1 has an open neighborhood whose closure misses B_2, and we may take these neighborhoods to be basis elements. With second-countability, there are only countably many of them, which yields our desired cover. Do the same for B_2 to get a similar cover \{V_i\}.

Now we wish to trim our covers in such a way that the resulting unions are disjoint. A way to do that is, for any U_i and V_j, to subtract \bar{V_j} from U_i if j \leq i, and the other way round. This gives us U_i' = U_i \setminus \bigcup_{j=1}^i \bar{V_j} and V_i' = V_i \setminus \bigcup_{j=1}^i \bar{U_j}. These sets are open, the U_i' still cover B_1 and the V_i' still cover B_2 (as the closures removed miss B_1 and B_2 respectively), and U_i' \cap V_j' = \emptyset for all i, j: if j \leq i then U_i' misses \bar{V_j}, and if i \leq j then V_j' misses \bar{U_i}. Hence \bigcup_i U_i' and \bigcup_j V_j' are disjoint open sets containing B_1 and B_2.     ▢

Urysohn’s lemma Let A and B be disjoint closed sets in a normal space X. Then, there is a continuous function f : X \to [0,1] such that f(A) = \{0\} and f(B) = \{1\}.

Proof: Observe that if for all dyadic fractions r \in (0,1) (those whose denominator in lowest terms is a power of 2), we assign open subsets U(r) of X such that

  1. U(r) contains A and is disjoint from B for all r
  2. r < s implies that \overline{U(r)} \subset U(s)

and set f(x) = 1 if x \notin U(r) for every r and f(x) = \inf \{r : x \in U(r)\} otherwise, we are mostly done. Since A lies in every U(r) and B meets none of them, f(A) = \{0\} and f(B) = \{1\}. To show that f is continuous, it suffices to show that the preimages of [0, a) and (a, 1] are open for any a. For [0, a), the preimage is the union of U(r) over r < a: any x \in U(r) with r < a has f(x) \leq r < a, and conversely, if f(x) = a' < a, then since a' is an infimum there must be an s \in [a', a) such that U(s) contains x. Now, suppose f(x) \in (a, 1] and take a dyadic s \in (a, f(x)). Then X \setminus \overline{U(s)} is an open neighborhood of x that maps into (a, 1]. First, x \in X \setminus \overline{U(s)}: otherwise, picking a dyadic s' \in (s, f(x)), we would have x \in \overline{U(s)} \subset U(s') and thereby f(x) \leq s' < f(x). Moreover, any y \notin \overline{U(s)} lies in no U(r) with r \leq s, so f(y) \geq s > a.
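To make the definition of f concrete, here is a minimal sketch on a toy example that is not part of the argument itself: take X to be the real line, A = (-\infty, 0], B = [1, \infty), and the hypothetical choice U(r) = (-\infty, r), which satisfies both conditions. The names dyadics, in_U and urysohn_approx are made up for illustration, and the infimum is only approximated using dyadics with denominator up to 2^depth, so points of A map to 2^{-depth} rather than exactly 0.

```python
from fractions import Fraction


def dyadics(depth):
    """Dyadic fractions in (0, 1) with denominator dividing 2**depth, increasing."""
    denom = 2 ** depth
    return [Fraction(a, denom) for a in range(1, denom)]


def in_U(x, r):
    # Assumed toy family for this sketch: U(r) = (-inf, r) on the real line.
    # It contains A = (-inf, 0], misses B = [1, inf), and closure(U(r)) is
    # contained in U(s) whenever r < s.
    return x < r


def urysohn_approx(x, depth=10):
    """Approximate f(x) = inf{r : x in U(r)} over dyadics up to the given depth."""
    hits = [r for r in dyadics(depth) if in_U(x, r)]
    # f(x) = 1 when x lies in no U(r).
    return min(hits) if hits else Fraction(1)
```

In this toy case the limiting function is simply x clamped to [0, 1], and the approximation error at any point is at most 2^{-depth}.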

Now we proceed with the aforementioned assignment of subsets. In the process, we construct another assignment V. Initialize U(1) = X \setminus B and V(0) = X \setminus A. Let U(1/2) and V(1/2) be disjoint open sets containing A and B respectively (this is where we need our normality hypothesis). Notice that whenever normality gives us disjoint closed sets B_1 and B_2 with disjoint open sets U_1 and U_2 containing them respectively, the complement of U_1 is a closed set containing U_2, which we call U_2'. It is disjoint from B_1, so we can run the same separation process on B_1 and U_2'. For instance, U(1/4) and V(1/4) are obtained by separating A from X \setminus U(1/2). With this, we can construct U(1/4), U(3/4), V(1/4), V(3/4) and the relations

X \setminus V(0) \subset U(1/4) \subset X \setminus V(1/4) \subset U(1/2),

X \setminus U(1) \subset V(3/4) \subset X \setminus U(3/4) \subset V(1/2).

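Inductively, given U and V on all dyadics with denominator 2^n, we fill in those with denominator 2^{n+1} by running this process on the disjoint closed sets X \setminus V(a/2^n) and X \setminus U((a+1)/2^n) for each a = 0,1,\ldots,2^n-1, which produces U((2a+1)/2^{n+1}) and V((2a+1)/2^{n+1}). Since U(r) and V(r) are disjoint, \overline{U(r)} \subset X \setminus V(r), and the chains of inclusions give X \setminus V(r) \subset U(s) for r < s, which is the second condition. One can draw a picture to help visualize this process and to see that this satisfies the required aforementioned conditions for U.     ▢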

Now we will find a metric for the product space \mathbb{R}^{\mathbb{N}}. Remember that a basic open set of the product topology is a product of open sets in which all but finitely many factors are the full space \mathbb{R} (since a topology is closed only under finite intersection). Thus our metric must be such that every \epsilon-ball contains a basic open set of this form. The value of |x - y| for x,y \in \mathbb{R}, as well as its powers, is unbounded, so we need to cap the distance in each coordinate at some finite value, say 1. We also need that for any \epsilon > 0, the distance contributed by all but finitely many indices stays below \epsilon. For this, we can tighten the upper bound on the ith index to 1/i, and instead of summing (which would give a series), we take a \sup, so that for all n > N where 1/N < \epsilon, the nth factor of the ball is all of \mathbb{R} as desired. We let our metric be

D(\mathbf{x}, \mathbf{y}) = \sup\{\frac{\min(|x_i-y_i|, 1)}{i} : i \in \mathbb{N}\}.

That this satisfies the conditions of metric is very mechanical to verify.
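The only axiom needing a thought is the triangle inequality, which follows coordinatewise from \min(|a-c|, 1) \leq \min(|a-b|, 1) + \min(|b-c|, 1). As a sanity check rather than a proof, the following sketch evaluates D on finite lists (standing in for sequences that agree beyond the listed coordinates, an assumption of this illustration) and tests the axioms on random samples. The function names are hypothetical.

```python
import random


def D(x, y):
    """sup over i of min(|x_i - y_i|, 1) / i, with indices starting at 1.

    x and y are finite lists; they stand for sequences that agree past
    the listed coordinates, so the sup is a max over the listed ones.
    """
    return max(min(abs(a - b), 1.0) / i
               for i, (a, b) in enumerate(zip(x, y), start=1))


def check_metric_axioms(trials=1000, length=8, seed=0):
    """Randomly test identity, symmetry and the triangle inequality."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = ([rng.uniform(-3, 3) for _ in range(length)]
                   for _ in range(3))
        assert D(x, x) == 0.0
        assert D(x, y) == D(y, x)
        # Small slack for floating-point rounding.
        assert D(x, z) <= D(x, y) + D(y, z) + 1e-12
    return True
```

Note that a large difference in a late coordinate contributes little: D([0, 0], [0, 1]) is 1/2, since the second coordinate is scaled by 1/2.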

Proposition The metric D induces the product topology on \mathbb{R}^{\mathbb{N}}.

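Proof: Since the terms \min(|x_i-y_i|, 1)/i tend to 0, the \sup is attained, so D(\mathbf{x}, \mathbf{y}) < \epsilon exactly when every term is less than \epsilon. Hence, for \epsilon \leq 1, the \epsilon-ball about \mathbf{x} is of the form

(x_1 - \epsilon, x_1 + \epsilon) \times (x_2 - 2\epsilon, x_2 + 2\epsilon) \times \cdots \times (x_n - n\epsilon, x_n + n\epsilon) \times \mathbb{R} \times \cdots \times \mathbb{R} \times \cdots,

where n is the largest natural number such that n\epsilon \leq 1 (for \epsilon > 1 the ball is the whole space). This is itself a basic open set of the product space, so every set open with respect to D is open in the product topology.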

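Conversely, take any point \mathbf{x} of an open set of the product topology. It lies in a basic open set contained in that open set, which, shrinking if necessary, we may take to be a product of intervals (x_i - \delta_i, x_i + \delta_i) with \delta_i \leq 1 for i in a finite set of indices I, and copies of \mathbb{R} for all other indices. Since I is finite, we can simply take the minimum over i \in I of \delta_i / i as our \epsilon: if D(\mathbf{x}, \mathbf{y}) < \epsilon, then \min(|x_i - y_i|, 1) < i\epsilon \leq \delta_i \leq 1 for each i \in I, so |x_i - y_i| < \delta_i and \mathbf{y} lies in the basic open set.     ▢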
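The choice of \epsilon in the converse direction can be checked numerically. The sketch below is an illustration under stated assumptions: sequences are truncated to finite lists, the basic open set is described by a dictionary mapping each constrained index i to the half-width \delta_i of its interval (with \delta_i \leq 1), and the helper names are hypothetical. It samples points near a center and confirms that every sampled point within D-distance \epsilon lands inside the box.

```python
import random


def D(x, y):
    """sup over i of min(|x_i - y_i|, 1) / i, indices starting at 1."""
    return max(min(abs(a - b), 1.0) / i
               for i, (a, b) in enumerate(zip(x, y), start=1))


def radius_for_box(half_widths):
    """epsilon = min over constrained indices i of delta_i / i."""
    return min(delta / i for i, delta in half_widths.items())


def ball_inside_box(center, half_widths, trials=2000, seed=1):
    """Sample points; every one in the epsilon-ball must lie in the box.

    Returns how many samples actually fell inside the ball.
    """
    rng = random.Random(seed)
    eps = radius_for_box(half_widths)
    hits = 0
    for _ in range(trials):
        y = [c + rng.uniform(-1.5, 1.5) for c in center]
        if D(center, y) < eps:
            hits += 1
            assert all(abs(y[i - 1] - center[i - 1]) < delta
                       for i, delta in half_widths.items())
    return hits
```

For example, with half-widths {1: 0.5, 3: 0.9} the radius is 0.9/3, driven by the third coordinate even though its interval is wider, because distances there are scaled down by 1/3.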

Now we need to construct a homeomorphism from our second-countable, regular (and thereby normal) space X to some subspace of \mathbb{R}^{\mathbb{N}}. A homeomorphism is injective as part of its definition. How to satisfy that? Provide a countable collection of continuous functions to \mathbb{R} such that at least one of them differs whenever two points differ. Here normality comes in handy: given two distinct points, we can place them in disjoint closed sets and invoke Urysohn’s lemma to construct a continuous function that is 0 at one and 1 at the other. Since our space is second-countable, a countable number of such functions suffices for all pairs of points. For every pair B_n, B_m in the countable basis with \bar{B_n} \subset B_m, we apply Urysohn’s lemma to the disjoint closed sets X \setminus B_m and \bar{B_n}, obtaining a continuous function X \to [0,1] that is 0 on X \setminus B_m and 1 on \bar{B_n}. There are countably many such pairs, and if x \neq y, then by regularity (with points closed) there are basis elements with x \in B_n \subset \bar{B_n} \subset B_m \subset X \setminus \{y\}, so the corresponding function is 1 at x and 0 at y.

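Proposition The map f : X \to [0,1]^{\mathbb{N}} \subset \mathbb{R}^{\mathbb{N}} whose components are the functions constructed above is a homeomorphism onto its image f(X).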

Proof: Each component function of f is continuous, so f is continuous into the product, and f is injective because some component separates any two distinct points. It remains to show the other way, that U open in the domain implies f(U) is open in f(X). For that it is enough to take z_0 = f(x_0) for any x_0 \in U and find some open neighborhood of it in f(X) contained in f(U). U contains a basis element around x_0, and by regularity a smaller one around x_0 whose closure lies inside it, and thus there is a component (call it f_n) that sends all of X \setminus U to 0 and x_0 not to 0. This essentially partitions X by 0 versus not 0, with the latter portion lying inside U, which means that W = \pi_n^{-1}((0, \infty)) \cap f(X) contains z_0 and is contained in f(U). The projections of the product space are continuous, so W is open in f(X). This suffices to show that f(U) is open in f(X).     ▢

With this, we’ve shown our arbitrary regular, second-countable space to be homeomorphic to a subspace of a space we directly metrized, which means of course that any regular, second-countable space is metrizable, the very statement of the Urysohn metrization theorem.