Big Picard theorem

I’ve been asked to prove the Big Picard theorem, assuming the fundamental normality test. Granting the latter, the proof is very short, and I could half-ass it that way. But I don’t like writing up stuff that I don’t actually understand for the sake of doing so. There’s little point, and if I’m going to write up a proof of it, I’ll do so for real, which means going over the fundamental normality test in its entirety.

First some preliminaries.

Theorem 2.28 (Riemann mapping theorem). Let \Omega \subset \mathbb{C} be simply-connected and \Omega \neq \mathbb{C}. Then there exists a conformal homeomorphism f : \Omega \to \mathbb{D} onto the unit disk \mathbb{D}.

Proof: Linked here.

Theorem 2.30. Suppose \Omega is bounded, simply-connected, and regular. Then any conformal homeomorphism as in Theorem 2.28 extends to a homeomorphism \bar{\Omega} \to \bar{\mathbb{D}}.

Schwarz reflection principle. Suppose that f is an analytic function defined in the upper half-disk \{|z| < 1, \text{Im } z > 0\}. Further suppose that f extends continuously to the segment (-1, 1) of the real axis and takes real values there. Then f can be extended to an analytic function on the whole disk by the formula

f(\bar{z}) = \overline{f(z)}

and the values for z reflected across the real axis are the reflections of f(z) across the real axis.

We begin by presenting the standard “geometric” procedure by which the covering map \pi : \mathbb{D} \to \mathbb{C} \setminus \{p_1, p_2\} may be obtained. Here p_1, p_2 are distinct points. This then leads naturally to the “little” and “big” Picard theorems, which are fundamental results of classical function theory.


The construction takes place in the Poincaré disk. In the above figure, we have a circle C_2 reflected about C_1, with the configuration such that C_2 intersects C_1 perpendicularly. The intersection points must be fixed by the reflection, and the reflection must preserve the orthogonality. Moreover, reflection preserves geodesics, and under the hyperbolic metric, geodesics are generalized circles. From this, we can deduce that C_2 goes to itself, with its two arcs relative to C_1 interchanged.

To construct the map, we start with a triangle \Delta_0 inside the unit circle consisting of circular arcs that intersect the unit circle at right angles. Reflect \Delta_0 across each of its sides and one gets three more triangles with circular arcs intersecting the unit circle at right angles.


The above figure shows how the unit disk is partitioned into triangles by iterating these reflections indefinitely. To obtain the sought-after covering map, we start from the Riemann mapping theorem, which gives us a conformal isomorphism f : \Delta_0 \to \mathbb{H}, the upper half-plane. This map extends as a homeomorphism to the boundary by Theorem 2.30. Thus, the three circular arcs of \Delta_0 get mapped to the intervals [-\infty, 0], [0,1], [1,\infty], respectively. By the Schwarz reflection principle, the map f extends analytically to the triangles obtained by reflecting \Delta_0 across each of its sides as just explained; complex conjugation, as in the Schwarz reflection principle, carries the upper half-plane to the lower half-plane, so the reflected triangles map onto the lower half-plane. The points 0, 1, \infty themselves correspond to vertices on the unit circle and are therefore omitted from the image. Iterating the reflections, we obtain an analytic map \pi : \mathbb{D} \to \mathbb{C} \setminus \{0, 1\} defined on the entire unit disk that is a local isomorphism and in fact a covering map.

Theorem 4.18. Every entire function which omits two values is constant.

Proof. Indeed, if f is such a function, we may assume that it takes its values in \mathbb{C} \setminus \{0, 1\}. But then we can lift f to the universal cover of \mathbb{C} \setminus \{0, 1\} to obtain an entire function F into \mathbb{D}. By Liouville’s theorem, F is constant.     ▢

Theorem 4.19 (Fundamental normality test). Any family of functions \mathcal{F} in \mathcal{H}(\Omega) which omits the same two distinct values in \mathbb{C} is a normal family.

Theorem 4.20. If f has an isolated essential singularity at z_0, then in every punctured neighborhood of z_0 the function f attains every complex value infinitely often, with at most one exception.

Proof. Suppose without loss of generality that z_0 = 0 and that f is analytic on 0 < |z| < 2 (rescale a sufficiently small punctured neighborhood of z_0). Define f_n(z) = f(2^{-n}z) for integers n \geq 1; each f_n is then analytic on 0 < |z| < 2. Suppose, for contradiction, that f attains each of two distinct values only finitely often near 0. Then, on a small enough punctured neighborhood, f, and hence every f_n with n large, omits the same two values. By the fundamental normality test, some subsequence f_{n_k}(z) \to F(z) uniformly on 1/2 \leq |z| \leq 1, where either F is analytic or F \equiv \infty by Weierstrass’s theorem (see here). In the former case, the maximum principle shows that f is bounded near z = 0, which means the singularity is removable. In the latter case, convergence to \infty implies that z = 0 is a pole. Either way, we contradict f having an essential singularity at 0.     ▢
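To see Theorem 4.20 in action (an illustration of my own, not from Schlag): f(z) = e^{1/z} has an essential singularity at 0 and omits only the value 0, and for any w \neq 0 the points z_k = 1/(\text{Log } w + 2\pi i k) accumulate at 0 and satisfy f(z_k) = w. A quick numerical check:

```python
import cmath

# f(z) = exp(1/z) has an essential singularity at 0 and omits only the value 0.
# For any w != 0, the points z_k = 1/(Log w + 2*pi*i*k) accumulate at 0 and
# satisfy f(z_k) = w, illustrating Theorem 4.20.
w = 2 + 3j
zs = [1 / (cmath.log(w) + 2j * cmath.pi * k) for k in range(1, 50)]

# every z_k solves exp(1/z) = w ...
assert all(abs(cmath.exp(1 / z) - w) < 1e-6 for z in zs)
# ... and the solutions march into the singularity at 0
assert abs(zs[-1]) < abs(zs[0])
```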


  • Schlag, W., A Course in Complex Analysis and Riemann Surfaces, American Mathematical Society, 2014, pp. 70-72,81,160-164.

On grad school, science, academia, and also a problem on Riemann surfaces

I like mathematics a ton and I am not bad at it. In fact, I am probably better than many math graduate students at math, though surely, they will have more knowledge than I do in some respects, or maybe even not that, because frankly, the American undergrad math major curriculum is often rather pathetic, well maybe largely because the students kind of suck. In some sense, you have to be pretty clueless to be majoring in just pure math if you’re not a real outlier at it, enough to have a chance at a serious academic career. Of course, math professors won’t say this. So we have now an excess of people who really shouldn’t be in science (because they much lack the technical power or an at least reasonable scientific taste/discernment, or more often both) adding noise to the job market. On this, Katz in his infamous Don’t Become a Scientist piece writes:

If you are in a position of leadership in science then you should try to persuade the funding agencies to train fewer Ph.D.s. The glut of scientists is entirely the consequence of funding policies (almost all graduate education is paid for by federal grants). The funding agencies are bemoaning the scarcity of young people interested in science when they themselves caused this scarcity by destroying science as a career. They could reverse this situation by matching the number trained to the demand, but they refuse to do so, or even to discuss the problem seriously (for many years the NSF propagated a dishonest prediction of a coming shortage of scientists, and most funding agencies still act as if this were true). The result is that the best young people, who should go into science, sensibly refuse to do so, and the graduate schools are filled with weak American students and with foreigners lured by the American student visa.

Even he believes that now the Americans who go into science are often the ones who are too dumb or clueless to realize that they basically have no future there. I can surely attest to how socially inept, or at least clueless, many math grad students are, as I interact with them much more now. The epidemic described by Katz is accentuated by the fact that professors in science are not encouraging of students who seek a plan B, which everyone should have given the way the job market is right now, and even go as far as to create an atmosphere wherein even to express a desire to leave academia is a no-no. I am finding that this type of environment is even corroding my interest in mathematics itself, which is sad. In any case, I sort of disagree with Katz in that I feel like the very top scientific talent of my generation still mostly ends up in top or at least good graduate schools, though surely there are many who feel alienated or don’t find the risk worth taking, and end up leaving science. I myself am thinking of forgetting about mathematics altogether, so that I can concentrate my motivation, time, and energy on developing expertise in some area of software engineering that is in demand, for the money and (relative) job security, and hopefully also find it a sufficiently fulfilling experience. There are a lot of morons in tech of course, but certain corners of it do provide refuge. I had always thought of mathematics as being a field with a much higher threshold cognitively in its content, enough to filter out most of the uninteresting people, but that’s, to my disappointment, less so than I expected.
I do have reason to be scared, because one of the smartest and most interesting people I know took like five years following his math PhD to make his way into full employment, in a programming/data science heavy role of course, despite being arguably much better at programming than most industry software engineers with a computer science degree, which he lacked, an indicator of the perverse extent to which our society now runs on risk-aversion and (artificial) credential signaling. I can only consider myself fortunate that I do have a computer science degree from a reputable place, and with that, I have already made a modest pot of gold, despite being frankly quite mediocre at real computer stuff, which I have had difficulty becoming as interested in as I have been in mathematics. Maybe I was even fortunate to have not been all that gifted in the first place, which in some sense compelled me to be more realistic, as there is arguably nothing worse than becoming an academic loser, which academia is full of nowadays, sadly. This type of thing can happen to real geniuses too. Look at Yitang Zhang for instance, the most prominent case to come to mind. Except he actually made it afterwards, spectacularly and miraculously, with his dogged belief in himself and perseverance under adversity. For every one of him, I would expect like 10 real geniuses (in ability) who were under-nurtured, under-recognized, or even screwed, left to fade into obscurity.

I’ll transition now to a problem that I’ve been asked to solve. Its statement is the following:

Let f be holomorphic on a simply-connected Riemann surface M, and assume that f never vanishes. Show that there exists F holomorphic on M such that f = e^F, and that harmonic functions on M have harmonic conjugates.

Every p_0 \in M has an open connected neighborhood U = \{p : |f(p) - f(p_0)| < |f(p_0)|\}, on which the values of f stay within a disk not containing 0, so that a continuous branch of the logarithm of f exists there. Let \{U_{\alpha}\} be the system consisting of these neighborhoods, and (\log f)_{\alpha} a continuous branch of the logarithm of f in U_{\alpha}. From this arises a family F_{\alpha} = \{(\log f)_{\alpha} + 2n\pi i : n \in \mathbb{Z}\}.

In Schlag, there is the following lemma.

Lemma 5.5. Suppose M is a simply-connected Riemann surface and

\{D_{\alpha} \subset M : \alpha \in A\}

is a collection of domains (connected, open). Assume further that these sets form an open cover M = \bigcup_{\alpha \in A} D_{\alpha} such that for each \alpha \in A there is a family F_{\alpha} of analytic functions f : D_{\alpha} \to N, where N is some other Riemann surface, with the following properties: if f \in F_{\alpha} and p \in D_{\alpha} \cap D_{\beta}, then there is some g \in F_{\beta} so that f = g near p. Then given \gamma \in A and some f \in F_{\gamma} there exists an analytic function \psi_{\gamma} : M \to N so that \psi_{\gamma} = f on D_{\gamma}.

Using the families of analytic functions F_{\alpha} given above: near p \in U_{\alpha} \cap U_{\beta}, the branches (\log f)_{\alpha} and (\log f)_{\beta} differ by an integer multiple of 2\pi i, so for every n_{\alpha} there is an n_{\beta} with (\log f)_{\alpha} + 2n_{\alpha}\pi i = (\log f)_{\beta} + 2n_{\beta}\pi i near p. Hence the hypothesis of Lemma 5.5 is satisfied by these families.

I’ll present the proof of the above lemma here, to consolidate my own understanding, and because it is essential to the construction of a global holomorphic function matching some function in each family. The lemma does this in generality, of course, whereas the problem we are solving is a specific case.

Proof. Let

\mathcal{U} = \{(p, f) | p \in D_{\alpha}, f \in F_{\alpha}, \alpha \in A\} / \sim

where (p, f) \sim (q, g) iff p = q and f = g in a neighborhood of p. Let [p, f] denote the equivalence class of (p, f). As usual, \pi([p, f]) = p. For each f \in F_{\alpha}, let

D'_{\alpha, f} = \{[p, f] | p \in D_{\alpha}\}.

Clearly, \pi : D_{\alpha, f}' \to D_{\alpha} is bijective. We define a topology on \mathcal{U} as follows: \Omega \subset D_{\alpha, f}' is open iff \pi(\Omega) \subset D_{\alpha} is open for each \alpha, f \in F_{\alpha}. This does indeed define open sets in \mathcal{U}: since \pi(D'_{\alpha, f} \cap D'_{\beta, g}) is the union of connected components of D_{\alpha} \cap D_{\beta} by the uniqueness theorem (if it is not empty), it is open in M as needed. With this topology, \mathcal{U} is a Hausdorff space since M is Hausdorff (we use this if the base points differ) and because of the uniqueness theorem (which we use if the base points coincide). Note that by construction, we have made the fibers indexed by the functions in F_{\alpha} discrete in the topology of \mathcal{U}.

The main point is now to realize that if \widetilde{M} is a connected component of \mathcal{U}, then \pi : \widetilde{M} \to M is onto and in fact is a covering map. Let us check that it is onto. First, we claim that \pi(\widetilde{M}) \subset M is open. Thus, let [p, f] \in \widetilde{M} and pick D_{\alpha} with p \in D_{\alpha} and f \in F_{\alpha}. Clearly, D'_{\alpha, f} \cap \widetilde{M} \neq \emptyset, and since D_{\alpha}, and thus also D'_{\alpha, f}, is open and connected, the connected component \widetilde{M} has to contain D'_{\alpha, f} entirely. Therefore, D_{\alpha} \subset \pi(\widetilde{M}) as claimed.

Next, we need to check that M \setminus \pi(\widetilde{M}) is open. Let p \in M \setminus \pi(\widetilde{M}) and pick D_{\beta} so that p \in D_{\beta}. If D_{\beta} \cap \pi(\widetilde{M}) = \emptyset, then we are done. Otherwise, let q \in D_{\beta} \cap \pi(\widetilde{M}) and pick D_{\alpha} containing q and some f \in F_{\alpha} with D'_{\alpha, f} \subset \widetilde{M} (using the same “nonempty intersection implies containment” argument as above). But now we can find g \in F_{\beta} with the property that f = g on a component of D_{\alpha} \cap D_{\beta}. As before, this implies that \widetilde{M} would have to contain D'_{\beta, g} which is a contradiction.

To see that \pi : \widetilde{M} \to M is a covering map, one verifies that

\pi^{-1}(D_{\alpha}) = \bigcup_{f \in F_{\alpha}} D'_{\alpha, f}.

The sets on the right-hand side are disjoint and in fact they are connected components of \pi^{-1}(D_{\alpha}).

Since M is simply-connected, \widetilde{M} is homeomorphic to M (proof given in the appendix). We thus infer the existence of a globally defined analytic function which agrees with some f \in F_{\alpha} on each D_{\alpha}. By picking the connected component that contains any given D_{\alpha, f}' one can fix the “sheet” locally on a given D_{\alpha}.     ▢

By this, we can construct an analytic F such that for all \alpha,

F_{|U_{\alpha}} = (\log f)_{\alpha} + n_{\alpha} \cdot 2\pi i, \qquad n_{\alpha} \in \mathbb{Z},

from which e^F = f follows.

For the existence of harmonic conjugates, we do similarly. Take a connected open cover \{U_{\alpha}\} of M where each U_{\alpha} is conformally equivalent to the unit disc, and let v_{\alpha} be a harmonic conjugate of u in U_{\alpha} (which exists on the unit disc and is unique up to an additive constant). Let F_{\alpha} = \{v_{\alpha} + c : c \in \mathbb{R}\}. Then by the same lemma, there exists v such that for all \alpha,

v_{|U_{\alpha}} = v_{\alpha} + c_{\alpha}, \quad \text{some } c_{\alpha} \in \mathbb{R}

which is harmonic and conjugate to u, since it is a harmonic conjugate of u on every element of the cover, with the constants c_{\alpha} chosen so that the local conjugates match on intersections of cover elements.
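As a concrete illustration of the local step (a toy example of my own, not from Schlag): on a disc chart, u = x^2 - y^2 = \text{Re}(z^2) has harmonic conjugate v = 2xy, and the pair satisfies the Cauchy–Riemann equations u_x = v_y, u_y = -v_x, which the sketch below checks by central differences.

```python
# On a disc chart the harmonic conjugate exists classically: u = x^2 - y^2
# (the real part of z^2) has conjugate v = 2xy, so that u + iv = z^2 is
# holomorphic.  Check the Cauchy-Riemann equations u_x = v_y, u_y = -v_x
# by central differences at a few sample points in the unit disc.
u = lambda x, y: x * x - y * y
v = lambda x, y: 2 * x * y

def d(f, x, y, axis, h=1e-6):
    # central difference in x (axis=0) or y (axis=1)
    if axis == 0:
        return (f(x + h, y) - f(x - h, y)) / (2 * h)
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

for x, y in [(0.1, 0.2), (-0.3, 0.4), (0.5, -0.1)]:
    assert abs(d(u, x, y, 0) - d(v, x, y, 1)) < 1e-6   # u_x = v_y
    assert abs(d(u, x, y, 1) + d(v, x, y, 0)) < 1e-6   # u_y = -v_x
```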




Elliptic functions

I am writing this as a way to go through in detail the section on elliptic functions in Schlag’s book.

Proposition 4.14.  Let \Lambda = \{m\omega_1 + n\omega_2 | m,n \in \mathbb{Z}\} and set \Lambda^* = \Lambda \setminus \{0\}. For any integer n \geq 3, the series

f(z) = \displaystyle\sum_{w \in \Lambda} (z+w)^{-n} \qquad (4.16)

defines a function f \in \mathcal{M}(M), where M = \mathbb{C}/\Lambda is the torus, with \deg(f) = n. Furthermore, the Weierstrass function

\wp(z) = \frac{1}{z^2} + \displaystyle\sum_{w \in \Lambda^*} [(z+w)^{-2} - w^{-2}] ,\qquad (4.17)

is an even elliptic function of degree two with \Lambda as its group of periods. The poles of \wp are precisely the points in \Lambda and they are all of order 2.

Proof.  It suffices to prove that f(z) = \displaystyle\sum_{w \in \Lambda} (z+w)^{-n} converges absolutely and uniformly on every compact set K \subset \mathbb{C} \setminus \Lambda. Periodicity allows us to restrict to the closure of any fundamental region. There exists C > 0 such that for all x,y \in \mathbb{R},

C^{-1}(|x|+|y|) \leq |x\omega_1 + y\omega_2|.

Hence, when z \in \{x\omega_1 + y\omega_2 | 0 \leq x, y \leq 1\}, then

|z + (k_1\omega_1 + k_2\omega_2)| \geq C^{-1}(|k_1| + |k_2|) - |z| \geq (2C)^{-1}(|k_1| + |k_2|)

provided |k_1| + |k_2| is sufficiently large. In

\displaystyle\sum_{|k_1|+|k_2|>0} (|k_1| + |k_2|)^{-n},

there are O(m) occurrences of |k_1| + |k_2| = m, which means the above converges when n > 2, and this, with the above bound, means f \in \mathcal{H}(\mathbb{C} \setminus \Lambda). Periodicity implies f \in \mathcal{M}(M). Moreover, the degree of (4.16) is determined by noting that inside a fundamental region the series has a unique pole, of order n.
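The counting step can be made concrete: for m \geq 1 there are exactly 4m lattice points with |k_1| + |k_2| = m, so the sum compares to \sum_m 4m \cdot m^{-n}, finite for n > 2. A quick numerical sanity check (illustrative, my own, not from the text):

```python
def count(m):
    # number of integer pairs (k1, k2) with |k1| + |k2| = m
    return sum(1 for k1 in range(-m, m + 1)
                 for k2 in range(-m, m + 1)
                 if abs(k1) + abs(k2) == m)

# exactly 4m points on each "diamond" |k1| + |k2| = m
assert all(count(m) == 4 * m for m in range(1, 30))

# hence sum over |k1|+|k2|>0 of (|k1|+|k2|)^{-n} = sum_m 4m * m^{-n};
# for n = 3 the partial sums stabilize (comparison with sum 4/m^2)
partial = [sum(4 * m * m**-3 for m in range(1, M)) for M in (10, 100, 1000)]
assert abs(partial[2] - partial[1]) < 0.05
```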

For the second part, we note that when |w| > 2|z|,

\left|(z+w)^{-2} - w^{-2}\right| \leq \frac{|z||z+2w|}{|w|^2|z+w|^2} \leq \frac{C|z|}{|w|^3},

which means the series defining \wp, which is clearly even, converges absolutely and uniformly on compact subsets of \mathbb{C} \setminus \Lambda. For the periodicity of \wp, note that \wp' is periodic relative to the same lattice \Lambda. Thus, for every w \in \Lambda,

\wp(z+w) - \wp(z) = C(w) \quad \forall z \in \mathbb{C}

with some constant C(w). Taking w = \omega_1 and z = -\omega_1/2, evenness gives

C(\omega_1) = \wp(\omega_1/2) - \wp(-\omega_1/2) = 0,

and similarly C(\omega_2) = 0, hence C(w) = 0 for all w \in \Lambda.

Another way to go about it is to define \sigma such that

\zeta(z) = \frac{d \log \sigma(z)}{dz} = \frac{1}{z} + \displaystyle\sum_{\omega \in \Lambda^*} \left[\frac{1}{z-\omega} + \frac{1}{\omega} + \frac{z}{\omega^2}\right],

so that \wp = -\zeta', from which by periodicity, we have

\zeta(z+\omega) - \zeta(z) = C(\omega).

Solving this, one obtains

\sigma(z+\omega_j) = -\sigma(z)e^{\eta_j(z+\omega_j/2)}, \qquad (4.20)

where the \eta_j are constants for j = 1,2.

Lemma 4.15.  With \wp as before, one has

(\wp'(z))^2 = 4(\wp(z) - e_1)(\wp(z) - e_2)(\wp(z) - e_3) \qquad (4.21)

where e_1 = \wp(\omega_1/2), e_2 = \wp(\omega_2/2), and e_3 = \wp((\omega_1+\omega_2)/2) are pairwise distinct. Furthermore, one has e_1 + e_2 + e_3 = 0 so that (4.21) can be written in the form

(\wp'(z))^2 = 4(\wp(z))^3 - g_2\wp(z) - g_3 \qquad (4.22)

with constants g_2 = -4(e_1e_2 + e_1e_3 + e_2e_3) and g_3 = 4e_1e_2e_3.

View the torus as

S = \{x\omega_1 + y\omega_2 | -1/2 \leq x,y \leq 1/2\}.

\wp'(z) is odd and has a pole of order 3 at z = 0 but no other poles in S, which means \wp'(z) has degree 3.

Oddness with periodicity applied at \omega_1/2 and \omega_2/2 yields that

\frac{1}{2}\omega_1, \quad \frac{1}{2}\omega_2, \quad \frac{1}{2}(\omega_1+\omega_2)

are the three zeros of \wp', each simple, and thus also the unique points where \wp has valency 2 apart from z = 0. The e_j are distinct, because otherwise \wp would assume such a value four times, counted with multiplicity, which is impossible when the degree is 2.

Denoting the RHS of (4.21) by F(z), we have that

\frac{(\wp'(z))^2}{F(z)} \in \mathcal{H}(M)

with all zeros and poles cancelled out, and thus equal to a constant.

At z = 0, the leading term of (\wp'(z))^2 is a pole of order 3\cdot 2 = 6 with coefficient (-2)^2 = 4. In F(z), we have essentially a cubic in \wp(z) with leading coefficient 4, and \wp(z) has a pole of order 2 with coefficient 1. In taking the limit towards zero, we need only consider the 4\wp(z)^3 term, which has the highest-order pole, also of order 6 with coefficient 4. That means our constant function is 1.

The final statement follows from the Laurent series around zero. Expanding (z+w)^{-2} - w^{-2} = \frac{1}{w^2}\left(\frac{1}{1+z/w}\right)^2 - \frac{1}{w^2} via the geometric series gives

\wp(z) = \frac{1}{z^2} + \displaystyle\sum_{k = 1}^{\infty} (k+1)(-1)^{k}z^k\displaystyle\sum_{w \in \Lambda^*} \frac{1}{w^{k+2}}.

Because \wp is even, the odd coefficients must vanish. So we have

\wp(z) = \frac{1}{z^2} + \displaystyle\sum_{k=1}^{\infty} (2k+1)z^{2k} \displaystyle\sum_{w \in \Lambda^*} \frac{1}{w^{2k+2}}.

For now, let

G_k = \displaystyle\sum_{w \in \Lambda^*} \frac{1}{w^k}.

\begin{aligned}\wp(z) & = & \frac{1}{z^2} + 3G_4z^2 + 5G_6z^4 + \cdots, \\ \wp'(z) & = & \frac{-2}{z^3} + 6G_4z + 20G_6z^3 + \cdots, \\ (\wp(z))^3 & = & \frac{1}{z^6} + 9\frac{G_4}{z^2} + \cdots, \\ (\wp'(z))^2 & = & \frac{4}{z^6} - \frac{24G_4}{z^2} + \cdots. \end{aligned}

What we want is to find the g_2 such that (\wp'(z))^2 - 4(\wp(z))^3 + g_2\wp(z) becomes analytic, and thus constant; to do that we must cancel all the poles at 0. The z^{-6} coefficient tells us to multiply (\wp(z))^3 by 4. After that, we have from the z^{-2} coefficient that -24G_4 - 9\cdot 4 G_4 + g_2 = 0, which means

g_2 = 60G_4 = -4(e_1e_2 + e_1e_3 + e_2e_3).
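One can also test (4.22) numerically. The sketch below (my own, with ad hoc truncation level and tolerance) truncates the lattice sums for the square lattice \omega_1 = 1, \omega_2 = i and checks (\wp')^2 \approx 4\wp^3 - g_2\wp - g_3 with g_2 = 60G_4 as derived above and g_3 = 140G_6, the standard companion identity not derived here; for the square lattice, G_6 = 0 by the symmetry w \mapsto iw.

```python
# Numerical check of (4.22) for the square lattice w1 = 1, w2 = i.  The
# truncation is symmetric in the lattice, so odd tail terms cancel and the
# truncated sums are accurate enough for a loose tolerance.
N = 60
lattice = [m + n * 1j for m in range(-N, N + 1) for n in range(-N, N + 1)]
lattice_star = [w for w in lattice if w != 0]

def wp(z):
    # Weierstrass p-function via (4.17), truncated
    return 1 / z**2 + sum((z + w)**-2 - w**-2 for w in lattice_star)

def wp_prime(z):
    # term-wise derivative of (4.17)
    return -2 * sum((z + w)**-3 for w in lattice)

G4 = sum(w**-4 for w in lattice_star)
G6 = sum(w**-6 for w in lattice_star)
g2, g3 = 60 * G4, 140 * G6   # g3 = 140*G6 is the standard companion identity

z = 0.3 + 0.2j
lhs = wp_prime(z)**2
rhs = 4 * wp(z)**3 - g2 * wp(z) - g3
assert abs(lhs - rhs) / abs(lhs) < 1e-2     # the differential equation (4.22)
assert abs(G6) < 1e-9                        # square-lattice symmetry kills G6
```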

Proposition 4.16.  Every f \in \mathcal{M}(M) is a rational function of \wp and \wp'. If f is even, then it is a rational function of \wp alone.

Proof.  Suppose that f is non-constant and even. Then for all but finitely many values of w \in \mathbb{C}_{\infty}, the equation f(z) - w = 0 has only simple zeros (since there are only finitely many zeros of f'). Pick two such w \in \mathbb{C} and denote them by c,d. Moreover, we can ensure that the zeros of f - c and f - d are distinct from the branch points of \wp. Thus, since f is even and with 2n = deg(f), one has:

\begin{aligned}\{z \in M : f(z) - c = 0\} & = \{a_j, -a_j\}_{j=1}^n, \\ \{z \in M : f(z) - d = 0\} & = \{b_j, -b_j\}_{j=1}^n. \end{aligned}

The elliptic functions

g(z) = \frac{f(z) - c}{f(z) - d}

and

h(z) = \displaystyle\prod_{j=1}^n \frac{\wp(z) - \wp(a_j)}{\wp(z) - \wp(b_j)}

have the same zeros and poles which are all simple. It follows that g = \alpha h for some \alpha \neq 0. Solving this relation for f yields the desired conclusion.

If f is odd, then f/\wp' is even so f = \wp'R(\wp) where R is rational. Finally, if f is any elliptic function, then

f(z) = \frac{1}{2}(f(z) + f(-z)) + \frac{1}{2}(f(z) - f(-z))

is a decomposition into even/odd elliptic functions whence

f(z) = R_1(\wp) + \wp'R_2(\wp)

with rational R_1, R_2 as claimed.     ▢

We conclude with the following question: given disjoint finite sets of distinct points \{z_j\} and \{\zeta_k\} in M, as well as positive integers n_j for z_j and \nu_k for \zeta_k, respectively, is there an elliptic function with precisely these zeros and poles of the given orders? In the case of \mathbb{C}_\infty (with rational functions in place of elliptic ones), the answer is yes iff \sum_{j} n_j = \sum_{k} \nu_k, since a rational function assumes every value the same number of times, counted with multiplicity.

For the tori, we first observe that by the residue theorem one has

\frac{1}{2\pi i}\oint_{\partial P} z\frac{f'(z)}{f(z)}dz = \sum_j n_jz_j - \sum_k \nu_k \zeta_k. \qquad (4.25)

where \partial P is the boundary of a fundamental region P chosen so that no zero or pole lies on it. Second, comparing parallel sides of the fundamental region and using the periodicity shows that the left-hand side of (4.25) is of the form m_1\omega_1 + m_2\omega_2 with m_1, m_2 \in \mathbb{Z} and thus equals 0 modulo \Lambda. (This follows from the fact that \int_{\gamma} \frac{f'(z)}{f(z)}dz is the difference of two logarithms of the same value, which, regardless of branch, must be an integer multiple of 2\pi i.)

Now consider the edges in \partial P given by \gamma_1(t) = t\omega_1 and \gamma_2(t) = \omega_2 + t\omega_1, 0 \leq t \leq 1, respectively. By \omega_2-periodicity of \frac{f'(z)}{f(z)} we infer that

\int_{\gamma_1} z\frac{f'(z)}{f(z)}dz + \int_{\gamma_2} z \frac{f'(z)}{f(z)}dz = -\omega_2\int_{\gamma_1}d \log f(z).

The branch of logarithm here is irrelevant, since the arbitrary constant is differentiated away. By periodicity applied to the difference in this integral,

\omega_2 \frac{1}{2\pi i} \int_{\gamma_1} d\log f(z) \in \omega_2 \mathbb{Z}.

The other edge pair gives an element of \omega_1 \mathbb{Z}, whence (4.24).
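As a sanity check on these necessary conditions (an illustrative computation of my own): for \wp' on the square lattice \omega_1 = 1, \omega_2 = i, the zeros are the three half-periods, each simple, and the only pole is at 0, of order 3; the total orders match and the weighted sums agree modulo \Lambda.

```python
# Check the necessary conditions for wp' on the square lattice w1 = 1, w2 = i:
# wp' has simple zeros at the three half-periods and a pole of order 3 at 0.
zeros = [0.5, 0.5j, 0.5 + 0.5j]   # each of multiplicity 1
poles = [0.0, 0.0, 0.0]           # 0 counted with multiplicity 3

assert len(zeros) == len(poles)   # total orders of zeros and poles agree

diff = sum(zeros) - sum(poles)    # should lie in the lattice Z + Zi
assert abs(diff.real - round(diff.real)) < 1e-12
assert abs(diff.imag - round(diff.imag)) < 1e-12
```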

Theorem 4.17.  Suppose (4.23) (the total orders of the zeros and poles agree) and (4.24) (their weighted sums agree modulo \Lambda) hold. Then there exists an elliptic function which has precisely these zeros and poles with the given orders. This function is unique up to a nonzero complex multiplicative constant.

Proof.  Listing the points z_j and \zeta_k expanded out with their respective multiplicities, we obtain sequences z_j' and \zeta_k' of the same length, say n. Shifting the z_j' and \zeta_k' by lattice elements if needed (this is where (4.24) enters), one has

\sum_{j=1}^n z_j' = \sum_{k=1}^n \zeta_k'.


Define

f(z) = \displaystyle\prod_{j=1}^n \frac{\sigma(z - z_j')}{\sigma(z - \zeta_j')}

using the \sigma in (4.20). Then

\begin{aligned} \frac{f(z+\omega_i)}{f(z)} & = & \displaystyle\prod_{j=1}^n \frac{\sigma(z-z_j' + \omega_i)}{\sigma(z - z_j')}\cdot \frac{\sigma(z - \zeta_j')}{\sigma(z - \zeta_j' + \omega_i)} \\ & = & \displaystyle\prod_{j=1}^n e^{\eta_i\left[(z - z_j' + \omega_i/2) - (z - \zeta_j' + \omega_i/2)\right]} \\ & = & e^{\eta_i \sum_{j=1}^n (\zeta_j' - z_j')} \\ & = & e^{\eta_i\cdot 0} = 1, \end{aligned}

which shows periodicity.    ▢

Finally, we observe how we can solve (4.22) by integrating

\frac{d\wp(z)}{\sqrt{4(\wp(z))^3 - g_2\wp(z) - g_3}} = dz

where we choose some branch of the root, which yields

z - z_0 = \int_{\wp(z_0)}^{\wp(z)} \frac{d\zeta}{\sqrt{4\zeta^3 - g_2\zeta - g_3}}. \qquad (4.30)

In other words, the Weierstrass function \wp is the inverse of an elliptic integral. The integration path in (4.30) needs to be chosen to avoid the zeros and poles of \wp', and the branch of the root is determined by \wp'.

Analogously, \int_{w_0}^w \frac{d\zeta}{\sqrt{1 - \zeta^2}} = z - z_0 is satisfied by w = \sin z, with similar restrictions on the path and the choice of branch. The sine, though, is periodic with a single period, whereas the function in (4.30) has two periods.
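The sine analogy can be checked numerically (a sketch of my own): composite Simpson's rule recovers \arcsin w from the integral.

```python
import math

# The integral of (1 - t^2)^{-1/2} from 0 to w inverts sin, just as the
# elliptic integral inverts the Weierstrass p-function.  Composite Simpson's
# rule check that the integral equals arcsin(w) for w = 0.5.
def simpson(f, a, b, n=1000):
    # composite Simpson's rule with n (even) subintervals
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return s * h / 3

w = 0.5
val = simpson(lambda t: 1 / math.sqrt(1 - t * t), 0.0, w)
assert abs(val - math.asin(w)) < 1e-9
```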


  • Schlag, W., A Course in Complex Analysis and Riemann Surfaces, American Mathematical Society, 2014, pp. 153-157.

Vector fields, flows, and the Lie derivative

Let M be a smooth real manifold. A smooth vector field V on M can be considered as a map from C^{\infty}(M) to C^{\infty}(M): the vector field assigns a tangent vector to every point p \in M, and applied to a function f : M \to \mathbb{R}, it yields at each p the directional derivative of f along that tangent vector. Moreover, this varies smoothly with p.

Along any vector field, if we start at any point, we can trace a path along the vector field. Imagine a time-independent velocity field in water: take a point particle at any point at any time, and we can deterministically predict its path both forward and backward in time. We call such a path an integral curve, and it is easy to see that the maximal integral curves partition the manifold: lying on a common integral curve is an equivalence relation on points.

On a manifold M, at a point with chart (U, \varphi), under vector field V, we would have

\frac{\mathrm{d}x^{\mu}(t)}{\mathrm{d}t} = V^{\mu}(x(t)), \qquad (1)

where x^{\mu}(t) is the \muth component of \varphi(x(t)) and V = V^{\mu}\partial / \partial x^{\mu}. This is an ODE which is guaranteed to have a unique solution at least locally, and we assume for now that the parameter t can be maximally extended.

If we attach the initial condition that at t = 0 the integral curve is at x_0, and denote the coordinate by \sigma^{\mu}(t, x_0), then (1) becomes

\frac{\mathrm{d}\sigma^{\mu}(t, x_0)}{\mathrm{d}t} = V^{\mu}(\sigma(t, x_0)),

Here, \sigma : \mathbb{R} \times M \to M is called a flow generated by V, which necessarily satisfies

\sigma(t, \sigma(s, x_0)) = \sigma(t+s, x_0)

for any s, t \in \mathbb{R}.

Within this is the structure of a one-parameter group of transformations, where

(i) \sigma_{s+t} = \sigma_s \circ \sigma_t or \sigma_{s+t}(x_0) = \sigma_s(\sigma_t(x_0)).
(ii) \sigma_0 is the identity map.
(iii) \sigma_{-t} = (\sigma_t)^{-1}.
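To make (i)–(iii) concrete, consider (my own example) the rotation field V(x, y) = (-y, x) on \mathbb{R}^2, whose flow \sigma_t is rotation by angle t; the group laws can be checked numerically:

```python
import math

# For the rotation field V(x, y) = (-y, x) on R^2, the flow generated by V is
# sigma_t(x, y) = (x cos t - y sin t, x sin t + y cos t): solve (1) directly.
# Quick numerical check of the one-parameter group laws (i)-(iii).
def sigma(t, p):
    x, y = p
    c, s = math.cos(t), math.sin(t)
    return (x * c - y * s, x * s + y * c)

p, s, t = (1.0, 2.0), 0.7, -1.3

def close(a, b, eps=1e-12):
    return all(abs(u - v) < eps for u, v in zip(a, b))

assert close(sigma(s, sigma(t, p)), sigma(s + t, p))   # (i)  group law
assert close(sigma(0.0, p), p)                         # (ii) identity at t = 0
assert close(sigma(-t, sigma(t, p)), p)                # (iii) inverse flow
```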

We now ask how a smooth vector field W changes along a smooth vector field V. If our manifold were simply \mathbb{R}^n (with a single identity chart, globally), we would have at any point p some direction along V, and under an infinitesimal displacement along it, W would change as well. In this case, it is easy to represent tangent vectors with indexed coordinates. Naively, we could take the displacement in W, divide by the amount of displacement along V, and take the limit. However, we have not defined addition of tangent vectors belonging to different tangent spaces. To do so, we would need some meaningful correspondence between different tangent spaces. Why can we not simply do vector addition? Recall that tangent space elements are defined in terms of how they act on smooth functions from M to \mathbb{R}, rather than directly. It is only because they act linearly on any given such function that we can use vectors to represent them.

We resolve this in a more general fashion by defining the induced map on tangent spaces T_pM and T_{f(p)}N for smooth f : M \to N between manifolds. Recall that an element of a tangent space is a map D : C^{\infty}(M) \to \mathbb{R} (that also satisfies the Leibniz property: D(fg) = Df \cdot g + f \cdot Dg). If g \in C^{\infty}(N), then g \circ f \in C^{\infty}(M). We define the induced map

\Phi_{f, p} : T_p M \to T_{f(p)} N

in the following manner. If D \in T_p(M), then \Phi_{f, p}(D) = D', where D'[g] = D[g \circ f].

We can apply this to \sigma_t : M \to M in our construction of the Lie derivative \mathcal{L}_V W of a vector field W with respect to a vector field V. Since the flow is along V,

\sigma_{-t}^{\mu}(p) = x^{\mu}(p) - tV^{\mu}(p) + O(t^2). \qquad (2)

We take the induced map of \sigma_{-t} at the point \sigma_t(p),

\Phi_{\sigma_{-t}, \sigma_t(p)} : T_{\sigma_t(p)} M \to T_p M.

If \Phi_{\sigma_{-t}, \sigma_t(p)}(W) = W', then by definition,

W'[f](p) = W[f \circ \sigma_{-t}](\sigma_t(p)).

That means

\mathcal{L}_V W[f](p) = \left(\displaystyle\lim_{t \to 0}\frac{W'(p) - W(p)}{t}\right)[f] = \displaystyle\lim_{t \to 0}\frac{W'[f](p) - W[f](p)}{t}. \qquad (3)

Using that by the chain rule,

\frac{\partial}{\partial x^{\nu}}(f \circ \sigma_{-t})(\sigma_t(p)) = \frac{\partial \sigma_{-t}^{\mu}}{\partial x^{\nu}}(\sigma_t(p)) \frac{\partial f}{\partial x^{\mu}}(p),

we arrive at

\begin{aligned} W'[f](p) & = W^{\nu}(\sigma_t(p)) \frac{\partial}{\partial x^{\nu}}[f \circ \sigma_{-t}](\sigma_t(p)) \\ & = W^{\nu}(\sigma_t(p)) \frac{\partial \sigma_{-t}^{\mu}}{\partial x^{\nu}}(\sigma_t(p))\frac{\partial f}{\partial x^{\mu}}(p). \qquad (4) \end{aligned}

Expanding W^{\nu}(\sigma_t(p)) to first order in t, we get

W^{\nu}(\sigma_t(p)) = W^{\nu}(p) + tV^{\rho}(p) \frac{\partial W^{\nu}}{\partial x^{\rho}}(p) + O(t^2). \qquad (5)

Moreover, by (2),

\frac{\partial}{\partial x^{\nu}} \sigma_{-t}^{\mu}(\sigma_t(p)) = \delta_{\nu}^{\mu} - t \frac{\partial V^{\mu}}{\partial x^{\nu}}(p) + O(t^2). \qquad (6)

Substituting (5) and (6) into (4) yields

\begin{aligned} W'[f](p) & = \left(W^{\nu}(p) + tV^{\rho}(p) \frac{\partial W^{\nu}}{\partial x^{\rho}}(p) + O(t^2)\right)\left(\delta_{\nu}^{\mu} - t \frac{\partial V^{\mu}}{\partial x^{\nu}}(p) + O(t^2)\right)\frac{\partial f}{\partial x^\mu}(p) \\ & = \left(W^{\mu}(p) + t\left(V^{\rho}(p) \frac{\partial W^{\mu}}{\partial x^{\rho}}(p) - W^{\nu}(p) \frac{\partial V^{\mu}}{\partial x^{\nu}}(p)\right) + O(t^2)\right)\frac{\partial f}{\partial x^\mu}(p) \\ & = \left(W^{\mu}(p) + t\left(V^{\nu}(p) \frac{\partial W^{\mu}}{\partial x^{\nu}}(p) - W^{\nu}(p) \frac{\partial V^{\mu}}{\partial x^{\nu}}(p)\right) + O(t^2)\right)\frac{\partial f}{\partial x^\mu}(p). \qquad (7) \end{aligned}

There is a constant term, a first order term, and an O(t^2). In (3), the constant term is subtracted out, and the O(t^2) contributes nothing to the limit. This means that the Lie derivative is equal to the first order term, with

(\mathcal{L}_V W)^{\mu}(p) = V^{\nu}(p) \frac{\partial W^{\mu}}{\partial x^{\nu}}(p) - W^{\nu}(p) \frac{\partial V^{\mu}}{\partial x^{\nu}}(p). \qquad (8)

Notice that in (4) there is a factor \frac{\partial f}{\partial x^{\mu}} which we have omitted in (8). This is because (8) gives the components of the Lie derivative with respect to the basis \partial/\partial x^{\mu} of the tangent space, which is applied to f \in C^{\infty}(M).

We recognize in (8) the \mu-th component of the Lie bracket [V,W], where

[V,W]^{\mu} = V^{\nu} \frac{\partial W^{\mu}}{\partial x^{\nu}} - W^{\nu} \frac{\partial V^{\mu}}{\partial x^{\nu}}. \qquad (9)
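As a sanity check on the component formula (9), one can compute the bracket symbolically. The following sketch uses sympy with two arbitrarily chosen vector fields on \mathbb{R}^2 (the example fields are my own, not from the derivation above), and also checks antisymmetry.

```python
import sympy as sp

# Coordinates and two concrete vector fields on R^2
# (components chosen purely for illustration).
x, y = sp.symbols('x y')
coords = [x, y]
V = [y, -x]           # V = y d/dx - x d/dy (a rotation field)
W = [x**2, x*y]       # W = x^2 d/dx + x*y d/dy

def lie_bracket(V, W, coords):
    # Formula (9): [V,W]^mu = V^nu dW^mu/dx^nu - W^nu dV^mu/dx^nu
    n = len(coords)
    return [sp.simplify(
                sum(V[nu]*sp.diff(W[mu], coords[nu])
                    - W[nu]*sp.diff(V[mu], coords[nu])
                    for nu in range(n)))
            for mu in range(n)]

print(lie_bracket(V, W, coords))   # components of [V,W]
print(lie_bracket(W, V, coords))   # should be the negatives
```

Antisymmetry [V,W] = -[W,V] is immediate from (9), and the symbolic computation reflects it.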


Sheaves of holomorphic functions

I can sense vaguely that the sheaf is a central definition in the (superficially) horrendously abstract language of modern mathematics. There really does seem to be quite a distance, crudely speaking, between pre-1950 math and post-1950 math in the mainstream, in terms of the level of abstraction typically employed. It is my hope that I will eventually accustom myself to the latter instead of viewing it as a very much alien language. It is difficult though, and there are in fact definitions which take me quite a while to grasp (by this, I mean to be able to visualize them so clearly that I feel I won’t ever forget them), which is expected given how long it has taken historically to condense to certain definitions golden in hindsight. In the hope of a step forward in my goal to understand sheaves, I’ll write up the associated definitions in this post.

Definition 1 (Presheaf). Let (X, \mathcal{T}) be a topological space. A presheaf of vector spaces on X is a family \mathcal{F} = \{\mathcal{F}(U)\}_{U \in \mathcal{T}} of vector spaces together with a collection of associated linear maps, called restriction maps,

\rho = \{\rho_V^U : \mathcal{F}(U) \to \mathcal{F}(V) | V,U \in \mathcal{T} \text{ and } V \subset U\}

such that

\rho_U^U = \text{id}_{\mathcal{F}(U)} \text{ for all } U \in \mathcal{T}

\rho_W^V \circ \rho_V^U = \rho_W^U \text{ for all } U,V,W \in \mathcal{T} \text{ such that } W \subseteq V \subseteq U.

Given U,V \in \mathcal{T} such that V \subseteq U and f \in \mathcal{F}(U) one often writes f|_V rather than \rho_V^U(f).

Definition 2 (Sheaf). Let \mathcal{F} be a presheaf on a topological space X. We call \mathcal{F} a sheaf on X if for all open sets U \subseteq X and collections of open sets \{U_i \subseteq U\}_{i \in I} such that \cup_{i \in I} U_i = U, \mathcal{F}(U) satisfies the following properties:

  1. For f, g \in \mathcal{F}(U) such that f|_{U_i} = g|_{U_i} for all i \in I, it follows that f = g.    (2.1)
  2. For all collections \{f_i \in \mathcal{F}(U_i)\}_{i \in I} such that f_i |_{U_i \cap U_j} = f_j |_{U_i \cap U_j} for all i, j \in I, there exists f \in \mathcal{F}(U) such that f |_{U_i} = f_i for all i \in I.    (2.2)

In more concrete terms, for holomorphic functions, (2.1) reflects that a function is determined by its local data, e.g. its power series expansions about points of U, and (2.2) is a statement of analytic continuation: locally defined pieces that agree on overlaps glue to a global function.

Definition 3 (Sheaf of holomorphic functions \mathcal{O}). Let X be a Riemann surface. The presheaf \mathcal{O} of holomorphic functions on X is made up of complex vector spaces of holomorphic functions. For all open sets U \subseteq X, \mathcal{O}(U) is the vector space of holomorphic functions on U. The restrictions are the usual restrictions of functions.

Proposition 4. If X is a Riemann surface, then \mathcal{O} is a sheaf on X.

Proof. As \mathcal{O} is a presheaf, it suffices to show properties (2.1) and (2.2). (2.1) follows directly from the definition of restriction of a function: if two functions agree on every set in a cover of U, they agree on all of U.

For (2.2), take some collection \{f_i \in \mathcal{O}(U_i)\}_{i \in I} such that f_i |_{U_i \cap U_j} = f_j |_{U_i \cap U_j} for all i, j \in I. For x \in U, set f(x) = f_i(x) for any i \in I such that x \in U_i. When x \in U_i \cap U_j, we have f_i(x) = f_j(x) since f_i |_{U_i \cap U_j} = f_j |_{U_i \cap U_j} by definition of the f_i. Therefore, f is well-defined. Given any x \in U, there exists some neighborhood U_i of x in the cover on which f agrees with the holomorphic function f_i. From this it follows that f is holomorphic, which means f \in \mathcal{O}(U).     ▢
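To make the gluing axiom (2.2) concrete, here is a toy model in Python (my own illustration, not a Riemann surface): open sets are finite sets of points, sections over an open set are dictionaries of values on its points, and restriction maps are dictionary restriction.

```python
# Toy "sheaf of functions" on a finite set of points, to make the
# sheaf axioms concrete (illustrative model only).

def restrict(f, V):
    # rho^U_V: restrict a section (dict point -> value) to a smaller open set V
    return {p: f[p] for p in V}

def glue(sections):
    # sections: list of (U_i, f_i) pairs agreeing on overlaps;
    # returns the unique section f on the union with f|_{U_i} = f_i  (axiom 2.2)
    f = {}
    for Ui, fi in sections:
        for p in Ui:
            assert p not in f or f[p] == fi[p], "sections disagree on overlap"
            f[p] = fi[p]
    return f

U1, U2 = {1, 2}, {2, 3}
f1, f2 = {1: 10, 2: 20}, {2: 20, 3: 30}   # agree at the overlap point 2
f = glue([(U1, f1), (U2, f2)])
print(f, restrict(f, U1) == f1)
```

Axiom (2.1), identity, holds trivially here: a dictionary is determined by its values on a cover.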

Definition 5 (Direct limit of algebraic objects). Let \langle I, \leq \rangle be a directed set. Let \{A_i : i \in I\} be a family of objects indexed by I and f_{ij}: A_i \rightarrow A_j be a homomorphism for all i \leq j with the following properties:

  1. f_{ii} is the identity of A_i, and
  2. f_{ik} = f_{jk} \circ f_{ij} for all i \leq j \leq k.

Then the pair \langle A_i, f_{ij} \rangle is called a direct system over I.

The direct limit of the direct system \langle A_i, f_{ij} \rangle is denoted by \varinjlim A_i and is defined as follows. Its underlying set is the disjoint union of the A_is modulo a certain equivalence relation \sim:

\varinjlim A_i = \bigsqcup_i A_i \bigg / \sim.

Here, if x_i \in A_i and x_j \in A_j, then x_i \sim x_j iff there is some k \in I with i \leq k, j \leq k such that f_{ik}(x_i) = f_{jk}(x_j).

More concretely, using the sheaf of holomorphic functions on a Riemann surface, we see that here the indices correspond to open sets, with i \leq j meaning U \supset V, and f_{ij} : A_i \to A_j is the restriction \rho_V^U : \mathcal{F}(U) \to \mathcal{F}(V). Two holomorphic functions defined on U and V, represented by x_i and x_j, are considered equivalent iff they are equal restricted to some open W \subset U \cap V.

Fix a point x \in X and require that the open sets in consideration are the neighborhoods of x. The direct limit in this case is called the stalk of \mathcal{F} at x, denoted \mathcal{F}_x. For each neighborhood U of x, the canonical morphism \mathcal{F}(U) \to \mathcal{F}_x associates to a section s of \mathcal{F} over U an element s_x of the stalk \mathcal{F}_x called the germ of s at x.

Dually, there is the inverse limit, which in our concrete context is the more abstract language for an analytic continuation.

Definition 6 (Inverse limit of algebraic objects). Let \langle I, \leq \rangle be a directed set. Let \{A_i : i \in I\} be a family of objects indexed by I and f_{ij}: A_j \rightarrow A_i be a homomorphism for all i \leq j with the following properties:

  1. f_{ii} is the identity of A_i, and
  2. f_{ik} = f_{ij} \circ f_{jk} for all i \leq j \leq k.

Then the pair ((A_i)_{i \in I}, (f_{ij})_{i \leq j \in I}) is an inverse system of groups and morphisms over I, and the morphism f_{ij} are called the transition morphisms of the system.

We define the inverse limit of the inverse system ((A_i)_{i \in I}, (f_{ij})_{i \leq j \in I}) as a particular subgroup of the direct product of the A_is:

A = \displaystyle\varprojlim_{i \in I} A_i = \left\{\left.\vec{a} \in \prod_{i \in I} A_i\; \right|\;a_i = f_{ij}(a_j) \text{ for all } i \leq j \text{ in } I\right\}.

What we have essentially are families of holomorphic functions over open sets, and we glue them together via a direct product indexed by open sets, under the restriction that the values must agree wherever the open sets overlap. This gives us the space of holomorphic functions over the union of the open sets, which is of course a subgroup of the direct product, closed under both addition and multiplication. We have here again the common theme of patching up local pieces to create a global structure.

Construction of Riemann surfaces as quotients

There is a theorem in Chapter 4, Section 5 of Schlag’s complex analysis text. I went through it a month ago, but only half understood it. It is my hope that passing through it again, this time with a writeup, will finally shed light, now that I have studied in detail some typical examples of such Riemann surfaces. Chief among these are tori, which arise from quotienting the complex plane by lattices and whose conformal equivalence classes can be represented by the fundamental region of the modular group, as well as quotients by Fuchsian groups.

In the text, the theorem is stated as follows.

Theorem 4.12.  Let \Omega \subset \mathbb{C}_{\infty} and G < \mathrm{Aut}(\mathbb{C}_{\infty}) with the property that

  • g(\Omega) \subset \Omega for all g \in G,
  • for all g \in G, g \neq \mathrm{id}, all fixed points of g in \mathbb{C}_{\infty} lie outside of \Omega,
  • for all K \subset \Omega compact, the cardinality of \{g \in G \mid g(K) \cap K \neq \emptyset\} is finite.

Under these assumptions, the natural projection \pi : \Omega \to \Omega / G is a covering map which turns \Omega/G canonically into a Riemann surface.

The properties essentially say that we have a Fuchsian group G acting on \Omega \subset \mathbb{C}_{\infty} without fixed points, except for the identity. To show that the quotient space is a Riemann surface, we need to construct charts. For this, notice that in the absence of fixed points there is, for every z \in \Omega, a small pre-compact open neighborhood of z denoted by K_z \subset \Omega, so that

g(\overline{K_z}) \cap \overline{K_z} = \emptyset \qquad \forall g \in G, g \neq \mathrm{id}.

So, in K_z no orbit is represented twice, which means the projection \pi restricted to K_z is injective, and therefore we can use the K_zs, identified with their images \pi(K_z), as charts. The gs, as Möbius transformations, are open maps which take the K_zs to open sets. In other words, \pi^{-1}(\pi(K_z)) = \bigcup_{g \in G} g^{-1}(K_z) with pairwise disjoint open sets g^{-1}(K_z). From this, the \pi(K_z)s are open sets in the quotient topology. In this scheme, the gs are the transition maps.

Finally, we verify that this topology is Hausdorff. Suppose \pi(z_1) \neq \pi(z_2) and define for all n \geq 1,

A_n = \left\{z \in \Omega | |z-z_1| < \frac{r}{n}\right\} \subset \Omega

B_n = \left\{z \in \Omega | |z-z_2| < \frac{r}{n}\right\} \subset \Omega

where r > 0 is sufficiently small. Define K = \overline{A_1} \cup \overline{B_1} and suppose, for contradiction, that \pi(A_n) \cap \pi(B_n) \neq \emptyset for all n \geq 1. Then for some a_n \in A_n and g_n \in G we have

g_n(a_n) \in B_n \qquad \forall n \geq 1.

Since \{g \in G \mid g(K) \cap K \neq \emptyset\} has finite cardinality, there are only finitely many possibilities for g_n, and one of them, say g, therefore occurs infinitely often. Passing to the limit n \to \infty along that subsequence gives g(z_1) = z_2, hence \pi(z_1) = \pi(z_2), a contradiction.


Variants of the Schwarz lemma

Take some self-map f of the unit disk \mathbb{D}. If f(0) = 0, then g(z) = f(z) / z has a removable singularity at 0. On |z| = r, |g(z)| \leq 1 / r, and applying the maximum principle and letting r \to 1, we derive |g(z)| \leq 1, i.e. |f(z)| \leq |z| everywhere. In particular, if |f(z)| = |z| anywhere in the interior, then |g| attains its maximum 1 inside the disk, so by the maximum principle g is a constant \lambda of modulus 1 and f(z) = \lambda z. With the removable singularity removed, g(0) = f'(0), so again by the maximum principle, |f'(0)| = 1 forces g to be a constant of modulus 1. Moreover, if f is not an automorphism, we cannot have |f(z)| = |z| anywhere in the interior, so in that case |f'(0)| < 1.
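The two inequalities are easy to sanity-check numerically. The sketch below uses the arbitrarily chosen self-map f(z) = (z^2 + z)/2 of the disk, which fixes 0 and is not an automorphism, so |f(z)| \leq |z| and |f'(0)| < 1 should both hold.

```python
import cmath, math, random

# Sample holomorphic self-map of the unit disk with f(0) = 0
# (chosen for illustration): f(z) = (z^2 + z)/2, so f'(0) = 1/2.
def f(z):
    return (z*z + z) / 2

random.seed(0)
ok = True
for _ in range(1000):
    r = math.sqrt(random.random())          # random radius in [0, 1)
    t = 2 * math.pi * random.random()
    z = r * cmath.exp(1j * t)
    if abs(f(z)) > abs(z) + 1e-12:          # Schwarz: |f(z)| <= |z|
        ok = False
print(ok)

# Central difference estimate of |f'(0)|, which should be 1/2 < 1.
h = 1e-6
fp0 = (f(h) - f(-h)) / (2 * h)
print(abs(fp0))
```

Here |f(z)| = |z||z + 1|/2 \leq |z|(|z| + 1)/2 \leq |z| on the disk, consistent with the lemma.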

Cauchy’s integral formula in complex analysis

I took a graduate course in complex analysis a while ago as an undergraduate. However, I did not actually understand it well at all, to which the fact that much of the knowledge vanished very quickly is a testament. It pleases me that now, following some intellectual maturation, after relearning certain theorems, they seem to stick more permanently, with the main ideas behind the proofs clear rather than mind-disorienting, the latter of which I experienced too much in my early days. Shall I say that before I must have been on drugs or something, because the way in which I approached certain things was frankly quite weird, and in retrospect, I was in many ways an animal-like creature trapped within the confines of an addled consciousness, oblivious and uninhibited. Almost certainly never again will I experience anything like that. Now, I can only mentally rationalize the conscious experience of a mentally inferior creature, but such cannot be experienced for real. It is almost like how an evangelical cannot imagine what it is like not to believe in God, and even goes as far as to hold the pagan in contempt. Exaltation and exhilaration were concomitant with the leap of consciousness until, not long after, it established its normalcy.

Now, the last of non-mathematical writing in this post will be on the following excerpt from Grothendieck’s Récoltes et Semailles:

In those critical years I learned how to be alone. [But even] this formulation doesn’t really capture my meaning. I didn’t, in any literal sense learn to be alone, for the simple reason that this knowledge had never been unlearned during my childhood. It is a basic capacity in all of us from the day of our birth. However these three years of work in isolation [1945–1948], when I was thrown onto my own resources, following guidelines which I myself had spontaneously invented, instilled in me a strong degree of confidence, unassuming yet enduring, in my ability to do mathematics, which owes nothing to any consensus or to the fashions which pass as law….By this I mean to say: to reach out in my own way to the things I wished to learn, rather than relying on the notions of the consensus, overt or tacit, coming from a more or less extended clan of which I found myself a member, or which for any other reason laid claim to be taken as an authority. This silent consensus had informed me, both at the lycée and at the university, that one shouldn’t bother worrying about what was really meant when using a term like “volume,” which was “obviously self-evident,” “generally known,” “unproblematic,” etc….It is in this gesture of “going beyond,” to be something in oneself rather than the pawn of a consensus, the refusal to stay within a rigid circle that others have drawn around one—it is in this solitary act that one finds true creativity. All others things follow as a matter of course.

Since then I’ve had the chance, in the world of mathematics that bid me welcome, to meet quite a number of people, both among my “elders” and among young people in my general age group, who were much more brilliant, much more “gifted” than I was. I admired the facility with which they picked up, as if at play, new ideas, juggling them as if familiar with them from the cradle—while for myself I felt clumsy, even oafish, wandering painfully up an arduous track, like a dumb ox faced with an amorphous mountain of things that I had to learn (so I was assured), things I felt incapable of understanding the essentials or following through to the end. Indeed, there was little about me that identified the kind of bright student who wins at prestigious competitions or assimilates, almost by sleight of hand, the most forbidding subjects.

In fact, most of these comrades who I gauged to be more brilliant than I have gone on to become distinguished mathematicians. Still, from the perspective of thirty or thirty-five years, I can state that their imprint upon the mathematics of our time has not been very profound. They’ve all done things, often beautiful things, in a context that was already set out before them, which they had no inclination to disturb. Without being aware of it, they’ve remained prisoners of those invisible and despotic circles which delimit the universe of a certain milieu in a given era. To have broken these bounds they would have had to rediscover in themselves that capability which was their birthright, as it was mine: the capacity to be alone.

Grothendieck first became known to me, then a dimwit, in a later stage of high school. At that time, I was still culturally under the idiotic and shallow social constraints of an American high school, though already visibly different, unable to detach too much from it either intellectually or psychologically. There is quite an element of what I now, in recollection with the benefit of hindsight, can characterize as a harbinger of unusual aesthetic discernment, one exercised and already vaguely sensed back then, though lacking in reinforcement from social support and confidence, and most of all, in ability. For at that time, I was still much a species in mental bondage, more often than not driven by awe as opposed to reason. In particular, I awed and despaired at many a contemporary of my own age range who on the surface appeared to me so much more endowed and quick to grasp and compute, in an environment where judgment of an individual’s capability is dominated far more by scores and metrics than by substance, not that I had any of the latter either.

Vaguely, I recall seeing the above passage once in high school, articulated with a verbal richness of a height that would have overwhelmed and intimidated me at the time. I could not understand how Grothendieck, this guy considered by many the greatest mathematician of the 20th century, could have actually felt dumb. Though I felt very dumb myself, I never fully lost confidence, sensing a spirit in me that saw quite differently from others, that was far less inclined to lose himself in “those invisible and despotic circles” than most around me. Now, for the first time, I can at least subjectively feel identification with Grothendieck, and perhaps I am still misinterpreting his message to some extent, though I surely feel far less at sea with respect to it now than before.

Later I had the fortune to know personally one who gave a name to this implicit phenomenon: aesthetic discernment. It has been met with ridicule as self-congratulatory artifice by one of lesser formal achievement, a concoction of a failure in self-denial. Yet on the other hand, I have witnessed that most people are too carried away in today’s excessively, artificially, institutionally credentialist society, such that they lose sight of what is fundamentally meaningful, and sadly, those unperturbed by this ill are fewer and fewer. Finally, I have reflected on the question of what good knowledge is if too few can rightly perceive it. Science is always there, and much of it of value remains unknown to anyone who has inhabited this planet, and I will conclude at that.

So, one of the theorems in that class was of course Cauchy’s integral formula, one of the most central tools in complex analysis. Formally,

Let D be a bounded domain with piecewise smooth boundary. If f(z) is analytic on D, and f(z) extends smoothly to the boundary of D, then

f(z) = \frac{1}{2\pi i}\int_{\partial D} \frac{f(w)}{w-z}dw,\qquad z \in D. \ \ \ \ (1)

This theorem was actually somewhat elusive to me. I would learn it, find it deceptively obvious, and then eventually forget it, having to repeat this cycle. I now ask how one would conceive of this theorem. On that, we first observe that by continuity, the average of f on a circle goes to its value at the center as the radius goes to zero. With w = z + \epsilon e^{i\theta} and dw = i\epsilon e^{i\theta}d\theta, the w - z in the denominator cancels the factor \epsilon e^{i\theta}, leaving the average \frac{1}{2\pi}\int_0^{2\pi} f(z + \epsilon e^{i\theta})d\theta. From this, we have the result when D is a sufficiently small circle about z. Even here, Cauchy’s integral theorem is implicit, the one which states that the integral of a function over a closed curve, inside of which it is holomorphic, is zero. By the same principle, we can extend to any bounded domain with piecewise smooth boundary.
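Formula (1) is also easy to check numerically. The following sketch (my own illustration; the radius, test point, and test function are arbitrary choices) approximates the contour integral over a circle by a Riemann sum of the parametrization w = Re^{i\theta}, dw = iRe^{i\theta}d\theta, and compares against the true value of an entire function at an interior point.

```python
import cmath, math

def cauchy_value(f, z, R=2.0, N=2000):
    # Approximate (1/2πi) ∮_{|w|=R} f(w)/(w - z) dw by an equally
    # spaced Riemann sum over theta in [0, 2π).
    total = 0.0
    for k in range(N):
        theta = 2 * math.pi * k / N
        w = R * cmath.exp(1j * theta)
        dw = 1j * w * (2 * math.pi / N)     # i R e^{i theta} dtheta
        total += f(w) / (w - z) * dw
    return total / (2j * math.pi)

z0 = 0.3 + 0.4j
approx = cauchy_value(cmath.exp, z0)
print(abs(approx - cmath.exp(z0)))          # should be tiny
```

For periodic integrands the equally spaced sum converges extremely fast, so even modest N recovers f(z_0) to near machine precision.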

Cauchy’s integral formula is powerful when the integrand is bounded. We have already seen this in Montel’s theorem. In another even simpler case, Riemann’s theorem on removable singularities, an upper bound M on the function gives the bound |a_n| \leq M / r^n on the coefficients of the Laurent series about the point, which for n < 0 tends to 0 as r \to 0, so that a_n = 0 for n < 0.

This integral formula extends to all derivatives by differentiating. Inductively, with uniform convergence of the integrand, one can show that

f^{(m)}(z) = \frac{m!}{2\pi i}\int_{\partial D} \frac{f(w)}{(w-z)^{m+1}}dw, \qquad z \in D, m \geq 0.

An application of this for a bounded entire function would be to contour integrate along an arbitrarily large circle to derive an n!M / R^n upper bound (which goes to 0 as R \to \infty) on the derivatives. This gives us Liouville’s theorem, which states that bounded entire functions are constant, by Taylor series.


Weierstrass products

Long ago, when I was a clueless kid about to finish 10th grade of high school, I first learned about Euler’s determination of \zeta(2) = \frac{\pi^2}{6}. The technique he used was of course the factorization of \sin z / z via its infinitely many roots into

\displaystyle\prod_{n=1}^{\infty} \left(1 - \frac{z}{n\pi}\right)\left(1 + \frac{z}{n\pi}\right) = \displaystyle\prod_{n=1}^{\infty} \left(1 - \frac{z^2}{n^2\pi^2}\right).

Equating the coefficient of z^2 in this product, -\displaystyle\sum_{n=1}^{\infty}\frac{1}{n^2\pi^2}, with the coefficient of z^2 in the well-known Maclaurin series of \sin z / z, -1/6, gives that \zeta(2) = \frac{\pi^2}{6}.
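Both the product formula and the resulting value of \zeta(2) are easy to check numerically; the sketch below (truncation points and the test point z = 1.3 are arbitrary choices of mine) compares a partial product for \sin z / z with the true value, and the partial sums of \zeta(2) with \pi^2/6.

```python
import math

def sin_over_z_product(z, N):
    # Partial Weierstrass product: prod_{n=1}^N (1 - z^2 / (n^2 pi^2))
    p = 1.0
    for n in range(1, N + 1):
        p *= 1 - z**2 / (n**2 * math.pi**2)
    return p

z = 1.3
print(sin_over_z_product(z, 100000), math.sin(z) / z)

# Partial sums of zeta(2) against pi^2 / 6.
zeta2 = sum(1.0 / n**2 for n in range(1, 100001))
print(zeta2, math.pi**2 / 6)
```

The tails decay like 1/N, so 10^5 terms already give agreement to several decimal places.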

This felt so spectacular to me, who knew almost no math, at that time. It was also a result of great historical significance. The problem was first posed by Pietro Mengoli in 1644, and had baffled the greatest mathematicians of that day until 1734, when Euler finally stunned the mathematical community with his simple yet ingenious solution. This was done while Euler was in St. Petersburg. On that, I shall note that from this we can see how Russia had a rich mathematical and scientific tradition that began quite early on, one which must have deeply influenced the preeminence in science of Tsarist Russia and later the Soviet Union, despite their being in practical terms quite backward compared to the advanced countries of Western Europe, like the UK and France. This, of course, was instrumental to the Soviet Union’s rapid catching up in industry and technology later on.

I had learned of this result more or less concurrently with learning on my own (independent of the silly American public school system) what constituted a rigorous proof. I remember back then I was still not accustomed to the cold, precise, and austere rigor expected in mathematics and had much difficulty restraining myself in that regard, often content with intuitive solutions. From this, one can guess that I was not quite aware of how Euler’s solution was in fact not a rigorous one by modern standards, despite this having been noted in the book from which I read it. However, now I am aware that what Euler constructed was in fact a Weierstrass product, and in this article, I will explain how one can construct those in a way that guarantees uniform convergence on compact sets.

Given a finite number of points on the complex plane, one can easily construct an analytic function with zeros or poles there for any combination of (finite) multiplicities. For a countably infinite number of points, one can try the same, but how can one know that the result, being of a series nature, doesn’t blow up? There is quite some technical machinery to ensure this.

We begin with the restricted case of simple poles and arbitrary residues. This is a special case of what is now known as Mittag-Leffler’s theorem.

Theorem 1.1 (Mittag-Leffler) Let z_1,z_2,\ldots \to \infty be a sequence of distinct complex numbers satisfying 0 < |z_1| \leq |z_2| \leq \ldots. Let m_1, m_2,\ldots be any sequence of non-zero complex numbers. Then there exists a (not unique) sequence p_1, p_2, \ldots of non-negative integers, depending only on the sequences (z_n) and (m_n), such that the series

f(z) = \displaystyle\sum_{n=1}^{\infty} \left(\frac{z}{z_n}\right)^{p_n} \frac{m_n}{z - z_n} \ \ \ \ (1.1)

is totally convergent, and hence absolutely and uniformly convergent, in any compact set K \subset \mathbb{C} \setminus \{z_1,z_2,\ldots\}. Thus the function f(z) is meromorphic, with simple poles z_1, z_2, \ldots having respective residues m_1, m_2, \ldots.

Proof: Total convergence, in case forgotten, refers to the Weierstrass M-test. That said, it suffices to establish

\left|\left(\frac{z}{z_n}\right)^{p_n}\frac{m_n}{z-z_n}\right| < M_n,

where \sum_{n=1}^{\infty} M_n < \infty. For total convergence on any compact set, we again use the classic technique of disks centered at the origin with radii r_n < |z_n| increasing monotonically to \infty. This way, for |z| \leq r_n, we have

\left|\left(\frac{z}{z_n}\right)^{p_n}\frac{m_n}{z-z_n}\right| \leq \left(\frac{r_n}{|z_n|}\right)^{p_n}\frac{|m_n|}{|z_n|-r_n} < M_n.

With r_n < |z_n| we can for any M_n choose large enough p_n to satisfy this. This makes clear that the \left(\frac{z}{z_n}\right)^{p_n} is our mechanism for constraining the magnitude of the values attained, which we can do to an arbitrary degree.

The rest of the proof is more or less trivial. For any compact K, pick some r_N whose disk contains it. For n < N, we can bound with \displaystyle\max_{z \in K}\left|\left(\frac{z}{z_n}\right)^{p_n}\frac{m_n}{z-z_n}\right|, which is finite by continuity on a compact set (now you can see why we must omit the poles from our domain).     ▢
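A classical concrete instance of the series (1.1), separate from the proof above: taking the poles z_n = \pm 1, \pm 2, \ldots with residues m_n = 1 and p_n = 1 gives (z/z_n)\frac{1}{z - z_n} = \frac{1}{z - z_n} + \frac{1}{z_n}, and summed over all n \neq 0 together with the pole at 0 this is the well-known partial-fraction expansion of \pi\cot(\pi z). The sketch below (truncation and test point chosen arbitrarily) checks this numerically.

```python
import math

def ml_series(z, N):
    # Series (1.1) with z_n = ±1, ±2, ..., m_n = 1, p_n = 1:
    # each term is (z/z_n) * 1/(z - z_n) = 1/(z - z_n) + 1/z_n.
    s = 0.0
    for n in range(1, N + 1):
        for zn in (n, -n):
            s += (z / zn) * (1.0 / (z - zn))
    return s

z = 0.37
approx = 1.0 / z + ml_series(z, 100000)     # add the pole at 0 by hand
exact = math.pi / math.tan(math.pi * z)     # pi * cot(pi * z)
print(approx, exact)
```

Pairing the terms for \pm n gives 2z/(z^2 - n^2), so the tail decays like 1/N, matching the convergence the theorem guarantees.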

Lemma 1.1 Let the functions u_n(z) (n = 1, 2,\ldots) be regular in a compact set K \subset \mathbb{C}, and let the series \displaystyle\sum_{n=1}^{\infty} u_n(z) be totally convergent in K. Then the infinite product \displaystyle\prod_{n=1}^{\infty} \exp (u_n(z)) = \exp\left(\displaystyle\sum_{n=1}^{\infty} u_n(z)\right) is uniformly convergent in K.

Proof: Technical exercise left to the reader.     ▢

Now we present a lemma that allows us to take the result of Mittag-Leffler (Theorem 1.1) to meromorphic functions with zeros and poles at arbitrary points, each with its prescribed multiplicity.

Lemma 1.2 Let f (z) be a meromorphic function. Let z_1,z_2,\ldots \neq 0 be the poles of f (z), all simple with respective residues m_1, m_2,\ldots \in \mathbb{Z}. Then the function

\phi(z) = \exp \int_0^z f (t) dt \ \ \ \ (1.2)

is meromorphic. The zeros (resp. poles) of \phi(z) are the points z_n such that m_n > 0 (resp. m_n < 0), and the multiplicity of z_n as a zero (resp. pole) of \phi(z) is m_n (resp. -m_n).

Proof: Taking the exponential of that integral has the effect of turning it into a one-valued function. Take two paths \gamma and \gamma' from 0 to z, neither passing through any of the poles. By the residue theorem,

\int_{\gamma} f(z)dz = \int_{\gamma'} f(z)dz + 2\pi i R,

where R is the sum of the residues of f(t) between \gamma and \gamma'. Because the m_is are integers, R must be an integer, from which it follows that our exponential is a one-valued function. Since the exponential function is analytic, \phi is analytic, and since the exponential never vanishes, \phi is non-zero on \mathbb{C} \setminus \{z_1, z_2, \ldots\}. We can remove the pole at z_1 with f_1(z) = f(z) - \frac{m_1}{z - z_1}. This f_1 is analytic at z_1, with poles only at z_2, \ldots. From this, we derive

\begin{aligned} \phi(z) &= \exp \int_0^z f(t)dt \\ &= \exp \int_0^z \left(f_1(t) + \frac{m_1}{t-z_1}\right)dt \\ &= \left(1 - \frac{z}{z_1}\right)^{m_1}\exp \int_0^z f_1(t) dt. \end{aligned}

We can continue this process for the remainder of the z_is.      ▢

Theorem 1.2 (Weierstrass) Let F(z) be meromorphic, and regular and \neq 0 at z = 0. Let z_1,z_2, \ldots be the zeros and poles of F(z) with respective multiplicities |m_1|, |m_2|, \ldots, where m_n > 0 if z_n is a zero and m_n < 0 if z_n is a pole of F(z). Then there exist integers p_1, p_2,\ldots \geq 0 and an entire function G(z) such that

F(z) = e^{G(z)}\displaystyle\prod_{n=1}^{\infty}\left(1 - \frac{z}{z_n}\right)^{m_n}\exp\left(m_n\displaystyle\sum_{k=1}^{p_n}\frac{1}{k}\left(\frac{z}{z_n}\right)^k\right), \ \ \ \ (1.3)

where the product converges uniformly in any compact set K \subset \mathbb{C} \setminus \{z_1,z_2,\ldots\}.

Proof: Let f(z) be the function in (1.1) with p_is such that the series is totally convergent, and let \phi(z) be the function in (1.2). By Theorem 1.1 and Lemma 1.2, \phi(z) is meromorphic, with zeros z_n of multiplicities m_n if m_n > 0, and with poles z_n of multiplicities |m_n| if m_n < 0. Thus F(z) and \phi(z) have the same zeros and poles with the same multiplicities, whence F(z)/\phi(z) is entire and \neq 0. Therefore \log (F(z)/\phi(z)) = G(z) is an entire function, and

F(z) = e^{G(z)} \phi(z). \ \ \ \ (1.4)

Uniform convergence along path of integration from 0 to z (not containing the poles) enables term-by-term integration. Thus, from (1.2), we have

\begin{aligned} \phi(z) &= \exp \displaystyle\sum_{n=1}^{\infty} \int_0^z \left(\frac{t}{z_n}\right)^{p_n} \frac{m_n}{t - z_n}dt \\ &= \displaystyle\prod_{n=1}^{\infty}\exp \int_0^z \left(\frac{m_n}{t - z_n} + \frac{m_n}{z_n}\frac{(t/z_n)^{p_n} -1}{t/z_n - 1}\right)dt \\ &= \displaystyle\prod_{n=1}^{\infty}\exp \int_0^z \left(\frac{m_n}{t - z_n} + \frac{m_n}{z_n}\displaystyle\sum_{k=1}^{p_n}\left(\frac{t}{z_n}\right)^{k-1}\right)dt \\ &= \displaystyle\prod_{n=1}^{\infty}\exp \left(\log\left(1 - \frac{z}{z_n}\right)^{m_n} + m_n\displaystyle\sum_{k=1}^{p_n}\frac{1}{k}\left(\frac{z}{z_n}\right)^k\right) \\ &= \displaystyle\prod_{n=1}^{\infty}\left(1 - \frac{z}{z_n}\right)^{m_n} \exp \left(m_n\displaystyle\sum_{k=1}^{p_n}\frac{1}{k}\left(\frac{z}{z_n}\right)^k\right).\end{aligned}

With this, (1.3) follows from (1.4). Moreover, in a compact set K, we can always bound the length of the path of integration, whence, by Theorem 1.1, the series

\displaystyle\sum_{n=1}^{\infty}\int_0^z \left(\frac{t}{z_n}\right)^{p_n}\frac{m_n}{t - z_n}dt

is totally convergent in K. Finally, invoke Lemma 1.1 to conclude that its exponential is uniformly convergent in K as well, from which it follows that (1.3) is too, as desired.     ▢

If at 0 our function has a zero or pole, we can multiply by z^{-m}, with m the multiplicity there, to regularize it. This yields

F(z) = z^me^{G(z)}\displaystyle\prod_{n=1}^{\infty}\left(1 - \frac{z}{z_n}\right)^{m_n}\exp\left(m_n\displaystyle\sum_{k=1}^{p_n}\frac{1}{k}\left(\frac{z}{z_n}\right)^k\right)

as the Weierstrass factorization formula in this case.

Overall, we see that we transform Mittag-Leffler (Theorem 1.1) into Weierstrass factorization (Theorem 1.2) through integration and exponentiation. In complex analysis, integrating a term of order -1 to obtain a logarithm comes up quite often; once exponentiated, it gives a linear factor raised to the power of the residue, useful for generating zeros and poles. Once this is observed, that one can go from the former to the latter with some technical manipulations is strongly hinted at, and one sees without much difficulty that the statements of Lemma 1.1 and Lemma 1.2 are what is needed for this.


  • Carlo Viola, An Introduction to Special Functions, Springer International Publishing, Switzerland, 2016, pp. 15-24.

Cayley-Hamilton theorem and Nakayama’s lemma

The Cayley-Hamilton theorem states that every square matrix A over a commutative ring satisfies its own characteristic equation. That is, with I_n the n \times n identity matrix, the characteristic polynomial of A

p(\lambda) = \det (\lambda I_n - A)

is such that p(A) = 0. I recalled that in a post a while ago, I mentioned that for any matrix A, A(\mathrm{adj}(A)) = (\det A) I_n, a fact that is not hard to visualize based on calculation of determinants via minors, which is in fact much of what brings the existence of this adjugate to reason in some sense. This can be used to prove the Cayley-Hamilton theorem.

So we have

(\lambda I_n - A)\mathrm{adj}(\lambda I_n - A) = p(\lambda)I_n,

where p is the characteristic polynomial of A. The adjugate in the above is a matrix of polynomials in \lambda whose coefficients are matrices that are polynomials in A, which we can represent in the form \displaystyle\sum_{i=0}^{n-1}\lambda^i B_i.

We have

\displaystyle {\begin{aligned}p(\lambda)I_{n} &= (\lambda I_n - A)\displaystyle\sum_{i=0}^{n-1}\lambda^i B_i \\ &= \displaystyle\sum_{i=0}^{n-1}\lambda^{i+1}B_{i}-\sum _{i=0}^{n-1}\lambda^{i}AB_{i} \\ &= \lambda^{n}B_{n-1}+\sum _{i=1}^{n-1}\lambda^{i}(B_{i-1}-AB_{i})-AB_{0}.\end{aligned}}

Equating coefficients gives us

B_{n-1} = I_n, \qquad B_{i-1} - AB_i = c_i I_n, \quad 1 \leq i \leq n-1, \qquad -AB_0 = c_0I_n.

With this, we have

A^n + c_{n-1}A^{n-1} + \cdots + c_1A + c_0I_n = A^nB_{n-1} + \displaystyle\sum_{i=1}^{n-1} (A^iB_{i-1} - A^{i+1}B_i) - AB_0 = 0,

with the RHS telescoping and annihilating itself to 0.
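The identity p(A) = 0 can be verified directly for a sample matrix. The sketch below (using sympy, with an arbitrarily chosen integer matrix, the integers serving as our commutative ring) forms the characteristic polynomial via the determinant and evaluates it at A by Horner’s scheme.

```python
import sympy as sp

# An arbitrary 3x3 integer matrix (illustrative choice).
A = sp.Matrix([[2, 1, 0],
               [0, 1, -1],
               [3, 0, 1]])
lam = sp.symbols('lambda')

# Characteristic polynomial p(lambda) = det(lambda*I - A).
p = (lam * sp.eye(3) - A).det()
coeffs = sp.Poly(p, lam).all_coeffs()   # [1, c_{n-1}, ..., c_1, c_0]

# Evaluate p at A by Horner's scheme, with I_n standing in for lambda^0.
pA = sp.zeros(3, 3)
for c in coeffs:
    pA = pA * A + c * sp.eye(3)

print(pA)   # Cayley-Hamilton: the zero matrix
```

The same check works over any commutative ring, e.g. with polynomial entries, since the telescoping argument above never divides.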

There is a generalized version of this for a module over a ring, which goes as follows.

Cayley-Hamilton theorem (for modules) Let A be a commutative ring with unity, M a finitely generated A-module, I an ideal of A, and \phi an endomorphism of M with \phi M \subset IM. Then there is a monic polynomial p(x) = x^n + c_{n-1}x^{n-1} + \cdots + c_1 x + c_0 with coefficients c_i \in I such that p(\phi) = 0 as an endomorphism of M.

Proof: It’s mostly the same. Let \{m_1, \ldots, m_n\} \subset M be a generating set. Then for every i, \phi(m_i) \in IM, with \phi(m_i) = \displaystyle\sum_{j=1}^n a_{ij}m_j, where the a_{ij}s lie in I. Running the same determinant computation as above, the closure properties of ideals guarantee that the coefficients of the resulting characteristic polynomial stay in I.     ▢

From this follows easily a statement of Nakayama’s lemma, ubiquitous in commutative algebra.

Nakayama’s lemma  Let I be an ideal in R, and M a finitely-generated module over R. If IM = M, then there exists an r \in R with r \equiv 1 \pmod{I}, such that rM = 0.

Proof: With reference to the Cayley-Hamilton theorem for modules, take \phi = \mathrm{id}_M, the identity map on M, which satisfies \phi M = M = IM, and let p(x) = x^n + c_{n-1}x^{n-1} + \cdots + c_0 be the resulting polynomial with c_i \in I. Set

r = 1 + c_{n-1} + c_{n-2} + \cdots + c_0.

Then r\,\mathrm{id}_M = p(\mathrm{id}_M) = 0. Since the c_is reside in I, we have r \equiv 1 \pmod{I}, and r\,\mathrm{id}_M = 0 means exactly that rM = 0.     ▢