Vector fields, flows, and the Lie derivative

Let M be a smooth real manifold. A smooth vector field V on M assigns a tangent vector to every point, and can equivalently be considered as a map from C^{\infty}(M) to C^{\infty}(M): given a function f : M \to \mathbb{R} and a point p \in M, the tangent vector at p sends f to a real value, which one can think of as the directional derivative of f at p along that tangent vector. Moreover, this value varies smoothly with p.

Along any vector field, if we start at any point, we can trace a path that follows the vector field. Imagine the velocity field of water, one that does not change with time: place a point particle at any point, at any time, and we can deterministically predict its path both forward and backward in time. We call such a path an integral curve, and it is easy to see that the integral curves partition the manifold, i.e., their images are the equivalence classes of the relation "lies on the same maximal integral curve."

On a manifold M, at a point with chart (U, \varphi), an integral curve of the vector field V satisfies

\frac{\mathrm{d}x^{\mu}(t)}{\mathrm{d}t} = V^{\mu}(x(t)), \qquad (1)

where x^{\mu}(t) is the \mu-th component of \varphi(x(t)) and V = V^{\mu}\partial / \partial x^{\mu}. This is an ODE, which is guaranteed to have a unique solution at least locally, and we assume for now that the parameter t can be maximally extended.

If we attach the initial condition that at t = 0 the integral curve is at x_0, and denote the coordinates by \sigma^{\mu}(t, x_0), then (1) becomes

\frac{\mathrm{d}\sigma^{\mu}(t, x_0)}{\mathrm{d}t} = V^{\mu}(\sigma(t, x_0)),

Here, \sigma : \mathbb{R} \times M \to M is called a flow generated by V, which necessarily satisfies

\sigma(t, \sigma(s, x_0)) = \sigma(t+s, x_0)

for any s, t \in \mathbb{R}.

Within this is the structure of a one-parameter group of transformations, with \sigma_t := \sigma(t, \cdot), where (see the numerical sketch after this list)

(i) \sigma_{s+t} = \sigma_s \circ \sigma_t or \sigma_{s+t}(x_0) = \sigma_s(\sigma_t(x_0)).
(ii) \sigma_0 is the identity map.
(iii) \sigma_{-t} = (\sigma_t)^{-1}.
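
To make this concrete, here is a minimal numerical sketch, assuming Python with numpy and scipy (neither appears in the original text); the vector field V(x, y) = (-y, x) on \mathbb{R}^2 is a hypothetical example whose flow is rotation about the origin. It integrates (1) and checks property (i) at a sample point.

```python
# A minimal sketch: integrate the flow of a vector field on R^2 and check
# sigma_{s+t}(x0) = sigma_s(sigma_t(x0)). The field V(x, y) = (-y, x) is a
# hypothetical example; its flow rotates points about the origin.
import numpy as np
from scipy.integrate import solve_ivp

def V(t, x):
    # right-hand side of dx/dt = V(x); V has no explicit time dependence
    return np.array([-x[1], x[0]])

def flow(t, x0):
    # sigma(t, x0): solve the ODE starting at x0 for time t
    sol = solve_ivp(V, (0.0, t), x0, rtol=1e-10, atol=1e-12)
    return sol.y[:, -1]

x0 = np.array([1.0, 0.0])
s, t = 0.4, 0.7

lhs = flow(s + t, x0)          # sigma_{s+t}(x0)
rhs = flow(s, flow(t, x0))     # sigma_s(sigma_t(x0))
print(lhs, rhs, np.allclose(lhs, rhs, atol=1e-8))
```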

We now ask how a smooth vector field W changes along a smooth vector field V. If our manifold were simply \mathbb{R}^n (with a single identity chart, globally), then at any point p we would have a direction along V, and under an infinitesimal displacement in that direction W would change as well. In this case, it is easy to represent tangent vectors with indexed coordinates. Naively, we could take the displacement in W, divide by the amount of displacement along V, and take the limit. However, we have not defined addition (or subtraction) of tangent vectors that live in different tangent spaces. To do so, we would need some meaningful correspondence between values in different tangent spaces. Why can we not simply do vector addition componentwise? Recall that elements of the tangent space are defined in terms of how they act on smooth functions from M to \mathbb{R}, rather than directly as arrows; it is only because they act linearly on any given such function that we can represent them by coordinate vectors at all, and that representation depends on the chart.

We resolve this in a more general fashion by defining the induced map between the tangent spaces T_pM and T_{f(p)}N for a smooth map f : M \to N between manifolds. Recall that an element of a tangent space is a linear map D : C^{\infty}(M) \to \mathbb{R} that satisfies the Leibniz property: D(gh) = Dg \cdot h(p) + g(p) \cdot Dh. If g \in C^{\infty}(N), then g \circ f \in C^{\infty}(M). We define the induced map

\Phi_{f, p} : T_p M \to T_{f(p)} N

in the following manner. If D \in T_p(M), then \Phi_{f, p}(D) = D', where D'[g] = D[g \circ f].
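
As a sanity check of this definition, here is a small symbolic sketch assuming sympy; the map f, the point p, and the test function g below are hypothetical choices. It computes D'[g] = D[g \circ f] from the definition and compares it with the familiar Jacobian description of the induced map.

```python
# A symbolic sketch of the induced map (pushforward), assuming sympy.
# M = N = R^2, f is a hypothetical smooth map, and D = a d/dx + b d/dy is a
# tangent vector at the sample point p. D' is defined by D'[g] = D[g o f].
import sympy as sp

x, y, u, v, a, b = sp.symbols('x y u v a b')
p = {x: 1, y: 2}                      # a sample point p in M
f = sp.Matrix([x**2 + y, x*y])        # f : M -> N, written in coordinates (u, v)
g = u**2 + sp.sin(v)                  # a concrete test function on N

def D(h):
    # the tangent vector D acting on h in C^inf(M), evaluated at p
    return (a*sp.diff(h, x) + b*sp.diff(h, y)).subs(p)

# D'[g] from the definition: apply D to g o f
Dprime_def = sp.expand(D(g.subs({u: f[0], v: f[1]})))

# the same number via the Jacobian of f at p acting on the components (a, b)
J = f.jacobian([x, y]).subs(p)
comp = J * sp.Matrix([a, b])
fp = {u: f[0].subs(p), v: f[1].subs(p)}
Dprime_jac = sp.expand(comp[0]*sp.diff(g, u).subs(fp) + comp[1]*sp.diff(g, v).subs(fp))

print(sp.simplify(Dprime_def - Dprime_jac))   # 0
```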

We notice that we can apply this to the flow maps \sigma_t : M \to M in our construction of the Lie derivative \mathcal{L}_V W of a vector field W with respect to a vector field V. Since the flow is along V,

\sigma_{-t}^{\mu}(p) = x^{\mu}(p) - tV^{\mu}(p) + O(t^2). \qquad (2)

We consider the induced map of \sigma_{-t} at the point \sigma_t(p),

\Phi_{\sigma_{-t}, \sigma_t(p)} : T_{\sigma_t(p)} M \to T_p M.

If \Phi_{\sigma_{-t}, \sigma_t(p)}(W) = W', then by definition,

W'[f](p) = W[f \circ \sigma_{-t}](\sigma_t(p)).

That means

\mathcal{L}_V W[f](p) = \left(\displaystyle\lim_{t \to 0}\frac{W'(p) - W(p)}{t}\right)[f] = \displaystyle\lim_{t \to 0}\frac{W'[f](p) - W[f](p)}{t}. \qquad (3)

Using the fact that, by the chain rule,

\frac{\partial}{\partial x^{\nu}}(f \circ \sigma_{-t})(\sigma_t(p)) = \frac{\partial \sigma_{-t}^{\mu}}{\partial x^{\nu}}(\sigma_t(p)) \frac{\partial f}{\partial x^{\mu}}(p),

we arrive at

\begin{aligned} W'[f](p) & = W^{\nu}(\sigma_t(p)) \frac{\partial}{\partial x^{\nu}}[f \circ \sigma_{-t}](\sigma_t(p)) \\ & = W^{\nu}(\sigma_t(p)) \frac{\partial \sigma_{-t}^{\mu}}{\partial x^{\nu}}(\sigma_t(p))\frac{\partial f}{\partial x^{\mu}}(p). \qquad (4) \end{aligned}

Expanding W^{\nu}(\sigma_t(p)) to first order in t (using \sigma_t^{\mu}(p) = x^{\mu}(p) + tV^{\mu}(p) + O(t^2)), we get

W^{\nu}(\sigma_t(p)) = W^{\nu}(p) + tV^{\rho}(p) \frac{\partial W^{\nu}}{\partial x^{\rho}}(p) + O(t^2). \qquad (5)

Moreover, by (2),

\frac{\partial \sigma_{-t}^{\mu}}{\partial x^{\nu}}(\sigma_t(p)) = \delta_{\nu}^{\mu} - t \frac{\partial V^{\mu}}{\partial x^{\nu}}(p) + O(t^2). \qquad (6)

Substituting (5) and (6) into (4) yields

\begin{aligned} W'[f](p) & = \left(W^{\nu}(p) + tV^{\rho}(p) \frac{\partial W^{\nu}}{\partial x^{\rho}}(p) + O(t^2)\right)\left(\delta_{\nu}^{\mu} - t \frac{\partial V^{\mu}}{\partial x^{\nu}}(p) + O(t^2)\right)\frac{\partial f}{\partial x^\mu}(p) \\ & = \left(W^{\mu}(p) + t\left(V^{\rho}(p) \frac{\partial W^{\mu}}{\partial x^{\rho}}(p) - W^{\nu}(p) \frac{\partial V^{\mu}}{\partial x^{\nu}}(p)\right) + O(t^2)\right)\frac{\partial f}{\partial x^\mu}(p) \\ & = \left(W^{\mu}(p) + t\left(V^{\nu}(p) \frac{\partial W^{\mu}}{\partial x^{\nu}}(p) - W^{\nu}(p) \frac{\partial V^{\mu}}{\partial x^{\nu}}(p)\right) + O(t^2)\right)\frac{\partial f}{\partial x^\mu}(p). \qquad (7) \end{aligned}

There is a constant term, a first-order term, and an O(t^2) remainder. In (3), the constant term is subtracted out, and the O(t^2) remainder contributes nothing to the limit. This means that the Lie derivative is given by the coefficient of the first-order term, with

(\mathcal{L}_V W)^{\mu}(p) = V^{\nu}(p) \frac{\partial W^{\mu}}{\partial x^{\nu}}(p) - W^{\nu}(p) \frac{\partial V^{\mu}}{\partial x^{\nu}}(p). \qquad (8)

Notice that the factor \frac{\partial f}{\partial x^{\mu}} appearing in (4) and (7) has been omitted in (8). This is because (8) gives the components of the resulting tangent vector in the basis \partial/\partial x^\mu, which is then applied to f \in C^{\infty}(M).

In (8) we have the \mu-th component of the Lie bracket [V,W], where

[V,W]^{\mu} = V^{\nu} \frac{\partial W^{\mu}}{\partial x^{\nu}} - W^{\nu} \frac{\partial V^{\mu}}{\partial x^{\nu}}. \qquad (9)
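
Formula (9) is easy to check symbolically; the sketch below (assuming sympy; the fields V and W on \mathbb{R}^2 are hypothetical examples) computes the bracket components via (9) and verifies the standard identity [V, W][f] = V[W[f]] - W[V[f]] for an arbitrary smooth f.

```python
# A symbolic sketch: Lie bracket components via (9), plus a check of
# [V, W][f] = V[W[f]] - W[V[f]] for an arbitrary smooth function f.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
coords = [x1, x2]

V = [x1*x2, sp.sin(x1)]        # hypothetical components V^mu on R^2
W = [x2**2, x1 + x2]           # hypothetical components W^mu on R^2
f = sp.Function('f')(x1, x2)   # an arbitrary smooth function

def act(X, h):
    # the vector field X acting on a function h: X^nu * dh/dx^nu
    return sum(X[nu]*sp.diff(h, coords[nu]) for nu in range(2))

# components of [V, W] from (9)
bracket = [sp.simplify(act(V, W[mu]) - act(W, V[mu])) for mu in range(2)]

lhs = sum(bracket[mu]*sp.diff(f, coords[mu]) for mu in range(2))
rhs = act(V, act(W, f)) - act(W, act(V, f))
print(sp.simplify(lhs - rhs))   # 0: the second derivatives of f cancel
```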

 

Sheaves of holomorphic functions

I can sense vaguely that the sheaf is a central definition in the (superficially) horrendously abstract language of modern mathematics. There really does seem to be quite a distance between, crudely speaking, pre-1950 math and post-1950 math in the mainstream in terms of the level of abstraction typically employed. It is my hope that I will eventually accustom myself to the latter instead of viewing it as a very much alien language. It is difficult though, and there are in fact definitions which take me quite a while to grasp (by this, I mean being able to visualize them so clearly that I feel like I won’t ever forget them), which is expected given how long it has taken historically to condense to certain definitions that are golden in hindsight. In the hope of a step forward in my goal to understand sheaves, I’ll write up the associated definitions in this post.

Definition 1 (Presheaf). Let (X, \mathcal{T}) be a topological space. A presheaf of vector spaces on X is a family \mathcal{F} = \{\mathcal{F}(U)\}_{U \in \mathcal{T}} of vector spaces together with a collection of associated linear maps, called restriction maps,

\rho = \{\rho_V^U : \mathcal{F}(U) \to \mathcal{F}(V) | V,U \in \mathcal{T} \text{ and } V \subset U\}

such that

\rho_U^U = \text{id}_{\mathcal{F}(U)} \text{ for all } U \in \mathcal{T}

\rho_W^V \circ \rho_V^U = \rho_W^U \text{ for all } U,V,W \in \mathcal{T} \text{ such that } W \subseteq V \subseteq U.

Given U,V \in \mathcal{T} such that V \subseteq U and f \in \mathcal{F}(U) one often writes f|_V rather than \rho_V^U(f).

Definition 2 (Sheaf). Let \mathcal{F} be a presheaf on a topological space X. We call \mathcal{F} a sheaf on X if for all open sets U \subseteq X and collections of open sets \{U_i \subseteq U\}_{i \in I} such that \cup_{i \in I} U_i = U, \mathcal{F}(U) satisfies the following properties:

  1. For f, g \in \mathcal{F}(U) such that f|_{U_i} = g|_{U_i} for all i \in I, it follows that f = g.    (2.1)
  2. For all collections \{f_i \in \mathcal{F}(U_i)\}_{i \in I} such that f_i |_{U_i \cap U_j} = f_j |_{U_i \cap U_j} for all i, j \in I, there exists f \in \mathcal{F}(U) such that f |_{U_i} = f_i for all i \in I.    (2.2)

In more concrete terms, for holomorphic functions (2.1) says that a function on U is determined by its local behavior (think of a power series about a point whose disk of convergence sits inside U), and (2.2) is a statement of gluing, in the spirit of analytic continuation.

Definition 3 (Sheaf of holomorphic functions \mathcal{O}). Let X be a Riemann surface. The presheaf \mathcal{O} of holomorphic functions on X is made up of complex vector spaces of holomorphic functions. For all open sets U \subseteq X, \mathcal{O}(U) is the vector space of holomorphic functions on U. The restrictions are the usual restrictions of functions.

Proposition 4. If X is a Riemann surface, then \mathcal{O} is a sheaf on X.

Proof. As \mathcal{O} is a presheaf, it suffices to show properties (2.1) and (2.2). Property (2.1) follows directly from the definition of restriction of a function: if f and g agree on every set in a cover of U, they agree on all of U.

For (2.2), take some collection \{f_i \in \mathcal{O}(U_i)\}_{i \in I} such that f_i |_{U_i \cap U_j} = f_j |_{U_i \cap U_j} for all i, j \in I. For x \in U, define f(x) = f_i(x) for any i \in I such that x \in U_i. Whenever x \in U_i \cap U_j, we have f_i(x) = f_j(x) because f_i |_{U_i \cap U_j} = f_j |_{U_i \cap U_j} by assumption; therefore f is well-defined. Given any x \in U, there is some U_i in the cover containing x, on which f coincides with the holomorphic function f_i. It follows that f is holomorphic, which means f \in \mathcal{O}(U).     ▢
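
As a loose computational illustration of the gluing axiom (2.2), the following sketch uses plain Python only; open intervals of \mathbb{R} and arbitrary callables stand in for open subsets of a Riemann surface and holomorphic sections, so this is an analogy rather than an implementation of \mathcal{O}. It checks agreement on (sample points of) overlaps and glues the pieces into one function on the union.

```python
# A toy illustration of the gluing axiom (2.2): sections over open intervals
# that agree on overlaps are glued into a single function on the union.

def glue(pieces):
    """pieces: list of ((a, b), f) with f defined on the open interval (a, b)."""
    # check agreement on a few sample points of every nonempty pairwise overlap
    for (i1, f1) in pieces:
        for (i2, f2) in pieces:
            lo, hi = max(i1[0], i2[0]), min(i1[1], i2[1])
            if lo < hi:
                for k in range(1, 10):
                    t = lo + (hi - lo) * k / 10
                    assert abs(f1(t) - f2(t)) < 1e-12, "pieces disagree on an overlap"

    def glued(t):
        for (a, b), fi in pieces:
            if a < t < b:
                return fi(t)   # well-defined by the overlap check above
        raise ValueError("t is not in the union of the open sets")
    return glued

# the same function presented on two overlapping intervals glues to one on (0, 3)
pieces = [((0, 2), lambda t: t**2), ((1, 3), lambda t: t*t)]
f = glue(pieces)
print(f(0.5), f(1.5), f(2.5))
```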

Definition 5 (Direct limit of algebraic objects). Let \langle I, \leq \rangle be a directed set. Let \{A_i : i \in I\} be a family of objects indexed by I and f_{ij}: A_i \rightarrow A_j be a homomorphism for all i \leq j with the following properties:

  1. f_{ii} is the identity of A_i, and
  2. f_{ik} = f_{jk} \circ f_{ij} for all i \leq j \leq k.

Then the pair \langle A_i, f_{ij} \rangle is called a direct system over I.

The direct limit of the direct system \langle A_i, f_{ij} \rangle is denoted by \varinjlim A_i and is defined as follows. Its underlying set is the disjoint union of the A_is modulo a certain equivalence relation \sim:

\varinjlim A_i = \bigsqcup_i A_i \bigg / \sim.

Here, if x_i \in A_i and x_j \in A_j, then x_i \sim x_j iff there is some k \in I with i \leq k, j \leq k such that f_{ik}(x_i) = f_{jk}(x_j).

More concretely, using the sheaf of holomorphic functions on a Riemann surface, the indices correspond to open sets, with i \leq j meaning U \supseteq V (where A_i = \mathcal{F}(U) and A_j = \mathcal{F}(V)), and f_{ij} : A_i \to A_j is the restriction \rho_V^U : \mathcal{F}(U) \to \mathcal{F}(V). Two holomorphic functions defined on U and V, represented by x_i and x_j, are considered equivalent iff they are equal when restricted to some open W \subseteq U \cap V.

Fix a point x \in X and require that the open sets under consideration are neighborhoods of x. The direct limit in this case is called the stalk of \mathcal{F} at x, denoted \mathcal{F}_x. For each neighborhood U of x, the canonical morphism \mathcal{F}(U) \to \mathcal{F}_x associates to a section s of \mathcal{F} over U an element s_x of the stalk \mathcal{F}_x, called the germ of s at x.
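
For the sheaf \mathcal{O}, the germ of a section at a point is captured, in a local coordinate, by its Taylor expansion there; this identification is standard but not part of the text above, so take the following sympy sketch only as an illustration: two sections defined over different open sets, both containing 0 and agreeing near 0, determine the same germ at 0.

```python
# A small sketch of germs, assuming sympy: sections over C \ {1} and C \ {1, -1}
# agree near 0, so they have the same germ at 0 (visible in their Taylor expansions).
import sympy as sp

z = sp.symbols('z')
s1 = 1/(1 - z)              # a section over C \ {1}
s2 = (1 + z)/(1 - z**2)     # a section over C \ {1, -1}

t1 = sp.series(s1, z, 0, 6)
t2 = sp.series(s2, z, 0, 6)
print(t1)
print(t2)
print(sp.simplify(t1.removeO() - t2.removeO()))   # 0: same germ at 0
```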

Dually, there is the inverse limit, which in our concrete context is more abstract language for analytic continuation.

Definition 6 (Inverse limit of algebraic objects). Let \langle I, \leq \rangle be a directed set. Let \{A_i : i \in I\} be a family of objects indexed by I and f_{ij}: A_j \rightarrow A_i be a homomorphism for all i \leq j with the following properties:

  1. f_{ii} is the identity of A_i, and
  2. f_{ik} = f_{ij} \circ f_{jk} for all i \leq j \leq k.

Then the pair ((A_i)_{i \in I}, (f_{ij})_{i \leq j \in I}) is an inverse system of groups and morphisms over I, and the morphisms f_{ij} are called the transition morphisms of the system.

We define the inverse limit of the inverse system ((A_i)_{i \in I}, (f_{ij})_{i \leq j \in I}) as a particular subgroup of the direct product of the A_is:

A = \displaystyle\varprojlim_{i \in I} A_i = \left\{\left.\vec{a} \in \prod_{i \in I} A_i\; \right|\;a_i = f_{ij}(a_j) \text{ for all } i \leq j \text{ in } I\right\}.

What we have, essentially, are families of holomorphic functions over open sets, and we glue them together via a direct product indexed by the open sets, subject to the restriction that the functions agree wherever the open sets overlap. This gives us the space of holomorphic functions over the union of the open sets, which is a subgroup of the direct product under addition (and is closed under multiplication as well). We have here again the common theme of patching up local pieces to create a global structure.

Grassmannian manifold

We all know of real projective space \mathbb{R}P^n. It is in fact a special case of the Grassmannian manifold G_{k,n}(\mathbb{R}), the set of k-dimensional subspaces of \mathbb{R}^n. Such a subspace can be represented as the span of the rows of a k \times n matrix A of rank k, k \leq n. Multiplying A on the left by any g \in GL(k, \mathbb{R}) leaves this span unchanged. Partitioning by row span, we introduce the equivalence relation \sim by \bar{A} \sim A if there exists g \in GL(k, \mathbb{R}) such that \bar{A} = gA. The Grassmannian can then be identified with the quotient of the rank-k matrices in M_{k,n}(\mathbb{R}) by GL(k, \mathbb{R}).

Now we construct its charts. Since A has rank k, some k \times k minor has nonzero determinant. For convenience of notation we may assume (after relabeling the coordinates of \mathbb{R}^n, which amounts to permuting the columns) that the minor formed by the first k columns is invertible, and write A = (A_1, \tilde{A}_1), where \tilde{A}_1 is k \times (n-k). We get

A_1^{-1}A = (I_k, A_1^{-1}\tilde{A_1}).

Thus the degrees of freedom are given by the k \times (n-k) matrix on the right, so the dimension is k(n-k). If two matrices reduced in this way do not have the same right-hand block, they cannot represent the same subspace, since applying any non-identity element of GL(k, \mathbb{R}) would alter the identity matrix on the left. The entries of A_1^{-1}\tilde{A}_1 therefore serve as coordinates on this chart.
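
A small numerical sketch (assuming numpy; the matrices A and g below are hypothetical examples) of why this works: two representatives A and gA of the same subspace produce the same block A_1^{-1}\tilde{A}_1.

```python
# A numerical sketch of the Grassmannian chart: representatives A and gA of the
# same k-dimensional subspace of R^n give identical chart coordinates A1^{-1} A1~.
import numpy as np

k, n = 2, 4
A = np.array([[1., 2., 3., 4.],
              [0., 1., 1., 2.]])     # a rank-k representative, k x n
g = np.array([[2., 1.],
              [1., 1.]])             # an element of GL(k, R)
B = g @ A                            # another representative of the same subspace

def chart(M):
    # assumes the first k x k block is invertible; returns the k x (n-k) coordinate block
    M1, M1_tilde = M[:, :k], M[:, k:]
    return np.linalg.solve(M1, M1_tilde)   # = M1^{-1} M1_tilde

print(np.allclose(chart(A), chart(B)))      # True: the chart depends only on the subspace
```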

I’ll leave it to the reader to run this on the real projective case, \mathbb{R}P^n = G_{1, n+1}(\mathbb{R}).