Implicit function theorem and its multivariate generalization

The implicit function theorem for a single output variable can be stated as follows (following Folland, Advanced Calculus):

Single equation implicit function theorem. Let F(\mathbf{x}, y) be a function of class C^1 on some neighborhood of a point (\mathbf{a}, b) \in \mathbb{R}^{n+1}. Suppose that F(\mathbf{a}, b) = 0 and \partial_y F(\mathbf{a}, b) \neq 0. Then there exist positive numbers r_0, r_1 such that the following conclusions are valid.

a. For each \mathbf{x} in the ball |\mathbf{x} - \mathbf{a}| < r_0 there is a unique y such that |y - b| < r_1 and F(\mathbf{x}, y) = 0. We denote this y by f(\mathbf{x}); in particular, f(\mathbf{a}) = b.

b. The function f thus defined for |\mathbf{x} - \mathbf{a}| < r_0 is of class C^1, and its partial derivatives are given by

\partial_j f(\mathbf{x}) = -\frac{\partial_j F(\mathbf{x}, f(\mathbf{x}))}{\partial_y F(\mathbf{x}, f(\mathbf{x}))}.
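As a quick illustration (a standard example, not part of the theorem): take F(x, y) = x^2 + y^2 - 1 and (\mathbf{a}, b) = (0, 1), so that F(0, 1) = 0 and \partial_y F(0, 1) = 2 \neq 0. Near x = 0 the zero set of F is the graph of f(x) = \sqrt{1 - x^2}, and the formula above gives

f'(x) = -\frac{2x}{2f(x)} = -\frac{x}{\sqrt{1 - x^2}},

in agreement with differentiating f directly.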

Proof. For part (a), assume without loss of generality that \partial_y F(\mathbf{a}, b) > 0. Since \partial_y F is continuous, it is positive on some neighborhood of (\mathbf{a}, b); choose r_0, r_1 > 0 small enough that \partial_y F > 0 whenever |\mathbf{x} - \mathbf{a}| \le r_0 and |y - b| \le r_1. Then F(\mathbf{a}, \cdot) is strictly increasing on [b - r_1, b + r_1], so F(\mathbf{a}, b - r_1) < 0 < F(\mathbf{a}, b + r_1), and by continuity of F we may shrink r_0 further so that F(\mathbf{x}, b - r_1) < 0 < F(\mathbf{x}, b + r_1) for all |\mathbf{x} - \mathbf{a}| < r_0. For each such \mathbf{x}, the intermediate value theorem gives a y with |y - b| < r_1 and F(\mathbf{x}, y) = 0, and this y is unique because F(\mathbf{x}, \cdot) is strictly increasing. This defines a function y = f(\mathbf{x}).

Before computing the partial derivatives of f, we first show that f is continuous. This follows by rerunning the argument above at an arbitrary point: given \mathbf{x}_0 with |\mathbf{x}_0 - \mathbf{a}| < r_0 and any \epsilon > 0 (playing the role of r_1), the same construction produces a \delta > 0 (playing the role of r_0) such that |\mathbf{x} - \mathbf{x}_0| < \delta implies |f(\mathbf{x}) - f(\mathbf{x}_0)| < \epsilon.

For part (b), to show that the partial derivatives of f exist and equal the claimed expressions, it suffices by symmetry to treat \partial_1 f, so we perturb \mathbf{x} by

\mathbf{h} = (h, 0, \ldots, 0).

Then with y = f(\mathbf{x}) and k = f(\mathbf{x} + \mathbf{h}) - f(\mathbf{x}), we have F(\mathbf{x} + \mathbf{h}, y + k) = F(\mathbf{x}, y) = 0. Applying the mean value theorem to t \mapsto F(\mathbf{x} + t\mathbf{h}, y + tk) on [0, 1], we obtain

0 = h\partial_1F(\mathbf{x}+t\mathbf{h}, y + tk) + k\partial_y F(\mathbf{x}+t\mathbf{h}, y+tk)

for some t \in (0, 1). Rearranging gives

\frac{k}{h} = -\frac{\partial_1 F(\mathbf{x} + t\mathbf{h}, y + tk)}{\partial_y F(\mathbf{x} + t\mathbf{h}, y + tk)},

and letting h \to 0 (so that k \to 0, by the continuity of f) and using the continuity of the partial derivatives of F yields

\partial_1 f(\mathbf{x}) = -\frac{\partial_1 F(\mathbf{x}, y)}{\partial_y F(\mathbf{x}, y)},

and likewise for the other \partial_j f. Since these quotients are continuous functions of \mathbf{x}, f is of class C^1.     ▢

The above theorem generalizes to systems of equations, with k implicit functions determined by k constraints.

Implicit function theorem for systems of equations. Let \mathbf{F}(\mathbf{x}, \mathbf{y}) be an \mathbb{R}^k-valued function of class C^1 on some neighborhood of a point (\mathbf{a}, \mathbf{b}) \in \mathbb{R}^{n+k}, and let B_{ij} = (\partial F_i / \partial y_j)(\mathbf{a}, \mathbf{b}). Suppose that \mathbf{F}(\mathbf{a}, \mathbf{b}) = \mathbf{0} and \det B \neq 0. Then there exist positive numbers r_0, r_1 such that the following conclusions are valid.

a. For each \mathbf{x} in the ball |\mathbf{x} - \mathbf{a}| < r_0 there is a unique \mathbf{y} such that |\mathbf{y} - \mathbf{b}| < r_1 and \mathbf{F}(\mathbf{x}, \mathbf{y}) = \mathbf{0}. We denote this \mathbf{y} by \mathbf{f}(\mathbf{x}); in particular, \mathbf{f}(\mathbf{a}) = \mathbf{b}.

b. The function \mathbf{f} thus defined for |\mathbf{x} - \mathbf{a}| < r_0 is of class C^1, and its partial derivatives \partial_j \mathbf{f} can be computed by differentiating the equations \mathbf{F}(\mathbf{x}, \mathbf{f}(\mathbf{x})) = \mathbf{0} with respect to x_j and solving the resulting linear system of equations for \partial_j f_1, \ldots, \partial_j f_k.
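To make the recipe in (b) concrete, here is a small example of our own (not from Folland), with n = 1 and k = 2. Let F_1(x, y_1, y_2) = y_1 + y_2 - x - 1 and F_2(x, y_1, y_2) = y_1 y_2 - x, and take (\mathbf{a}, \mathbf{b}) = (2, (1, 2)). Since y_1, y_2 are then the roots of t^2 - (x+1)t + x = (t - 1)(t - x), near x = 2 we have \mathbf{f}(x) = (1, x). The matrix B has rows (1, 1) and (y_2, y_1) = (2, 1), so \det B = -1 \neq 0, and differentiating the two constraints with respect to x gives the linear system

\begin{pmatrix} 1 & 1 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} f_1'(2) \\ f_2'(2) \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix},

whose solution (f_1'(2), f_2'(2)) = (0, 1) agrees with \mathbf{f}(x) = (1, x).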

Proof: We will use Cramer’s rule: a linear system A\mathbf{x} = \mathbf{y} with A non-singular has a unique solution, whose jth entry x_j is the determinant of the matrix obtained by replacing the jth column of A with \mathbf{y}, divided by \det A.
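In symbols, writing A_j for the matrix A with its jth column replaced by \mathbf{y},

x_j = \frac{\det A_j}{\det A}.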

This hints that an induction on k is in order; the base case k = 1 is the single-equation theorem above. Write M^{ij} for the (k-1) \times (k-1) submatrix of B obtained by deleting its ith row and jth column. If \det B \neq 0, then expanding \det B along its kth row shows that \det M^{kj} \neq 0 for some j; after relabeling y_1, \ldots, y_k we may assume without loss of generality that \det M^{kk} \neq 0. With this in mind, the inductive hypothesis lets the equations

F_1(\mathbf{x}, \mathbf{y}) = F_2(\mathbf{x}, \mathbf{y}) = \cdots = F_{k-1}(\mathbf{x}, \mathbf{y}) = 0

determine y_j = g_j(\mathbf{x}, y_k) for j = 1, 2, \ldots, k-1, with \mathbf{g} = (g_1, \ldots, g_{k-1}) of class C^1 near (\mathbf{a}, b_k). Here y_k is treated as an additional independent variable alongside \mathbf{x}, which is legitimate because the inductive hypothesis concerns k-1 equations in the k-1 unknowns y_1, \ldots, y_{k-1}, and the relevant Jacobian at (\mathbf{a}, \mathbf{b}) is M^{kk}, which is invertible. Substituting \mathbf{g} into the constraint F_k = 0 reduces the problem to the single-equation case, with

G(\mathbf{x}, y_k) = F_k(\mathbf{x}, \mathbf{g}(\mathbf{x}, y_k), y_k) = 0.

It suffices now to show, using the hypothesis \det B \neq 0, that \frac{\partial G}{\partial y_k} \neq 0 at the point (\mathbf{a}, b_k). A routine application of the chain rule there gives

\frac{\partial G}{\partial y_k} = \displaystyle\sum_{j=1}^{k-1} \frac{\partial F_k}{\partial y_j} \frac{\partial g_j}{\partial y_k} + \frac{\partial F_k}{\partial y_k} = \displaystyle\sum_{j=1}^{k-1} B_{kj} \frac{\partial g_j}{\partial y_k} + B_{kk}. \ \ \ \ (1)

The \frac{\partial g_j}{\partial y_k}s are the solution of the linear system obtained by differentiating the identities F_i(\mathbf{x}, \mathbf{g}(\mathbf{x}, y_k), y_k) = 0, for i = 1, \ldots, k-1, with respect to y_k:

\begin{pmatrix} \frac{\partial F_1}{\partial y_1} & \cdots & \frac{\partial F_1}{\partial y_{k-1}} \\ \vdots & \ddots & \vdots \\ \frac{\partial F_{k-1}}{\partial y_1} & \cdots & \frac{\partial F_{k-1}}{\partial y_{k-1}} \end{pmatrix} \begin{pmatrix} \frac{\partial g_1}{\partial y_k} \\ \vdots \\ \frac{\partial g_{k-1}}{\partial y_k} \end{pmatrix} = -\begin{pmatrix} \frac{\partial F_1}{\partial y_k} \\ \vdots \\ \frac{\partial F_{k-1}}{\partial y_k} \end{pmatrix}.

At (\mathbf{a}, \mathbf{b}), the coefficient matrix of this system is exactly M^{kk}. In Cramer’s rule, the matrix obtained by replacing its jth column with the right-hand side is M^{kj} with its last column negated and moved left k-j-1 places, so that it lands in the jth column. Each adjacent column swap flips the sign of the determinant, which means

\frac{\partial g_j}{\partial y_k}(\mathbf{a}, b_k) = (-1)^{k-j} \frac{\det M^{kj}}{\det M^{kk}}.
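As a sanity check, take k = 2 and j = 1: the formula reads \frac{\partial g_1}{\partial y_2} = -\frac{\det M^{21}}{\det M^{22}} = -\frac{B_{12}}{B_{11}}, which is exactly what one gets by differentiating F_1(\mathbf{x}, g_1(\mathbf{x}, y_2), y_2) = 0 with respect to y_2.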

Now, substituting this into (1) and recognizing the cofactor expansion of \det B along its kth row, we get

\begin{aligned}\frac{\partial G}{\partial y_k}(\mathbf{a}, b_k) &= \displaystyle\sum_{j=1}^{k-1} (-1)^{k-j} B_{kj} \frac{\det M^{kj}}{\det M^{kk}} + B_{kk} \\ &= \frac{\sum_{j=1}^{k} (-1)^{j+k} B_{kj} \det M^{kj}}{\det M^{kk}} \\ &= \frac{\det B}{\det M^{kk}} \\ &\neq 0. \end{aligned}

Finally, we apply the single-equation implicit function theorem to G to solve for the remaining variable y_k, obtaining y_k = f_k(\mathbf{x}) and then y_j = g_j(\mathbf{x}, f_k(\mathbf{x})) for j < k.     ▢
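In the two-equation example following the theorem statement, the key identity \frac{\partial G}{\partial y_k} = \frac{\det B}{\det M^{kk}} is easy to verify directly: solving F_1 = 0 for y_1 gives g_1(x, y_2) = x + 1 - y_2, so G(x, y_2) = (x + 1 - y_2)\,y_2 - x and

\frac{\partial G}{\partial y_2}(2, 2) = \left. x + 1 - 2y_2 \right|_{(2, 2)} = -1 = \frac{\det B}{\det M^{22}},

since \det B = -1 and \det M^{22} = B_{11} = 1.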

References

  • Gerald B. Folland, Advanced Calculus, Prentice Hall, Upper Saddle River, NJ, 2002, pp. 114–116, 420–422.