Poor man's Daletskii-Krein

Consider a second-order, symmetric tensor $A$ whose components $\def\R{\mathbb{R}}{A^j}_i \in \R$. Such a tensor has an eigendecomposition \begin{equation} \label{eigendecomp} A = V \Lambda V^\top \end{equation} where $V$ is an orthogonal second-order tensor (its columns are orthonormal eigenvectors of $A$) and ${\Lambda^j}_i=\lambda^j \delta^j_i$ ($\lambda^i$ is the $i$th eigenvalue of $A$). One corollary of (\ref{eigendecomp}) is \begin{equation} \label{An} A^n = V \Lambda^n V^\top . \end{equation}

A function $f:\R \to \R$ that is analytic at $x=0$ is equal, within its radius of convergence, to its Maclaurin series \begin{equation} \label{maclaurin} f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!} x^n \end{equation} where $f^{(0)}(0)=f(0)$ and $f^{(n)}(0)=\eval{\dv[n]{f}{x}}_{x=0}$. If we define \begin{equation} \label{def-fA} f(A):= \sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!} A^n \end{equation} then, provided the eigenvalues of $A$ lie within that radius of convergence, $f(A)$ is also a second-order, symmetric tensor. If we substitute (\ref{An}) in (\ref{def-fA}) we obtain $$ f(A) = V\qty(\sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!} \Lambda^n) V^\top. $$ Since $\sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!} \Lambda^n$ is a linear combination of diagonal matrices, it is also a diagonal matrix with elements $$ \sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!} (\lambda^i)^n. $$ From (\ref{maclaurin}) this sum is equal to $f(\lambda^i)$, so we can write \begin{equation} f(A) = V f(\Lambda) V^\top \end{equation} where $f(\Lambda)$ is the diagonal tensor with elements $f(\lambda^i)$, or, in terms of components, \begin{equation} f({A^j}_i)= {V^j}_a f(\lambda^a) {V_i}^a. \end{equation} Here $f({A^j}_i)$ denotes the $(j,i)$ component of $f(A)$, not $f$ applied to the component ${A^j}_i$.

The differential of $A^n$ has the property \begin{equation} \label{dAn} \dd (A^n) = \sum_{k=0}^{n-1} A^k (\dd A) A^{n-1-k} \end{equation} where $A^0=I$ is the identity, i.e. the second-order tensor with the property $AI=IA=A$. If we substitute (\ref{An}) in (\ref{dAn}) we get \begin{equation}\label{dAn-no-comp} \dd (A^n) = \sum_{k=0}^{n-1} V \Lambda^k V^\top (\dd A) V \Lambda^{n-1-k} V^\top. 
\end{equation} To evaluate this sum, it is easier to write (\ref{dAn-no-comp}) in terms of the scalar components of the tensors: \begin{equation} \label{dAn-comp} \dd{{(A^n)}^j}_i = {V^j}_a \qty(\sum_{k=0}^{n-1} (\lambda^a)^k {V_q}^a (\dd {A^q}_p) {V^p}_b (\lambda^b)^{n-1-k}) {V_i}^b. \end{equation} Here $(\lambda^a)^k$ denotes the $a$th component of the first-order tensor obtained by raising each component of the first-order tensor $\lambda$ to the $k$th power. This is the standard definition of an element-wise application of a function to a tensor. Since $\lambda^a,\lambda^b$ are scalars, the eigenvalue factors in (\ref{dAn-comp}) can be collected and the sum over $k$ evaluated in closed form: \begin{equation} \label{Helements} (\lambda^b)^{n-1}\sum_{k=0}^{n-1}(\lambda^a)^k(\lambda^b)^{-k}=\left\{\begin{array}{cc} n (\lambda^a)^{n-1}, & a=b \\ \frac{(\lambda^a)^n-(\lambda^b)^n}{\lambda^a-\lambda^b}, & a \neq b\end{array}\right. \end{equation} The first case also covers distinct indices $a \neq b$ that share a repeated eigenvalue $\lambda^a=\lambda^b$, since it is the limit of the second case as $\lambda^b \to \lambda^a$. Equation (\ref{Helements}) determines the elements of a second-order tensor $H_{[n]}^{ab}$ (the subscript is in square brackets so that it is not confused with an index). Using (\ref{Helements}) we can rewrite (\ref{dAn-comp}) as \begin{equation} \label{dAn-Hn} \dd{(A^n)^j}_i= {V^j}_a \qty( H_{[n]}^{ab} {V_q}^a (\dd {A^q}_p) {V^p}_b) {V_i}^b. 
\end{equation}

Since from (\ref{def-fA}) \begin{equation} \label{dfA} \dd f(A) = \sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!} \dd (A^n) \end{equation} and $$ \eqalign{ f^{(1)}(\lambda^a) & =\sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!} n (\lambda^a)^{n-1} \\ \frac{f(\lambda^a)-f(\lambda^b)}{\lambda^a-\lambda^b} & = \sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!} \frac{(\lambda^a)^n-(\lambda^b)^n}{\lambda^a-\lambda^b} } $$ we can combine (\ref{dAn-Hn}) and (\ref{dfA}) to obtain \begin{equation} \label{first-derivative} \dd f({A^j}_i) = {V^j}_a \qty(G^{ab} \qty({V_q}^a (\dd {A^q}_p) {V^p}_b)) {V_i}^b \end{equation} where $G$ is a second-order tensor whose elements are $$ {G^{ab}} = \left\{\begin{array}{cc} f^{(1)}(\lambda^a), & a=b \\ \frac{f(\lambda^a)-f(\lambda^b)}{\lambda^a-\lambda^b}, & a\neq b\end{array}\right. $$ with the same convention for repeated eigenvalues as in (\ref{Helements}). In the expression $$ G^{ab} \qty({V_q}^a (\dd {A^q}_p) {V^p}_b) $$ the indices $a$, $b$ are not summed inside the parentheses, because the summation over them must also include the outer tensors ${V^j}_a$, ${V_i}^b$. Since ${V_q}^a (\dd {A^q}_p) {V^p}_b$ has free indices $a,b$, we can regard the above product as a Hadamard (element-wise) product. So another way of writing (\ref{first-derivative}), in terms of matrix operations, is $$ \dd f(A) = V \qty(G \odot \qty(V^\top(\dd A)V)) V^\top, $$ where $\odot$ denotes the Hadamard product.

We can use (\ref{first-derivative}) to write $$ \frac{\partial f({A^j}_i)}{\partial {A^q}_p}= {V^j}_a {V^p}_b \qty({G^{ab}} {V_q}^a {V_i}^b). $$ Since this is a fourth-order tensor, we cannot express it directly in terms of matrix operations. However, if we apply the vectorization operator to (\ref{first-derivative}) we can write the above expression in terms of matrix operations (for more details see Linton, O., and McCrorie, J. R. (1995), “Differentiation of an Exponential Matrix Function”, Econometric Theory, 11(5), 1182–1185).
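The Hadamard form $\dd f(A) = V \qty(G \odot \qty(V^\top(\dd A)V)) V^\top$ is easy to check numerically. The following is a minimal sketch, assuming NumPy and a symmetric perturbation $\dd A$; the function names `apply_fn`, `first_divided_differences` and `differential`, and the tolerance `tol` used to detect coinciding eigenvalues, are illustrative choices and not part of the derivation above.

```python
import numpy as np


def apply_fn(A, f):
    """Return f(A) = V f(Lambda) V^T for a real symmetric matrix A."""
    lam, V = np.linalg.eigh(A)
    # Scale column a of V by f(lam[a]), then multiply by V^T.
    return (V * f(lam)) @ V.T


def first_divided_differences(lam, f, df, tol=1e-8):
    """Matrix G: f'(lam[a]) where eigenvalues coincide, divided difference otherwise.

    `tol` is an assumed threshold for treating two eigenvalues as equal.
    """
    la, lb = np.meshgrid(lam, lam, indexing="ij")
    diff = la - lb
    same = np.abs(diff) < tol               # diagonal and repeated eigenvalues
    safe_diff = np.where(same, 1.0, diff)   # avoid 0/0 in the branch we discard
    return np.where(same, df(la), (f(la) - f(lb)) / safe_diff)


def differential(A, dA, f, df):
    """Return V (G * (V^T dA V)) V^T, where * is the Hadamard product."""
    lam, V = np.linalg.eigh(A)
    G = first_divided_differences(lam, f, df)
    return V @ (G * (V.T @ dA @ V)) @ V.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = rng.standard_normal((4, 4))
    A = 0.5 * (B + B.T)                     # symmetric test matrix
    C = rng.standard_normal((4, 4))
    dA = 0.5 * (C + C.T)                    # symmetric perturbation direction

    # Central finite difference of t -> f(A + t dA) at t = 0, with f = exp.
    h = 1e-6
    fd = (apply_fn(A + h * dA, np.exp) - apply_fn(A - h * dA, np.exp)) / (2 * h)
    exact = differential(A, dA, np.exp, np.exp)
    print("max abs deviation:", np.max(np.abs(exact - fd)))
```

For $f=\exp$ the derivative is again $\exp$, which is why `np.exp` is passed twice. The finite-difference comparison is only a sanity check, and the step `h` is a judgment call that trades truncation error against round-off.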

This is the poor man's derivation of the theorem from the seminal paper by Yurii L'vovich Daletskii and Mark Grigor'evich Krein, “Integration and differentiation of functions of Hermitian operators and applications to the theory of perturbations”, first published in English in 1965 in AMS Translations, Series 2, Volume 47, pp. 1-30.

Yurii L'vovich Daletskii (photo credit: https://ems.press/journals/mag/articles/16952)

To obtain the second-order derivative of $f(A)$ we first apply the differential operator to (\ref{dAn}), treating $\dd A$ as constant so that $\dd(\dd A)=0$: \begin{equation} \label{intermediate} {\dd}^2 (A^n) = \sum_{k=0}^{n-1}(\dd A^k)(\dd A)A^{n-1-k} + \sum_{k=0}^{n-1} A^k (\dd A)(\dd A^{n-1-k}). \end{equation} If we substitute the formula from (\ref{dAn}) for $\dd A^k$ and $\dd A^{n-1-k}$ we can rewrite (\ref{intermediate}) as \begin{equation} \label{long-eq} \begin{split} {\dd}^2 (A^n) = {} & \sum_{k=0}^{n-1}\left(\sum_{l=0}^{k-1}A^l(\dd A)A^{k-1-l}\right)(\dd A)A^{n-1-k} \\ & + \sum_{k=0}^{n-1} A^k (\dd A)\left(\sum_{l=0}^{n-2-k}A^l(\dd A)A^{n-2-k-l}\right). \end{split} \end{equation}

Setting aside the summations over $k,l$ for the moment, the first sum in (\ref{long-eq}) consists of terms of the form $$ A^l(\dd A)A^{k-1-l}(\dd A) A^{n-1-k} $$ which using (\ref{eigendecomp}) can be written, in terms of tensor components, as $$ {V^j}_a(\lambda^a)^l{V_q}^a (\dd {A^q}_p) {V^p}_b(\lambda^b)^{k-1-l}{V_t}^b(\dd {A^t}_s){V^s}_c (\lambda^c)^{n-1-k}{V_i}^c. $$ Collecting the eigenvalue factors and restoring the summations over $k,l$ gives a third-order tensor $K_{[n]}^{abc}$ whose elements are \begin{equation} K^{abc}_{[n]}=(\lambda^b)^{-1}(\lambda^c)^{n-1}\sum_{k=0}^{n-1}(\lambda^b)^{k}(\lambda^c)^{-k}\sum_{l=0}^{k-1} (\lambda^a)^l(\lambda^b)^{-l}. 
\end{equation} We can obtain simpler forms by considering which of the eigenvalues $\lambda^a,\lambda^b,\lambda^c$ coincide: \begin{align} K_{[n]}^{aaa} & =\frac{1}{2} n (n-1) (\lambda^a)^{n-2} \\ K_{[n]}^{aac} & =\frac{n\qty(\lambda^a-\lambda^c)(\lambda^a)^{n-1}-\qty((\lambda^a)^n-(\lambda^c)^n)}{\qty(\lambda^a-\lambda^c)^2} \\ K_{[n]}^{abb} & =\frac{n\qty(\lambda^b-\lambda^a)(\lambda^b)^{n-1}-\qty((\lambda^b)^n-(\lambda^a)^n)}{\qty(\lambda^b-\lambda^a)^2} \\ K_{[n]}^{cbc} & =\frac{n\qty(\lambda^c-\lambda^b)(\lambda^c)^{n-1}-\qty((\lambda^c)^n-(\lambda^b)^n)}{\qty(\lambda^c-\lambda^b)^2} \\ K_{[n]}^{abc} & = - \frac{(\lambda^b-\lambda^c)(\lambda^a)^{n}+(\lambda^c-\lambda^a)(\lambda^b)^{n}+(\lambda^a-\lambda^b)(\lambda^c)^{n}} {(\lambda^a-\lambda^b)(\lambda^b-\lambda^c)(\lambda^c-\lambda^a)} \end{align} If we apply $\sum_{n=0}^\infty \frac{f^{(n)}(0)}{n!}$ we convert the tensor $K$ to another third-order tensor $J$, whose elements are the second-order divided differences of $f$ at the eigenvalues: \begin{align} {J^{aaa}} & =\frac{1}{2} f^{(2)}(\lambda^a) \\ {J^{aac}} & =\frac{\qty(\lambda^a-\lambda^c)f^{(1)}(\lambda^a)-\qty(f(\lambda^a)-f(\lambda^c))}{\qty(\lambda^a-\lambda^c)^2} \\ {J^{abb}} & =\frac{\qty(\lambda^b-\lambda^a)f^{(1)}(\lambda^b)-\qty(f(\lambda^b)-f(\lambda^a))}{\qty(\lambda^b-\lambda^a)^2} \\ {J^{cbc}} & =\frac{\qty(\lambda^c-\lambda^b)f^{(1)}(\lambda^c)-\qty(f(\lambda^c)-f(\lambda^b))}{\qty(\lambda^c-\lambda^b)^2} \\ {J^{abc}} & = - \frac{(\lambda^b-\lambda^c)f(\lambda^a)+(\lambda^c-\lambda^a)f(\lambda^b)+(\lambda^a-\lambda^b)f(\lambda^c)}{(\lambda^a-\lambda^b)(\lambda^b-\lambda^c)(\lambda^c-\lambda^a)} \end{align}

The second sum in (\ref{long-eq}) consists of terms of the form $$ A^k (\dd A)A^l(\dd A)A^{n-2-k-l} $$ which using (\ref{eigendecomp}) can be written, in terms of tensor components, as $$ {V^j}_a(\lambda^a)^k{V_q}^a (\dd {A^q}_p) {V^p}_b(\lambda^b)^{l}{V_t}^b(\dd {A^t}_s){V^s}_c (\lambda^c)^{n-2-k-l}{V_i}^c. 
$$ This gives a third-order tensor $N^{abc}_{[n]}$ whose elements are \begin{equation} N_{[n]}^{abc}=(\lambda^c)^{n-2}\sum_{k=0}^{n-1}(\lambda^a)^{k}(\lambda^c)^{-k}\sum_{l=0}^{n-2-k} (\lambda^b)^l(\lambda^c)^{-l}. \end{equation} With a bit of algebra one can show that $N^{abc}_{[n]}=K^{abc}_{[n]}$: both are the sum of all monomials $(\lambda^a)^{i}(\lambda^b)^{j}(\lambda^c)^{m}$ with $i+j+m=n-2$. Each of the two sums in (\ref{long-eq}) therefore contributes $J$ after the summation over $n$, and the end result is the third-order tensor $Q=2J$: \begin{align} {Q^{aaa}} & = f^{(2)}(\lambda^a) \\ {Q^{aac}} & =2\frac{\qty(\lambda^a-\lambda^c)f^{(1)}(\lambda^a)-\qty(f(\lambda^a)-f(\lambda^c))}{\qty(\lambda^a-\lambda^c)^2} \\ {Q^{abb}} & =2\frac{\qty(\lambda^b-\lambda^a)f^{(1)}(\lambda^b)-\qty(f(\lambda^b)-f(\lambda^a))}{\qty(\lambda^b-\lambda^a)^2} \\ {Q^{cbc}} & =2\frac{\qty(\lambda^c-\lambda^b)f^{(1)}(\lambda^c)-\qty(f(\lambda^c)-f(\lambda^b))}{\qty(\lambda^c-\lambda^b)^2} \\ {Q^{abc}} & = - 2\frac{(\lambda^b-\lambda^c)f(\lambda^a)+(\lambda^c-\lambda^a)f(\lambda^b)+(\lambda^a-\lambda^b)f(\lambda^c)}{(\lambda^a-\lambda^b)(\lambda^b-\lambda^c)(\lambda^c-\lambda^a)} \end{align}

Having determined this tensor we can write \begin{equation} {\dd}^2 f({A^j}_i) = {V^j}_a \qty( {Q^{abc}} \qty({V_q}^a (\dd {A^q}_p) {V^p}_b) \qty({V_t}^b(\dd {A^t}_s){V^s}_c ) ){V_i}^c. \end{equation} Therefore the second derivative is \begin{equation} \frac{\partial^2f({A^j}_i)}{\partial{A^q}_p\partial{A^t}_s} = {V^j}_a{V^p}_b{V^s}_c \qty({Q^{abc}}{V_q}^a{V_t}^b{V_i}^c). \end{equation} The Taylor expansion of $f({A^j}_i)$ can be written as \begin{equation} f({A^j}_i+\Delta{A^j}_i)= f({A^j}_i)+\frac{\partial f({A^j}_i)}{\partial {A^q}_p}\Delta{A^q}_p+\frac{1}{2}\frac{\partial^2f({A^j}_i)}{\partial{A^q}_p\partial{A^t}_s} \Delta{A^q}_p\Delta{A^t}_s + \cdots \end{equation} where $\Delta{A^j}_i$ is some perturbation applied to ${A^j}_i$.
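The second-order formula can be checked in the same spirit. The sketch below, assuming NumPy, builds $Q=2J$ entry by entry from the case formulas above and contracts it as $\sum_b Q^{abc} X_{ab} X_{bc}$ with $X=V^\top(\dd A)V$. The names `second_divided_difference` and `second_differential`, the tolerance `tol`, and the use of plain loops are illustrative assumptions. Sorting the three eigenvalues before picking a case relies on the fact that divided differences are symmetric in their arguments.

```python
import numpy as np


def apply_fn(A, f):
    """Return f(A) = V f(Lambda) V^T for a real symmetric matrix A."""
    lam, V = np.linalg.eigh(A)
    return (V * f(lam)) @ V.T


def second_divided_difference(x, y, z, f, df, d2f, tol=1e-8):
    """Entry of J for eigenvalues (x, y, z); `tol` is an assumed equality threshold."""
    x, y, z = sorted((x, y, z))   # symmetric in its arguments, so order is free
    if z - x < tol:               # all three coincide: J^{aaa}
        return 0.5 * d2f(x)
    if y - x < tol:               # two smallest coincide: form of J^{aac}
        return ((x - z) * df(x) - (f(x) - f(z))) / (x - z) ** 2
    if z - y < tol:               # two largest coincide: form of J^{abb}
        return ((z - x) * df(z) - (f(z) - f(x))) / (z - x) ** 2
    # all distinct: J^{abc}
    num = (y - z) * f(x) + (z - x) * f(y) + (x - y) * f(z)
    return -num / ((x - y) * (y - z) * (z - x))


def second_differential(A, dA, f, df, d2f):
    """Return d^2 f(A) for the perturbation dA, using Q = 2J."""
    lam, V = np.linalg.eigh(A)
    n = lam.size
    Q = np.empty((n, n, n))
    for a in range(n):
        for b in range(n):
            for c in range(n):
                Q[a, b, c] = 2.0 * second_divided_difference(
                    lam[a], lam[b], lam[c], f, df, d2f
                )
    X = V.T @ dA @ V
    # M[a, c] = sum_b Q[a, b, c] * X[a, b] * X[b, c]
    M = np.einsum("abc,ab,bc->ac", Q, X, X)
    return V @ M @ V.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = rng.standard_normal((4, 4))
    A = 0.25 * (B + B.T)
    C = rng.standard_normal((4, 4))
    dA = 0.5 * (C + C.T)
    dA /= np.linalg.norm(dA)

    # Second central difference of t -> f(A + t dA) at t = 0, with f = exp.
    h = 1e-3
    fd2 = (apply_fn(A + h * dA, np.exp) - 2 * apply_fn(A, np.exp)
           + apply_fn(A - h * dA, np.exp)) / h**2
    exact = second_differential(A, dA, np.exp, np.exp, np.exp)
    print("max abs deviation:", np.max(np.abs(exact - fd2)))
```

The triple loop costs $O(n^3)$ scalar evaluations and is written for readability rather than speed. For nearly (but not exactly) coinciding eigenvalues the distinct-eigenvalue formula loses accuracy through cancellation, so the choice of `tol` matters in practice.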
