Derivative Definitions and Derivative Layouts

Introduction to Matrix and Vector Derivatives

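In calculus we have already learned how to differentiate a scalar with respect to a scalar. For example, the derivative of a scalar $y$ with respect to a scalar $x$ is written $\frac{\partial y}{\partial x}$.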

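If we collect a group of such scalars into a vector, we obtain the derivative of an $m$-dimensional vector $\mathbf{y}$ with respect to a scalar $x$. The result is again an $m$-dimensional vector: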

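$$\frac{\partial \mathbf{y}}{\partial x}=\left[\begin{array}{c}\frac{\partial y_{1}}{\partial x} \\ \vdots \\ \frac{\partial y_{m}}{\partial x}\end{array}\right]$$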

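In other words, differentiating a vector with respect to a scalar simply means differentiating each component of the vector with respect to that scalar and arranging the results as a vector. Analogous statements hold for scalar-by-vector, vector-by-vector, vector-by-matrix, matrix-by-vector, and matrix-by-matrix derivatives.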
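To make the componentwise definition concrete, here is a minimal numerical sketch in Python (standard library only). It approximates each $\partial y_i / \partial x$ with a central finite difference, which is a numerical stand-in for the exact derivative and not part of the original text. The helper `vector_by_scalar` and the sample function $\mathbf{y}(x) = [x^2, \sin x, 3x]^T$ are hypothetical.

```python
import math


def vector_by_scalar(f, x, h=1e-6):
    """Estimate dy/dx for a vector-valued y = f(x) by central differences.

    Each component y_i is differentiated separately with respect to the
    scalar x, and the results are collected in a list of the same length.
    """
    y_plus, y_minus = f(x + h), f(x - h)
    return [(yp - ym) / (2 * h) for yp, ym in zip(y_plus, y_minus)]


def f(x):
    # Hypothetical y(x) = [x^2, sin x, 3x]; exact derivative is [2x, cos x, 3].
    return [x ** 2, math.sin(x), 3 * x]


dy_dx = vector_by_scalar(f, 1.0)
print(dy_dx, [2.0, math.cos(1.0), 3.0])  # numerical vs. exact
```

The step size `h = 1e-6` is an arbitrary choice that balances truncation and rounding error for smooth functions of moderate size.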

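For convenience, unless stated otherwise, the independent variable is written $x$ for a scalar, $\mathbf{x}$ for an $n$-dimensional vector, and $\mathbf{X}$ for an $m \times n$ matrix. The dependent variable is written $y$ for a scalar, $\mathbf{y}$ for an $m$-dimensional vector, and $\mathbf{Y}$ for a $p \times q$ matrix.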

  • $x$: scalar
  • $\mathbf{x}$: $n$-dimensional column vector
  • $\mathbf{y}$: $m$-dimensional column vector
  • $\mathbf{X}$: $m \times n$ matrix
  • $\mathbf{Y}$: $p \times q$ matrix

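A derivative involving vectors or matrices can be laid out in two ways. In numerator layout, the result is arranged according to $\mathbf{y}$ and $\mathbf{x}^{T}$, so its rows follow the numerator. In denominator layout, it is arranged according to $\mathbf{y}^{T}$ and $\mathbf{x}$, so its rows follow the denominator. For example, $\partial \mathbf{y} / \partial x$ is an $m$-dimensional column vector in numerator layout and an $m$-dimensional row vector in denominator layout. The numerator-layout and denominator-layout results therefore differ by a transpose.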

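With the notion of layout, we can choose one layout for each of the five derivative types summarized in the table below. For any single derivative type, however, numerator layout and denominator layout must not be mixed.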

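| Independent \ Dependent | Scalar $y$ | Column vector $\mathbf{y}$ ($m$) | Matrix $\mathbf{Y}$ ($p \times q$) |
|---|---|---|---|
| Scalar $x$ | / | $\partial \mathbf{y} / \partial x$. Numerator layout: $m$-dimensional column vector (default). Denominator layout: $m$-dimensional row vector. | $\partial \mathbf{Y} / \partial x$. Numerator layout: $p \times q$ matrix (default). Denominator layout: $q \times p$ matrix. |
| Column vector $\mathbf{x}$ ($n$) | $\partial y / \partial \mathbf{x}$. Numerator layout: $n$-dimensional row vector. Denominator layout: $n$-dimensional column vector (default). | $\partial \mathbf{y} / \partial \mathbf{x}$. Numerator layout: $m \times n$ Jacobian matrix (default). Denominator layout: $n \times m$ gradient matrix. | / |
| Matrix $\mathbf{X}$ ($m \times n$) | $\partial y / \partial \mathbf{X}$. Numerator layout: $n \times m$ matrix. Denominator layout: $m \times n$ matrix (default). | / | / |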
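A small numerical sketch, assuming central finite differences as a stand-in for exact derivatives, illustrates the $\partial \mathbf{y} / \partial \mathbf{x}$ cell of the table. The numerator layout is the $m \times n$ Jacobian, and the denominator layout is its $n \times m$ transpose. The function names `jacobian_numerator` and `jacobian_denominator` and the sample $\mathbf{y}(\mathbf{x})$ are hypothetical.

```python
def jacobian_numerator(f, x, h=1e-6):
    """Numerator layout: an m x n matrix with entry [i][j] = dy_i/dx_j."""
    n = len(x)
    m = len(f(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_plus[j] += h
        x_minus = list(x)
        x_minus[j] -= h
        y_plus, y_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][j] = (y_plus[i] - y_minus[i]) / (2 * h)
    return J


def jacobian_denominator(f, x, h=1e-6):
    """Denominator layout: the n x m gradient matrix, i.e. the transpose."""
    return [list(col) for col in zip(*jacobian_numerator(f, x, h))]


def f(x):
    # Hypothetical y(x) with m = 3 outputs and n = 2 inputs.
    return [x[0] * x[1], x[0] + x[1], x[1] ** 2]


x0 = [2.0, 3.0]
J_num = jacobian_numerator(f, x0)
J_den = jacobian_denominator(f, x0)
print(len(J_num), len(J_num[0]))  # → 3 2
print(len(J_den), len(J_den[0]))  # → 2 3
```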

Summary of Matrix and Vector Derivatives

Methods for each combination of independent variable and dependent variable:

Independent variable: scalar $x$
  • Scalar $y$, $\frac{\partial y}{\partial x}$: ordinary university calculus.
  • Vector $\mathbf{y}$, $\frac{\partial \mathbf{y}}{\partial x}$: differentiate by definition.
  • Matrix $\mathbf{Y}$, $\frac{\partial \mathbf{Y}}{\partial x}$: differentiate by definition.

Independent variable: vector $\mathbf{x}$
  • Scalar $y$, $\frac{\partial y}{\partial \mathbf{x}}$:
      1. Differentiate by definition.
      2. Basic rules: linearity, product rule, quotient rule.
      3. Matrix differential: $df = \mathrm{tr}\left(\left(\frac{\partial f}{\partial \mathbf{x}}\right)^{T} d\mathbf{x}\right)$.
      4. Chain rule, for a scalar $z$ that depends on $\mathbf{x}$ through $\mathbf{y}$: $\frac{\partial z}{\partial \mathbf{x}} = \left(\frac{\partial \mathbf{y}}{\partial \mathbf{x}}\right)^{T} \frac{\partial z}{\partial \mathbf{y}}$.
  • Vector $\mathbf{y}$, $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$:
      1. Differentiate by definition.
      2. Chain rule, for a vector $\mathbf{z}$ that depends on $\mathbf{x}$ through $\mathbf{y}$: $\frac{\partial \mathbf{z}}{\partial \mathbf{x}} = \frac{\partial \mathbf{z}}{\partial \mathbf{y}} \frac{\partial \mathbf{y}}{\partial \mathbf{x}}$.
  • Matrix $\mathbf{Y}$: not covered.

Independent variable: matrix $\mathbf{X}$
  • Scalar $y$, $\frac{\partial y}{\partial \mathbf{X}}$:
      1. Differentiate by definition.
      2. Matrix differential: $df = \mathrm{tr}\left(\left(\frac{\partial f}{\partial \mathbf{X}}\right)^{T} d\mathbf{X}\right)$.
      3. Properties of the matrix differential.
      4. The trace trick.
      5. Chain rule, for a scalar $z$ that depends on $\mathbf{X}$ through $\mathbf{Y}$: $\frac{\partial z}{\partial x_{ij}} = \sum_{k,l} \frac{\partial z}{\partial y_{kl}} \frac{\partial y_{kl}}{\partial x_{ij}} = \mathrm{tr}\left(\left(\frac{\partial z}{\partial \mathbf{Y}}\right)^{T} \frac{\partial \mathbf{Y}}{\partial x_{ij}}\right)$.
  • Vector $\mathbf{y}$: not covered.
  • Matrix $\mathbf{Y}$, $\frac{\partial \mathbf{Y}}{\partial \mathbf{X}}$:
      1. Definition: $\frac{\partial \mathbf{Y}}{\partial \mathbf{X}} = \frac{\partial \operatorname{vec}(\mathbf{Y})}{\partial \operatorname{vec}(\mathbf{X})}$.
      2. Differential method: $\operatorname{vec}(d\mathbf{Y}) = \left(\frac{\partial \mathbf{Y}}{\partial \mathbf{X}}\right)^{T} \operatorname{vec}(d\mathbf{X})$.
      3. Operation rules.
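The scalar-by-vector chain rule $\frac{\partial z}{\partial \mathbf{x}} = \left(\frac{\partial \mathbf{y}}{\partial \mathbf{x}}\right)^{T} \frac{\partial z}{\partial \mathbf{y}}$ can be checked numerically. The sketch below is written under stated assumptions. It uses central finite differences, treats $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ as the numerator-layout Jacobian, and treats gradients as column vectors (denominator layout). The helpers `grad` and `jacobian` and the functions `f` and `g` are hypothetical examples. For them, $z = (x_1 x_2)^2 + 3(x_1 + x_2) + x_2^2$, so the exact gradient at $(2, 3)$ is $(39, 33)$.

```python
def grad(g, v, h=1e-6):
    """Gradient of a scalar function g as a list (a column vector, denominator layout)."""
    out = []
    for k in range(len(v)):
        v_plus = list(v)
        v_plus[k] += h
        v_minus = list(v)
        v_minus[k] -= h
        out.append((g(v_plus) - g(v_minus)) / (2 * h))
    return out


def jacobian(f, v, h=1e-6):
    """Numerator-layout Jacobian: m x n, entry [i][j] = df_i/dv_j."""
    cols = []
    for j in range(len(v)):
        v_plus = list(v)
        v_plus[j] += h
        v_minus = list(v)
        v_minus[j] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(f(v_plus), f(v_minus))])
    return [list(row) for row in zip(*cols)]


def f(x):
    # Hypothetical inner map y = f(x) from R^2 to R^3.
    return [x[0] * x[1], x[0] + x[1], x[1] ** 2]


def g(y):
    # Hypothetical outer scalar function z = g(y).
    return y[0] ** 2 + 3 * y[1] + y[2]


x0 = [2.0, 3.0]
J = jacobian(f, x0)       # dy/dx, 3 x 2
dz_dy = grad(g, f(x0))    # dz/dy, length 3
# Chain rule: dz/dx = (dy/dx)^T dz/dy
chain = [sum(J[i][j] * dz_dy[i] for i in range(len(J))) for j in range(len(x0))]
# Direct numerical gradient of the composition, for comparison.
direct = grad(lambda x: g(f(x)), x0)
print(chain, direct)  # both close to [39.0, 33.0]
```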

Rules for Matrix and Vector Derivatives

Derivative of a row vector with respect to a scalar

Let $\mathbf{y}^{T}=\left[\begin{array}{lll}y_{1} & \cdots & y_{n}\end{array}\right]$ be an $n$-dimensional row vector and $x$ a scalar. Then
$$\frac{\partial \mathbf{y}^{T}}{\partial x}=\left[\begin{array}{lll}\frac{\partial y_{1}}{\partial x} & \cdots & \frac{\partial y_{n}}{\partial x}\end{array}\right]$$

Derivative of a column vector with respect to a scalar

Let $\mathbf{y}=\left[\begin{array}{c}y_{1} \\ \vdots \\ y_{m}\end{array}\right]$ be an $m$-dimensional column vector and $x$ a scalar. Then
$$\frac{\partial \mathbf{y}}{\partial x}=\left[\begin{array}{c}\frac{\partial y_{1}}{\partial x} \\ \vdots \\ \frac{\partial y_{m}}{\partial x}\end{array}\right]$$

Derivative of a matrix with respect to a scalar

Let $\mathbf{Y}=\left[\begin{array}{ccc}y_{11} & \cdots & y_{1 n} \\ \vdots & & \vdots \\ y_{m 1} & \cdots & y_{m n}\end{array}\right]$ be an $m \times n$ matrix and $x$ a scalar. Then
$$\frac{\partial \mathbf{Y}}{\partial x}=\left[\begin{array}{ccc}\frac{\partial y_{11}}{\partial x} & \cdots & \frac{\partial y_{1 n}}{\partial x} \\ \vdots & & \vdots \\ \frac{\partial y_{m 1}}{\partial x} & \cdots & \frac{\partial y_{m n}}{\partial x}\end{array}\right]$$
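All three rules above say the same thing: differentiating with respect to a scalar acts entry by entry and keeps the shape of the numerator. A row or column vector is just a matrix with one row or one column. The following is a minimal sketch, assuming central finite differences in place of exact derivatives. The helper `matrix_by_scalar` and the sample $\mathbf{Y}(x)$ are hypothetical.

```python
import math


def matrix_by_scalar(F, x, h=1e-6):
    """Differentiate every entry of the m x n matrix Y = F(x) w.r.t. the scalar x.

    The result has the same m x n shape as Y.
    """
    Y_plus, Y_minus = F(x + h), F(x - h)
    return [[(a - b) / (2 * h) for a, b in zip(row_p, row_m)]
            for row_p, row_m in zip(Y_plus, Y_minus)]


def F(x):
    # Hypothetical 2 x 3 matrix Y(x) = [[x, x^2, 1], [sin x, 2x, x^3]].
    return [[x, x ** 2, 1.0], [math.sin(x), 2 * x, x ** 3]]


dY = matrix_by_scalar(F, 0.5)
# Exact derivative at x = 0.5: [[1, 1, 0], [cos 0.5, 2, 0.75]]
print(dY)
```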

Derivative of a scalar with respect to a row vector

Let $y$ be a scalar and $\mathbf{x}^{T}=\left[\begin{array}{lll}x_{1} & \cdots & x_{q}\end{array}\right]$ a $q$-dimensional row vector. Then
$$\frac{\partial y}{\partial \mathbf{x}^{T}}=\left[\begin{array}{lll}\frac{\partial y}{\partial x_{1}} & \cdots & \frac{\partial y}{\partial x_{q}}\end{array}\right]$$

Derivative of a scalar with respect to a column vector

Let $y$ be a scalar and $\mathbf{x}=\left[\begin{array}{c}x_{1} \\ \vdots \\ x_{p}\end{array}\right]$ a $p$-dimensional column vector. Then
$$\frac{\partial y}{\partial \mathbf{x}}=\left[\begin{array}{c}\frac{\partial y}{\partial x_{1}} \\ \vdots \\ \frac{\partial y}{\partial x_{p}}\end{array}\right]$$

Derivative of a scalar with respect to a matrix

Let $y$ be a scalar and $\mathbf{X}=\left[\begin{array}{ccc}x_{11} & \cdots & x_{1 q} \\ \vdots & & \vdots \\ x_{p 1} & \cdots & x_{p q}\end{array}\right]$ a $p \times q$ matrix. Then
$$\frac{\partial y}{\partial \mathbf{X}}=\left[\begin{array}{ccc}\frac{\partial y}{\partial x_{11}} & \cdots & \frac{\partial y}{\partial x_{1 q}} \\ \vdots & & \vdots \\ \frac{\partial y}{\partial x_{p 1}} & \cdots & \frac{\partial y}{\partial x_{p q}}\end{array}\right]$$
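Differentiating a scalar with respect to a matrix gives a result with the same $p \times q$ shape as $\mathbf{X}$. The column-vector case is $q = 1$, and the row-vector case is $p = 1$. A minimal sketch, assuming central finite differences: the helper `scalar_by_matrix` is hypothetical. The test functions are the $2 \times 2$ determinant, whose exact gradient is $\left[\begin{array}{cc}x_{22} & -x_{21} \\ -x_{12} & x_{11}\end{array}\right]$, and $y = x_1^2 + x_2^2$ for a column vector.

```python
def scalar_by_matrix(f, X, h=1e-6):
    """Entry [i][j] of the result is dy/dx_ij, so the result has X's p x q shape."""
    p, q = len(X), len(X[0])
    G = [[0.0] * q for _ in range(p)]
    for i in range(p):
        for j in range(q):
            X_plus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus = [row[:] for row in X]
            X_minus[i][j] -= h
            G[i][j] = (f(X_plus) - f(X_minus)) / (2 * h)
    return G


def det2(X):
    # Hypothetical scalar function: determinant of a 2 x 2 matrix.
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]


def sq_norm(X):
    # Hypothetical scalar function of a p x 1 column vector: sum of squares.
    return sum(row[0] ** 2 for row in X)


G = scalar_by_matrix(det2, [[1.0, 2.0], [3.0, 4.0]])  # close to [[4, -3], [-2, 1]]
g_col = scalar_by_matrix(sq_norm, [[1.0], [2.0]])     # close to [[2], [4]], a column
print(G, g_col)
```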

Derivative of a row vector with respect to a column vector

Let $\mathbf{y}^{T}=\left[\begin{array}{lll}y_{1} & \cdots & y_{n}\end{array}\right]$ be an $n$-dimensional row vector and $\mathbf{x}=\left[\begin{array}{c}x_{1} \\ \vdots \\ x_{p}\end{array}\right]$ a $p$-dimensional column vector. Then the result is a $p \times n$ matrix whose $(i, j)$ entry is $\partial y_{j} / \partial x_{i}$:
$$\frac{\partial \mathbf{y}^{T}}{\partial \mathbf{x}}=\left[\begin{array}{ccc}\frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{n}}{\partial x_{1}} \\ \vdots & & \vdots \\ \frac{\partial y_{1}}{\partial x_{p}} & \cdots & \frac{\partial y_{n}}{\partial x_{p}}\end{array}\right]$$

Derivative of a column vector with respect to a row vector

Let $\mathbf{y}=\left[\begin{array}{c}y_{1} \\ \vdots \\ y_{m}\end{array}\right]$ be an $m$-dimensional column vector and $\mathbf{x}^{T}=\left[\begin{array}{lll}x_{1} & \cdots & x_{q}\end{array}\right]$ a $q$-dimensional row vector. Then the result is an $m \times q$ matrix whose $(i, j)$ entry is $\partial y_{i} / \partial x_{j}$:
$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}^{T}}=\left[\begin{array}{ccc}\frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{q}} \\ \vdots & & \vdots \\ \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{q}}\end{array}\right]$$

Derivative of a row vector with respect to a row vector

Let $\mathbf{y}^{T}=\left[\begin{array}{lll}y_{1} & \cdots & y_{n}\end{array}\right]$ be an $n$-dimensional row vector and $\mathbf{x}^{T}=\left[\begin{array}{lll}x_{1} & \cdots & x_{q}\end{array}\right]$ a $q$-dimensional row vector. Each block $\partial \mathbf{y}^{T} / \partial x_{j}$ is a $1 \times n$ row vector, so the result is a $1 \times nq$ row vector:
$$\frac{\partial \mathbf{y}^{T}}{\partial \mathbf{x}^{T}}=\left[\begin{array}{lll}\frac{\partial \mathbf{y}^{T}}{\partial x_{1}} & \cdots & \frac{\partial \mathbf{y}^{T}}{\partial x_{q}}\end{array}\right]$$

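Derivative of a column vector with respect to a column vector

Let $\mathbf{y}=\left[\begin{array}{c}y_{1} \\ \vdots \\ y_{m}\end{array}\right]$ be an $m$-dimensional column vector and $\mathbf{x}=\left[\begin{array}{c}x_{1} \\ \vdots \\ x_{p}\end{array}\right]$ a $p$-dimensional column vector. Each block $\partial y_{i} / \partial \mathbf{x}$ is a $p \times 1$ column vector, so the result is an $mp \times 1$ column vector:
$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\left[\begin{array}{c}\frac{\partial y_{1}}{\partial \mathbf{x}} \\ \vdots \\ \frac{\partial y_{m}}{\partial \mathbf{x}}\end{array}\right]$$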
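All four vector-by-vector rules above rearrange the same table of partial derivatives $\partial y_i / \partial x_j$. This sketch computes that table once and builds the four layouts from it, following the definitions in this section. It assumes central finite differences. The helper `partials` and the sample three-output, two-input function are hypothetical. In the code, `m` is the number of outputs (the $n$ or $m$ of the rules) and `k` is the number of inputs (the $p$ or $q$ of the rules).

```python
def partials(f, x, h=1e-6):
    """D[i][j] = dy_i/dx_j, estimated by central differences."""
    n_in = len(x)
    D = [[0.0] * n_in for _ in range(len(f(x)))]
    for j in range(n_in):
        x_plus = list(x)
        x_plus[j] += h
        x_minus = list(x)
        x_minus[j] -= h
        for i, (a, b) in enumerate(zip(f(x_plus), f(x_minus))):
            D[i][j] = (a - b) / (2 * h)
    return D


def f(x):
    # Hypothetical y(x): 3 outputs, 2 inputs.
    return [x[0] * x[1], x[0] + x[1], x[1] ** 2]


D = partials(f, [2.0, 3.0])  # close to [[3, 2], [1, 1], [0, 6]]
m, k = len(D), len(D[0])

# Column vector by row vector: m x k, entry [i][j] = dy_i/dx_j.
col_by_row = D
# Row vector by column vector: k x m, entry [i][j] = dy_j/dx_i.
row_by_col = [[D[j][i] for j in range(m)] for i in range(k)]
# Row vector by row vector: [dy^T/dx_1 ... dy^T/dx_k], one row of length m*k.
row_by_row = [D[i][j] for j in range(k) for i in range(m)]
# Column vector by column vector: stacked [dy_1/dx; ...; dy_m/dx], length m*k.
col_by_col = [D[i][j] for i in range(m) for j in range(k)]
print(row_by_row, col_by_col)
```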

Derivative of a matrix with respect to a row vector

Let $\mathbf{Y}=\left[\begin{array}{ccc}y_{11} & \cdots & y_{1 n} \\ \vdots & & \vdots \\ y_{m 1} & \cdots & y_{m n}\end{array}\right]$ be an $m \times n$ matrix and $\mathbf{x}^{T}=\left[\begin{array}{lll}x_{1} & \cdots & x_{q}\end{array}\right]$ a $q$-dimensional row vector. Each block $\partial \mathbf{Y} / \partial x_{j}$ is $m \times n$, so the result is $m \times nq$:
$$\frac{\partial \mathbf{Y}}{\partial \mathbf{x}^{T}}=\left[\begin{array}{lll}\frac{\partial \mathbf{Y}}{\partial x_{1}} & \cdots & \frac{\partial \mathbf{Y}}{\partial x_{q}}\end{array}\right]$$

Derivative of a matrix with respect to a column vector

Let $\mathbf{Y}=\left[\begin{array}{ccc}y_{11} & \cdots & y_{1 n} \\ \vdots & & \vdots \\ y_{m 1} & \cdots & y_{m n}\end{array}\right]$ be an $m \times n$ matrix and $\mathbf{x}=\left[\begin{array}{c}x_{1} \\ \vdots \\ x_{p}\end{array}\right]$ a $p$-dimensional column vector. Each block $\partial y_{ij} / \partial \mathbf{x}$ is a $p \times 1$ column vector, so the result is $mp \times n$:
$$\frac{\partial \mathbf{Y}}{\partial \mathbf{x}}=\left[\begin{array}{ccc}\frac{\partial y_{11}}{\partial \mathbf{x}} & \cdots & \frac{\partial y_{1 n}}{\partial \mathbf{x}} \\ \vdots & & \vdots \\ \frac{\partial y_{m 1}}{\partial \mathbf{x}} & \cdots & \frac{\partial y_{m n}}{\partial \mathbf{x}}\end{array}\right]$$
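The two matrix-by-vector rules above differ only in where each $\partial \mathbf{Y} / \partial x_k$ slice is placed. With a row vector, the slices are placed side by side, giving an $m \times nq$ result. With a column vector, each entry $y_{ij}$ is replaced by its $p \times 1$ gradient column, giving an $mp \times n$ result. A minimal sketch, assuming central finite differences. The helper names and the sample $\mathbf{Y}(a, b) = \left[\begin{array}{cc}ab & a \\ b^2 & a + b\end{array}\right]$ are hypothetical.

```python
def entry_partials(F, x, h=1e-6):
    """P[k][i][j] = dy_ij/dx_k for the m x n matrix Y = F(x)."""
    P = []
    for k in range(len(x)):
        x_plus = list(x)
        x_plus[k] += h
        x_minus = list(x)
        x_minus[k] -= h
        Y_plus, Y_minus = F(x_plus), F(x_minus)
        P.append([[(a - b) / (2 * h) for a, b in zip(rp, rm)]
                  for rp, rm in zip(Y_plus, Y_minus)])
    return P


def matrix_by_row_vector(F, x):
    """[dY/dx_1 ... dY/dx_q]: an m x (n*q) matrix."""
    P = entry_partials(F, x)
    m, n = len(P[0]), len(P[0][0])
    return [[P[k][i][j] for k in range(len(x)) for j in range(n)] for i in range(m)]


def matrix_by_column_vector(F, x):
    """Block (i, j) is the column dy_ij/dx of length p: an (m*p) x n matrix."""
    P = entry_partials(F, x)
    m, n = len(P[0]), len(P[0][0])
    return [[P[k][i][j] for j in range(n)] for i in range(m) for k in range(len(x))]


def F(x):
    # Hypothetical 2 x 2 matrix function of x = [a, b].
    a, b = x
    return [[a * b, a], [b ** 2, a + b]]


R_row = matrix_by_row_vector(F, [1.0, 2.0])     # close to [[2, 1, 1, 0], [0, 1, 4, 1]]
R_col = matrix_by_column_vector(F, [1.0, 2.0])  # close to [[2, 1], [1, 0], [0, 1], [4, 1]]
print(R_row, R_col)
```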

Derivative of a row vector with respect to a matrix

Let $\mathbf{y}^{T}=\left[\begin{array}{lll}y_{1} & \cdots & y_{n}\end{array}\right]$ be an $n$-dimensional row vector and $\mathbf{X}=\left[\begin{array}{ccc}x_{11} & \cdots & x_{1 q} \\ \vdots & & \vdots \\ x_{p 1} & \cdots & x_{p q}\end{array}\right]$ a $p \times q$ matrix. Each block $\partial \mathbf{y}^{T} / \partial x_{ij}$ is a $1 \times n$ row vector, so the result is $p \times qn$:

$$\frac{\partial \mathbf{y}^{T}}{\partial \mathbf{X}}=\left[\begin{array}{ccc}\frac{\partial \mathbf{y}^{T}}{\partial x_{11}} & \cdots & \frac{\partial \mathbf{y}^{T}}{\partial x_{1 q}} \\ \vdots & & \vdots \\ \frac{\partial \mathbf{y}^{T}}{\partial x_{p 1}} & \cdots & \frac{\partial \mathbf{y}^{T}}{\partial x_{p q}}\end{array}\right]$$

Derivative of a column vector with respect to a matrix

Let $\mathbf{y}=\left[\begin{array}{c}y_{1} \\ \vdots \\ y_{m}\end{array}\right]$ be an $m$-dimensional column vector and $\mathbf{X}=\left[\begin{array}{ccc}x_{11} & \cdots & x_{1 q} \\ \vdots & & \vdots \\ x_{p 1} & \cdots & x_{p q}\end{array}\right]$ a $p \times q$ matrix. Each block $\partial y_{i} / \partial \mathbf{X}$ is $p \times q$, so the result is $mp \times q$:
$$\frac{\partial \mathbf{y}}{\partial \mathbf{X}}=\left[\begin{array}{c}\frac{\partial y_{1}}{\partial \mathbf{X}} \\ \vdots \\ \frac{\partial y_{m}}{\partial \mathbf{X}}\end{array}\right]$$

Derivative of a matrix with respect to a matrix

Let $\mathbf{Y}=\left[\begin{array}{ccc}y_{11} & \cdots & y_{1 n} \\ \vdots & & \vdots \\ y_{m 1} & \cdots & y_{m n}\end{array}\right]=\left[\begin{array}{c}\mathbf{y}_{1}^{T} \\ \vdots \\ \mathbf{y}_{m}^{T}\end{array}\right]$ be an $m \times n$ matrix written by rows, and $\mathbf{X}=\left[\begin{array}{ccc}x_{11} & \cdots & x_{1 q} \\ \vdots & & \vdots \\ x_{p 1} & \cdots & x_{p q}\end{array}\right]=\left[\begin{array}{lll}\mathbf{x}_{1} & \cdots & \mathbf{x}_{q}\end{array}\right]$ a $p \times q$ matrix written by columns. Each block $\partial \mathbf{y}_{i}^{T} / \partial \mathbf{x}_{j}$ is $p \times n$, so the result is $mp \times qn$:
$$\frac{\partial \mathbf{Y}}{\partial \mathbf{X}}=\left[\begin{array}{lll}\frac{\partial \mathbf{Y}}{\partial \mathbf{x}_{1}} & \cdots & \frac{\partial \mathbf{Y}}{\partial \mathbf{x}_{q}}\end{array}\right]=\left[\begin{array}{c}\frac{\partial \mathbf{y}_{1}^{T}}{\partial \mathbf{X}} \\ \vdots \\ \frac{\partial \mathbf{y}_{m}^{T}}{\partial \mathbf{X}}\end{array}\right]=\left[\begin{array}{ccc}\frac{\partial \mathbf{y}_{1}^{T}}{\partial \mathbf{x}_{1}} & \cdots & \frac{\partial \mathbf{y}_{1}^{T}}{\partial \mathbf{x}_{q}} \\ \vdots & & \vdots \\ \frac{\partial \mathbf{y}_{m}^{T}}{\partial \mathbf{x}_{1}} & \cdots & \frac{\partial \mathbf{y}_{m}^{T}}{\partial \mathbf{x}_{q}}\end{array}\right]$$
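Under the block layout above, the entry in row $ip + r$ and column $jn + c$ (0-based) of $\partial \mathbf{Y} / \partial \mathbf{X}$ is $\partial y_{ic} / \partial x_{rj}$. This index formula is derived here from the rule and is not stated in the original. The sketch below builds that matrix with central finite differences. As a hypothetical test case, it uses $\mathbf{Y} = \mathbf{X}^{T}$ for a $2 \times 3$ matrix $\mathbf{X}$, where $\partial y_{ic} / \partial x_{rj}$ equals 1 exactly when $i = j$ and $c = r$. The helper `matrix_by_matrix` is hypothetical.

```python
def matrix_by_matrix(F, X, h=1e-6):
    """(m*p) x (q*n) matrix; entry [i*p + r][j*n + c] is dy_ic/dx_rj for Y = F(X)."""
    p, q = len(X), len(X[0])
    Y = F(X)
    m, n = len(Y), len(Y[0])
    R = [[0.0] * (q * n) for _ in range(m * p)]
    for r in range(p):
        for j in range(q):
            X_plus = [row[:] for row in X]
            X_plus[r][j] += h
            X_minus = [row[:] for row in X]
            X_minus[r][j] -= h
            Y_plus, Y_minus = F(X_plus), F(X_minus)
            for i in range(m):
                for c in range(n):
                    R[i * p + r][j * n + c] = (Y_plus[i][c] - Y_minus[i][c]) / (2 * h)
    return R


def transpose(X):
    # Hypothetical Y = X^T, so Y is 3 x 2 when X is 2 x 3.
    return [list(col) for col in zip(*X)]


X0 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # p = 2, q = 3
R = matrix_by_matrix(transpose, X0)      # (3*2) x (3*2) = 6 x 6
for row in R:
    print([round(v, 6) for v in row])
```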

Example

Suppose $\frac{\partial A}{\partial X}=\left[\begin{array}{ccc}2 x y & y^{2} & y \\ x^{2} & 2 x y & x\end{array}\right]$ with $X=\left[\begin{array}{l}x \\ y\end{array}\right]$. By the rule for differentiating a matrix with respect to a column vector (the twelfth rule above), we have

$$\frac{\partial^{2} A}{\partial X^{2}}=\left[\begin{array}{lll}\frac{\partial(2 x y)}{\partial X} & \frac{\partial\left(y^{2}\right)}{\partial X} & \frac{\partial y}{\partial X} \\ \frac{\partial\left(x^{2}\right)}{\partial X} & \frac{\partial(2 x y)}{\partial X} & \frac{\partial x}{\partial X}\end{array}\right]=\left[\begin{array}{ccc}2 y & 0 & 0 \\ 2 x & 2 y & 1 \\ 2 x & 2 y & 1 \\ 0 & 2 x & 0\end{array}\right]$$

Each entry of the $2 \times 3$ matrix $\partial A / \partial X$ becomes a $2 \times 1$ gradient column with respect to $X$, so the result is $4 \times 3$.
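The worked example can be cross-checked numerically. This is a minimal sketch, assuming central finite differences. The helper `matrix_by_column_vector` and the test point $(x, y) = (1.5, -2)$ are hypothetical choices. It replaces each entry of $\partial A / \partial X$ with its gradient column and compares the result against the $4 \times 3$ matrix above.

```python
def matrix_by_column_vector(F, v, h=1e-6):
    """(m*p) x n matrix; block (i, j) is the column dy_ij/dv of length p."""
    p = len(v)
    Y = F(v)
    m, n = len(Y), len(Y[0])
    R = [[0.0] * n for _ in range(m * p)]
    for k in range(p):
        v_plus = list(v)
        v_plus[k] += h
        v_minus = list(v)
        v_minus[k] -= h
        Y_plus, Y_minus = F(v_plus), F(v_minus)
        for i in range(m):
            for j in range(n):
                R[i * p + k][j] = (Y_plus[i][j] - Y_minus[i][j]) / (2 * h)
    return R


def dA_dX(v):
    # The 2 x 3 matrix dA/dX from the example, as a function of X = [x, y].
    x, y = v
    return [[2 * x * y, y ** 2, y], [x ** 2, 2 * x * y, x]]


x, y = 1.5, -2.0  # arbitrary test point
numeric = matrix_by_column_vector(dA_dX, [x, y])
analytic = [[2 * y, 0, 0], [2 * x, 2 * y, 1], [2 * x, 2 * y, 1], [0, 2 * x, 0]]
print(numeric)
print(analytic)
```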