Numerical Linear Algebra: Krylov Subspace Methods

This article summarizes algorithms for solving linear systems of equations, focusing on the principles and workflow of Krylov subspace methods.

Note 1: Given the limits of my expertise, parts of the analysis may be imperfect; criticism and corrections are welcome.

Note 2: The article will be updated from time to time.

0. Preliminaries

0.1 Conjugate Transpose

For $\boldsymbol{A}\in \mathbb{C}^{n\times n}$ and $\boldsymbol{B}\in \mathbb{C}^{n\times n}$, if the conjugate of the entry $a_{i,j}$ in row $i$, column $j$ of $\boldsymbol{A}$ equals the entry $b_{j,i}$ in row $j$, column $i$ of $\boldsymbol{B}$, i.e. $b_{j,i}=\bar{a}_{i,j}$, then $\boldsymbol{B}$ is called the conjugate transpose of $\boldsymbol{A}$, written $\boldsymbol{B}=\boldsymbol{A}^{H}$.

Clearly, if $\boldsymbol{A}\in \mathbb{R}^{n\times n}$, then $\boldsymbol{A}^{H}=\boldsymbol{A}^{T}$.
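As a quick check of the definition (a minimal NumPy sketch; the matrices below are arbitrary examples), the conjugate transpose is `A.conj().T`, and for a real matrix it coincides with the plain transpose:

```python
import numpy as np

# Arbitrary complex example matrix (illustration only).
A = np.array([[1 + 2j, 3 - 1j],
              [0 + 1j, 2 + 0j]])
A_H = A.conj().T                        # conjugate transpose A^H
print(A_H[0, 1] == np.conj(A[1, 0]))    # b_{j,i} = conj(a_{i,j})

# For a real matrix, A^H = A^T.
B = np.array([[1.0, 2.0], [3.0, 4.0]])
print(np.array_equal(B.conj().T, B.T))  # True
```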

0.2 Hermitian Matrices

For $\boldsymbol{A}\in \mathbb{C}^{n\times n}$, if $\boldsymbol{A}^{H}=\boldsymbol{A}$, then $\boldsymbol{A}$ is called a Hermitian matrix.

Clearly, if $\boldsymbol{A}\in \mathbb{R}^{n\times n}$, then $\boldsymbol{A}^{H}=\boldsymbol{A}^{T}=\boldsymbol{A}$; that is, a real Hermitian matrix is simply a symmetric matrix.

0.3 Hessenberg Matrices

A matrix $\boldsymbol{H}\in \mathbb{C}^{k\times k}$ is (upper) Hessenberg if all entries below the first subdiagonal vanish, i.e. $h_{i,j}=0$ for $i>j+1$:

$$\boldsymbol{H}=\begin{pmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,k}\\ h_{2,1} & h_{2,2} & \cdots & h_{2,k} \\ 0 & \ddots & \ddots & \vdots \\ 0 & 0 & h_{k,k-1} & h_{k,k} \end{pmatrix}$$

0.4 Cholesky Factorization

For a Hermitian positive definite matrix $\boldsymbol{A}\in \mathbb{C}^{n\times n}$, there exists a factorization $\boldsymbol{A}=\boldsymbol{L}\boldsymbol{L}^{H}$, where $\boldsymbol{L}\in \mathbb{C}^{n\times n}$ is lower triangular and $\boldsymbol{L}^{H}$ is its conjugate transpose.

If $\boldsymbol{A}\in \mathbb{R}^{n\times n}$ is symmetric positive definite, then $\boldsymbol{A}=\boldsymbol{L}\boldsymbol{L}^{T}$, where $\boldsymbol{L}\in \mathbb{R}^{n\times n}$ is lower triangular and $\boldsymbol{L}^{T}$ is its transpose.
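A minimal sketch with NumPy (the symmetric positive definite matrix below is generated randomly, purely for illustration): `numpy.linalg.cholesky` returns the lower-triangular factor $\boldsymbol{L}$ with $\boldsymbol{A}=\boldsymbol{L}\boldsymbol{L}^{T}$.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)         # a symmetric positive definite test matrix

L = np.linalg.cholesky(A)           # lower-triangular Cholesky factor
print(np.allclose(L @ L.T, A))      # True: A = L L^T
print(np.allclose(L, np.tril(L)))   # True: L is lower triangular
```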

0.5 Arnoldi Decomposition

\boldsymbol{A}\in \mathbb{C}^{n\times n},则有\boldsymbol{A}\boldsymbol{V}=\boldsymbol{V}\boldsymbol{H}+\boldsymbol{f}\boldsymbol{e}_{m}^{T},其中\boldsymbol{V}=\left ( \boldsymbol{v}_{1}, \boldsymbol{v}_{2},\cdots ,\boldsymbol{v}_{m} \right )\in \mathbb{C}^{n\times m }\boldsymbol{V}^{H}\boldsymbol{V}=\mathbf{I}\boldsymbol{H}\in \mathbb{C}^{m\times m }\boldsymbol{f}\in \mathbb{C}^{n }\boldsymbol{V}^{H} \boldsymbol{f}=\boldsymbol{0}\boldsymbol{e}_{m}^{T}=\left ( 0,0,\cdots,0,1 \right )\in \mathbb{C}^{m}


Unless stated otherwise, the remainder of this article considers only linear systems over the real field.

1. Direct Methods for Linear Systems

1.1 LU Factorization

\boldsymbol{A}\in \mathbb{R}^{n\times n},若对于k\in \left [ 1,n \right ],均有\left | \boldsymbol{A}\left ( 1:k,1:k \right ) \right |\neq 0,则存在唯一的单位下三角矩阵\boldsymbol{L} \in\mathbb{R}^{n\times n}和上三角矩阵\boldsymbol{U} \in\mathbb{R}^{n\times n},使得\boldsymbol{A}=\boldsymbol{L}\boldsymbol{U},并且\left |A \right |=U\left ( 1,1 \right )U\left ( 2,2 \right )\cdots U\left ( n,n \right )

1.2 Cholesky Factorization

\boldsymbol{A}\in \mathbb{R}^{n\times n}对称正定,则有\boldsymbol{A}=\boldsymbol{L}\boldsymbol{D}\boldsymbol{L}^{T},其中,\boldsymbol{L} \in\mathbb{R}^{n\times n}为单位下三角矩阵,\boldsymbol{D} \in\mathbb{R}^{n\times n}为对角元均为正数的对角矩阵。

2. Overview: The General Approach of Iterative Methods for Linear Systems

For a nonsingular matrix $\boldsymbol{A}\in \mathbb{R}^{n\times n}$ and $\boldsymbol{b}\in \mathbb{R}^{n}$, solving the linear system $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$ with an iterative method generally proceeds as follows:

  1. Choose an initial vector $\boldsymbol{x}_{0}$.
  2. Construct a recurrence $\boldsymbol{x}_{k+1}=\boldsymbol{f}(\boldsymbol{x}_{k},\boldsymbol{A},\boldsymbol{b})$.
  3. Iterate the recurrence until $\boldsymbol{x}_{k+1}$ converges (approximately) to the true solution $\boldsymbol{x}_{*}$.

The update formulas of several iterative algorithms are listed below (a minimal sketch of the Jacobi iteration follows the list).

In the splitting-based methods below, $\boldsymbol{A}=\boldsymbol{D}-\boldsymbol{L}-\boldsymbol{U}$, where $\boldsymbol{D}$ is the diagonal part of $\boldsymbol{A}$ and $\boldsymbol{L}$, $\boldsymbol{U}$ are the negatives of its strictly lower and strictly upper triangular parts.

Jacobi iteration ($\boldsymbol{A}$ nonsingular):

$$\boldsymbol{x}_{k}=\boldsymbol{D}^{-1}(\boldsymbol{L}+\boldsymbol{U})\boldsymbol{x}_{k-1}+\boldsymbol{D}^{-1}\boldsymbol{b}$$

Gauss-Seidel iteration ($\boldsymbol{A}$ nonsingular):

$$\boldsymbol{x}_{k}=(\boldsymbol{D}-\boldsymbol{L})^{-1}\boldsymbol{U}\boldsymbol{x}_{k-1}+(\boldsymbol{D}-\boldsymbol{L})^{-1}\boldsymbol{b}$$

SOR iteration ($\boldsymbol{A}$ nonsingular):

$$\boldsymbol{L}_{\omega}=(\boldsymbol{D}-\omega\boldsymbol{L})^{-1}\left((1-\omega)\boldsymbol{D}+\omega\boldsymbol{U}\right),\qquad \boldsymbol{x}_{k+1}=\boldsymbol{L}_{\omega}\boldsymbol{x}_{k}+\omega(\boldsymbol{D}-\omega\boldsymbol{L})^{-1}\boldsymbol{b}$$

Steepest Descent ($\boldsymbol{A}$ symmetric positive definite):

$$\boldsymbol{r}_{k}=\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_{k},\quad \boldsymbol{p}_{k}=\boldsymbol{r}_{k},\quad \alpha_{k}=\frac{\boldsymbol{r}_{k}^{T}\boldsymbol{p}_{k}}{\boldsymbol{p}_{k}^{T}\boldsymbol{A}\boldsymbol{p}_{k}},\quad \boldsymbol{x}_{k+1}=\boldsymbol{x}_{k}+\alpha_{k}\boldsymbol{p}_{k}$$

Conjugate Gradient ($\boldsymbol{A}$ symmetric positive definite):

For $k=1$:

$$\boldsymbol{r}_{k}=\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_{k},\quad \boldsymbol{p}_{k}=\boldsymbol{r}_{k},\quad \alpha_{k}=\frac{\boldsymbol{r}_{k}^{T}\boldsymbol{p}_{k}}{\boldsymbol{p}_{k}^{T}\boldsymbol{A}\boldsymbol{p}_{k}},\quad \boldsymbol{x}_{k+1}=\boldsymbol{x}_{k}+\alpha_{k}\boldsymbol{p}_{k}$$

For $k>1$:

$$\alpha_{k}=\frac{\boldsymbol{r}_{k}^{T}\boldsymbol{r}_{k}}{\boldsymbol{p}_{k}^{T}\boldsymbol{A}\boldsymbol{p}_{k}},\quad \boldsymbol{x}_{k+1}=\boldsymbol{x}_{k}+\alpha_{k}\boldsymbol{p}_{k},\quad \boldsymbol{r}_{k+1}=\boldsymbol{r}_{k}-\alpha_{k}\boldsymbol{A}\boldsymbol{p}_{k},\quad \beta_{k}=\frac{\boldsymbol{r}_{k+1}^{T}\boldsymbol{r}_{k+1}}{\boldsymbol{r}_{k}^{T}\boldsymbol{r}_{k}},\quad \boldsymbol{p}_{k+1}=\boldsymbol{r}_{k+1}+\beta_{k}\boldsymbol{p}_{k}$$
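As a minimal sketch of the splitting-based iterations above (assuming NumPy; the test matrix is made strictly diagonally dominant only so that the iteration converges), the Jacobi update $\boldsymbol{x}_{k}=\boldsymbol{D}^{-1}(\boldsymbol{L}+\boldsymbol{U})\boldsymbol{x}_{k-1}+\boldsymbol{D}^{-1}\boldsymbol{b}$ reads:

```python
import numpy as np

def jacobi(A, b, x0, tol=1e-10, max_iter=500):
    """Jacobi iteration for A x = b, based on the splitting A = D - L - U."""
    D_diag = np.diag(A)
    LU = np.diag(D_diag) - A              # L + U (the negated off-diagonal part of A)
    x = x0.copy()
    for _ in range(max_iter):
        x_new = (LU @ x + b) / D_diag     # D^{-1} ((L + U) x + b)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

rng = np.random.default_rng(4)
n = 6
A = rng.standard_normal((n, n))
A = A + np.diag(np.abs(A).sum(axis=1) + 1.0)   # force strict diagonal dominance (ensures convergence)
b = rng.standard_normal(n)

x = jacobi(A, b, np.zeros(n))
print(np.linalg.norm(A @ x - b))               # residual norm, close to zero
```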

3. Projection Methods

A projection method looks for an approximate solution of the required accuracy in a smaller linear space; in other words, it looks for the projection of the true solution onto that space. This is how projection methods get their name.

For a nonsingular matrix $\boldsymbol{A}\in\mathbb{R}^{n\times n}$, consider the linear system $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$, where $\boldsymbol{x}\in\mathbb{R}^{n}$ and $\boldsymbol{b}\in\mathbb{R}^{n}$.

Write $\boldsymbol{r}=\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}$. First, construct full column-rank matrices $\boldsymbol{\mathcal{K}}\in\mathbb{R}^{n\times m}$ and $\boldsymbol{\mathcal{L}}\in\mathbb{R}^{n\times m}$ with $m\leq n$, whose columns span the search space and the constraint space, respectively. Then seek the projection $\boldsymbol{\tilde{x}}$ of the true solution $\boldsymbol{x}$ onto the search space, i.e. a vector $\boldsymbol{\tilde{x}}$ in the column space of $\boldsymbol{\mathcal{K}}$, subject to the Petrov-Galerkin condition: the residual $\boldsymbol{b}-\boldsymbol{A}\boldsymbol{\tilde{x}}$ is orthogonal to every vector $\boldsymbol{y}$ in the constraint space, i.e. $\boldsymbol{\mathcal{L}}^{T}(\boldsymbol{b}-\boldsymbol{A}\boldsymbol{\tilde{x}})=\boldsymbol{0}$. Here $\boldsymbol{\mathcal{K}}$ is called the search space and $\boldsymbol{\mathcal{L}}$ the constraint space; when $\boldsymbol{\mathcal{L}}=\boldsymbol{\mathcal{K}}$ the method is an orthogonal projection method, and otherwise it is an oblique projection method.

Writing $\boldsymbol{\tilde{x}}=\boldsymbol{\mathcal{K}}\boldsymbol{y}$, the condition becomes $\boldsymbol{\mathcal{L}}^{T}\boldsymbol{A}\boldsymbol{\mathcal{K}}\boldsymbol{y}=\boldsymbol{\mathcal{L}}^{T}\boldsymbol{b}$, where $\boldsymbol{y}\in\mathbb{R}^{m}$, $\boldsymbol{\mathcal{L}}^{T}\boldsymbol{A}\boldsymbol{\mathcal{K}}\in\mathbb{R}^{m\times m}$ and $\boldsymbol{\mathcal{L}}^{T}\boldsymbol{b}\in\mathbb{R}^{m}$. In other words, a projection method turns an $n$-th order linear system into an $m$-th order one ($m\leq n$).

Discussion:

• If $m=n$: take the normal-equations approach as an example, i.e. $\boldsymbol{\mathcal{L}}=\boldsymbol{A}$ and $\boldsymbol{\mathcal{K}}=\boldsymbol{I}$. Since $(\boldsymbol{A}^{T}\boldsymbol{A})^{T}=\boldsymbol{A}^{T}\boldsymbol{A}$ and, for every nonzero $\boldsymbol{y}\in\mathbb{R}^{m}$, $\boldsymbol{y}^{T}\boldsymbol{A}^{T}\boldsymbol{A}\boldsymbol{y}=(\boldsymbol{A}\boldsymbol{y})^{T}(\boldsymbol{A}\boldsymbol{y})>0$, the matrix $\boldsymbol{A}^{T}\boldsymbol{A}$ is symmetric positive definite. This choice therefore turns the general system $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$ into the symmetric positive definite system $\boldsymbol{A}^{T}\boldsymbol{A}\boldsymbol{y}=\boldsymbol{A}^{T}\boldsymbol{b}$.

• If $m<n$: since $\boldsymbol{\mathcal{L}}^{T}\boldsymbol{A}\boldsymbol{\mathcal{K}}\in\mathbb{R}^{m\times m}$ and $\boldsymbol{\mathcal{L}}^{T}\boldsymbol{b}\in\mathbb{R}^{m}$, the system shrinks from order $n$ to order $m$. Projection can thus be used to reduce the problem to a simpler subproblem, which is then solved separately.

  Take the Arnoldi method for nonsymmetric linear systems as an example, i.e. $\boldsymbol{\mathcal{L}}=\boldsymbol{\mathcal{K}}=\boldsymbol{V}$, where $\boldsymbol{A}\boldsymbol{V}=\boldsymbol{V}\boldsymbol{H}+\boldsymbol{f}\boldsymbol{e}_{m}^{T}$, $\boldsymbol{V}^{T}\boldsymbol{V}=\boldsymbol{I}$ and $\boldsymbol{V}^{T}\boldsymbol{f}=\boldsymbol{0}$. Then $\boldsymbol{V}^{T}\boldsymbol{A}\boldsymbol{V}\boldsymbol{y}=\boldsymbol{V}^{T}(\boldsymbol{V}\boldsymbol{H}+\boldsymbol{f}\boldsymbol{e}_{m}^{T})\boldsymbol{y}=\boldsymbol{H}\boldsymbol{y}=\boldsymbol{V}^{T}\boldsymbol{b}$ (a minimal numerical sketch of this idea follows the list below).

Combining the discussion above, the workflow of a projection method can be summarized as:

• Use projection to reduce the problem to a subproblem that is easier to solve;
• Solve the subproblem with a suitable method.
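The sketch below (assuming NumPy; the well-conditioned test matrix is arbitrary) illustrates the $m<n$ case with $\boldsymbol{\mathcal{L}}=\boldsymbol{\mathcal{K}}$: an orthonormal basis $\boldsymbol{V}$ of the Krylov subspace is built here by a QR factorization of the Krylov matrix purely for brevity (in practice one would use the Arnoldi process of Section 0.5, which is numerically far more stable), and the $n$-dimensional system is reduced to the $m$-dimensional system $(\boldsymbol{V}^{T}\boldsymbol{A}\boldsymbol{V})\boldsymbol{y}=\boldsymbol{V}^{T}\boldsymbol{b}$.

```python
import numpy as np

def krylov_projection_solve(A, b, m):
    """Orthogonal projection (L = K): reduce A x = b to the m-by-m system (V^T A V) y = V^T b."""
    n = b.shape[0]
    # Krylov matrix [b, A b, ..., A^{m-1} b], with normalized columns to limit growth.
    K = np.zeros((n, m))
    w = b.copy()
    for k in range(m):
        K[:, k] = w / np.linalg.norm(w)
        w = A @ K[:, k]
    V, _ = np.linalg.qr(K)                 # orthonormal basis of the Krylov subspace
    y = np.linalg.solve(V.T @ A @ V, V.T @ b)
    return V @ y                           # approximate solution (taking x0 = 0)

rng = np.random.default_rng(5)
n = 50
A = rng.standard_normal((n, n)) + n * np.eye(n)   # arbitrary well-conditioned test matrix
b = rng.standard_normal(n)

for m in (2, 4, 8):
    x_m = krylov_projection_solve(A, b, m)
    print(m, np.linalg.norm(b - A @ x_m))          # residual norm typically shrinks as m grows
```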

4. Krylov Subspace Methods

The Krylov subspace method is essentially a projection method: its core idea is to look for an approximate solution of the required accuracy in a low-dimensional Krylov subspace. In other words, compared with a general projection method, a Krylov subspace method takes a Krylov subspace as the search space $\boldsymbol{\mathcal{K}}$.

For the linear system $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$, a Krylov subspace method can be stated as follows: given an initial guess $\boldsymbol{x}_{0}$, let $\boldsymbol{r}_{0}=\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_{0}$ and take $\boldsymbol{\mathcal{K}}$ to be the $m$-dimensional Krylov subspace $\boldsymbol{\mathcal{K}}(\boldsymbol{A},\boldsymbol{r}_{0},m)=\operatorname{span}(\boldsymbol{r}_{0},\boldsymbol{A}\boldsymbol{r}_{0},\boldsymbol{A}^{2}\boldsymbol{r}_{0},\cdots,\boldsymbol{A}^{m-1}\boldsymbol{r}_{0})$; then find $\boldsymbol{\tilde{x}}\in\boldsymbol{x}_{0}+\boldsymbol{\mathcal{K}}(\boldsymbol{A},\boldsymbol{r}_{0},m)$ such that the Petrov-Galerkin condition holds, i.e. $\boldsymbol{\mathcal{L}}^{T}\boldsymbol{A}\boldsymbol{\tilde{x}}=\boldsymbol{\mathcal{L}}^{T}\boldsymbol{b}$, where $\boldsymbol{\mathcal{L}}\in\mathbb{R}^{n\times m}$. Different choices of $\boldsymbol{\mathcal{L}}$ yield different Krylov subspace methods.

Let $\boldsymbol{W}$ and $\boldsymbol{V}$ be bases of $\boldsymbol{\mathcal{L}}$ and $\boldsymbol{\mathcal{K}}$, respectively, and write $\boldsymbol{\tilde{x}}=\boldsymbol{x}_{0}+\boldsymbol{V}\boldsymbol{y}$. Then $\boldsymbol{W}^{T}\boldsymbol{A}\boldsymbol{V}\boldsymbol{y}=\boldsymbol{W}^{T}\boldsymbol{r}_{0}$, and if $\boldsymbol{W}^{T}\boldsymbol{A}\boldsymbol{V}$ is invertible, $\boldsymbol{y}=(\boldsymbol{W}^{T}\boldsymbol{A}\boldsymbol{V})^{-1}\boldsymbol{W}^{T}\boldsymbol{r}_{0}$.

It follows that the core of a Krylov subspace method is how to construct $\boldsymbol{\mathcal{L}}$ and $\boldsymbol{\mathcal{K}}$. The choices of $\boldsymbol{\mathcal{L}}$ in wide use, and the typical methods they lead to, are:

• $\boldsymbol{\mathcal{L}}=\boldsymbol{\mathcal{K}}$: CG
• $\boldsymbol{\mathcal{L}}=\boldsymbol{A}\boldsymbol{\mathcal{K}}$: MINRES, GMRES
• $\boldsymbol{\mathcal{L}}=\boldsymbol{\mathcal{K}}(\boldsymbol{A}^{T},\boldsymbol{r}_{0},m)$: BiCG

4.1 Steepest Descent Method

4.2 Conjugate Gradient Method

Consider now a symmetric positive definite linear system $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$, where $\boldsymbol{A}\in\mathbb{R}^{n\times n}$, $\boldsymbol{x}\in\mathbb{R}^{n}$ and $\boldsymbol{b}\in\mathbb{R}^{n}$.

By the Arnoldi decomposition theorem, an $m$-step Arnoldi decomposition of $\boldsymbol{A}$ gives $\boldsymbol{A}\boldsymbol{V}_{m}=\boldsymbol{V}_{m}\boldsymbol{H}_{m}+\boldsymbol{f}\boldsymbol{e}_{m}^{T}$, where $\boldsymbol{V}_{m}\in\mathbb{R}^{n\times m}$ with $\boldsymbol{V}_{m}^{T}\boldsymbol{V}_{m}=\boldsymbol{I}_{m}$ and $\boldsymbol{V}_{m}^{T}\boldsymbol{f}=\boldsymbol{0}$, $\boldsymbol{H}_{m}\in\mathbb{R}^{m\times m}$ is an upper Hessenberg matrix, $\boldsymbol{I}_{m}\in\mathbb{R}^{m\times m}$ is the identity matrix of order $m$, and $\boldsymbol{e}_{m}^{T}=(0,0,\cdots,0,1)\in\mathbb{R}^{m}$.

Since $\boldsymbol{V}_{m}^{T}\boldsymbol{V}_{m}=\boldsymbol{I}_{m}$ and $\boldsymbol{V}_{m}^{T}\boldsymbol{f}=\boldsymbol{0}$, we have $\boldsymbol{V}_{m}^{T}\boldsymbol{A}\boldsymbol{V}_{m}=\boldsymbol{V}_{m}^{T}(\boldsymbol{V}_{m}\boldsymbol{H}_{m}+\boldsymbol{f}\boldsymbol{e}_{m}^{T})=\boldsymbol{H}_{m}$. Because $\boldsymbol{A}$ is symmetric, $\boldsymbol{H}_{m}$ is symmetric as well; combined with the definition of a Hessenberg matrix, this means $\boldsymbol{H}_{m}$ is tridiagonal. Moreover, for every nonzero $\boldsymbol{y}\in\mathbb{R}^{m}$ the vector $\boldsymbol{x}=\boldsymbol{V}_{m}\boldsymbol{y}$ is nonzero, and $\boldsymbol{y}^{T}\boldsymbol{H}_{m}\boldsymbol{y}=\boldsymbol{y}^{T}\boldsymbol{V}_{m}^{T}\boldsymbol{A}\boldsymbol{V}_{m}\boldsymbol{y}=\boldsymbol{x}^{T}\boldsymbol{A}\boldsymbol{x}>0$ because $\boldsymbol{A}$ is positive definite, so $\boldsymbol{H}_{m}$ is positive definite as well. In summary, if $\boldsymbol{A}$ is symmetric positive definite, then $\boldsymbol{H}_{m}$ is a symmetric positive definite tridiagonal matrix.

        \boldsymbol{V}_{m}=\left ( \boldsymbol{v}_{1},\boldsymbol{v}_{2},\cdots ,\boldsymbol{v}_{m} \right ),并令矩阵\boldsymbol{H}_{m}i行第j列元素\boldsymbol{H}_{m}\left ( i,j \right )=h_{ij},则有

        Av_{j}=\sum_{i=1}^{i=j}h_{i,j}v_{i}+h_{j+1,j}v_{j+1},考虑到\boldsymbol{V}_{m}^{T}\boldsymbol{V}_{m}=\boldsymbol{I}_{m},因此,可得到Arnoldi分解的递推公式,

        \left\{\begin{matrix} h_{i,j}=\frac{\mathbf{v}_{i}^{T}\boldsymbol{A}\boldsymbol{v}_{j}}{\boldsymbol{v}_{i}^{T}\boldsymbol{v}_{i}}=\mathbf{v}_{i}^{T}\boldsymbol{A}\boldsymbol{v}_{j}, i\in \left [ 1,j \right ]\\ h_{j+1,j}=\frac{\mathbf{v}_{j}^{T}\boldsymbol{A}\boldsymbol{v}_{j}}{\boldsymbol{v}_{j+1}^{T}\boldsymbol{v}_{j+1}}=\mathbf{v}_{j}^{T}\boldsymbol{A}\boldsymbol{v}_{j}, j\in \left [ 1,m-1 \right ]\\ \boldsymbol{v}_{j+1}= \frac{Av_{j}-\sum_{i=1}^{i=j}h_{i,j}v_{i}}{h_{j+1,j}}, j\in \left [ 1,m-1 \right ]\end{matrix}\right.
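Because $\boldsymbol{H}_{m}$ is tridiagonal when $\boldsymbol{A}$ is symmetric, the sum in the recurrence above collapses to three terms (the Lanczos recurrence). A minimal sketch assuming NumPy, with a random symmetric positive definite test matrix, verifying that $\boldsymbol{H}_{m}=\boldsymbol{V}_{m}^{T}\boldsymbol{A}\boldsymbol{V}_{m}$ and that it is symmetric positive definite:

```python
import numpy as np

def lanczos(A, v1, m):
    """Three-term Lanczos recurrence; for symmetric A, the Arnoldi H_m collapses to a tridiagonal matrix."""
    n = A.shape[0]
    V = np.zeros((n, m))
    alpha = np.zeros(m)                    # diagonal entries h_{j,j}
    beta = np.zeros(m)                     # subdiagonal entries h_{j+1,j}
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        alpha[j] = V[:, j] @ w
        w -= alpha[j] * V[:, j]
        if j > 0:
            w -= beta[j - 1] * V[:, j - 1]
        if j + 1 < m:
            beta[j] = np.linalg.norm(w)
            V[:, j + 1] = w / beta[j]
    H = np.diag(alpha) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)
    return V, H

rng = np.random.default_rng(6)
n, m = 30, 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                # symmetric positive definite

V, H = lanczos(A, rng.standard_normal(n), m)

print(np.allclose(V.T @ A @ V, H, atol=1e-8))   # H_m = V_m^T A V_m (tridiagonal by construction)
print(np.allclose(H, H.T))                      # symmetric
print(np.all(np.linalg.eigvalsh(H) > 0))        # positive definite
```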

It follows that the columns of $\boldsymbol{V}_{m}$ form an orthonormal basis of the Krylov subspace $\boldsymbol{\mathcal{K}}(\boldsymbol{A},\boldsymbol{v}_{1},m)$. Accordingly, within the Krylov subspace framework we may take $\boldsymbol{W}=\boldsymbol{V}=\boldsymbol{V}_{m}$.

Given an initial vector $\boldsymbol{x}_{0}$, write the approximate solution as $\boldsymbol{\tilde{x}}_{m}=\boldsymbol{x}_{0}+\boldsymbol{V}_{m}\boldsymbol{y}$. The Petrov-Galerkin condition gives $\boldsymbol{V}_{m}^{T}(\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_{0}-\boldsymbol{A}\boldsymbol{V}_{m}\boldsymbol{y})=\boldsymbol{0}$. Writing $\boldsymbol{r}_{0}=\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_{0}$, this becomes $\boldsymbol{V}_{m}^{T}\boldsymbol{A}\boldsymbol{V}_{m}\boldsymbol{y}=\boldsymbol{V}_{m}^{T}\boldsymbol{r}_{0}$. Substituting the Arnoldi decomposition gives $\boldsymbol{V}_{m}^{T}(\boldsymbol{V}_{m}\boldsymbol{H}_{m}+\boldsymbol{f}\boldsymbol{e}_{m}^{T})\boldsymbol{y}=\boldsymbol{V}_{m}^{T}\boldsymbol{r}_{0}$, and since $\boldsymbol{V}_{m}^{T}\boldsymbol{V}_{m}=\boldsymbol{I}_{m}$ and $\boldsymbol{V}_{m}^{T}\boldsymbol{f}=\boldsymbol{0}$, we obtain $\boldsymbol{H}_{m}\boldsymbol{y}=\boldsymbol{V}_{m}^{T}\boldsymbol{r}_{0}$.

By the Cholesky factorization theorem, since $\boldsymbol{H}_{m}$ is symmetric positive definite, we can write $\boldsymbol{H}_{m}=\boldsymbol{L}_{m}\boldsymbol{D}_{m}\boldsymbol{T}_{m}$, where $\boldsymbol{L}_{m}\in\mathbb{R}^{m\times m}$ is unit lower triangular, $\boldsymbol{D}_{m}\in\mathbb{R}^{m\times m}$ is diagonal with positive diagonal entries, and $\boldsymbol{T}_{m}=\boldsymbol{L}_{m}^{T}$.

If we choose $\boldsymbol{v}_{1}=\frac{\boldsymbol{r}_{0}}{\left\|\boldsymbol{r}_{0}\right\|_{2}}$, then the columns of $\boldsymbol{V}_{m}$ form an orthonormal basis of the Krylov subspace $\boldsymbol{\mathcal{K}}(\boldsymbol{A},\boldsymbol{r}_{0},m)$, and

$$\boldsymbol{V}_{m}^{T}\boldsymbol{r}_{0}=\begin{pmatrix} \boldsymbol{v}_{1}^{T}\\ \boldsymbol{v}_{2}^{T}\\ \vdots \\ \boldsymbol{v}_{m}^{T} \end{pmatrix}\boldsymbol{r}_{0}=\begin{pmatrix} \boldsymbol{v}_{1}^{T}\boldsymbol{r}_{0}\\ 0\\ \vdots \\ 0 \end{pmatrix}=\begin{pmatrix} \left\|\boldsymbol{r}_{0}\right\|_{2}\\ 0\\ \vdots \\ 0 \end{pmatrix}.$$

Let $\beta=\left\|\boldsymbol{r}_{0}\right\|_{2}$ and $\boldsymbol{d}_{m}^{T}=(1,0,\cdots,0)\in\mathbb{R}^{m}$; then $\boldsymbol{H}_{m}\boldsymbol{y}=\beta\boldsymbol{d}_{m}$.

From the above analysis, after an $m$-step Arnoldi decomposition of $\boldsymbol{A}$, the Krylov subspace method yields the approximate solution

$$\boldsymbol{\tilde{x}}_{m}=\boldsymbol{x}_{0}+\beta\boldsymbol{V}_{m}\boldsymbol{T}_{m}^{-1}\boldsymbol{D}_{m}^{-1}\boldsymbol{L}_{m}^{-1}\boldsymbol{d}_{m},\quad \text{where } \boldsymbol{T}_{m}=\boldsymbol{L}_{m}^{T}.$$
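A minimal numerical check of this formula (assuming NumPy and SciPy; for brevity the orthonormal Krylov basis is built with a QR factorization of the Krylov matrix, and the $\boldsymbol{L}\boldsymbol{D}\boldsymbol{L}^{T}$ factors are read off from the ordinary Cholesky factor as in Section 1.2):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(7)
n, m = 20, 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                       # symmetric positive definite
b = rng.standard_normal(n)
x0 = np.zeros(n)
r0 = b - A @ x0
beta = np.linalg.norm(r0)

# Orthonormal basis of K(A, r0, m) via QR of the Krylov matrix (for brevity only).
K = np.column_stack([np.linalg.matrix_power(A, k) @ r0 for k in range(m)])
V, _ = np.linalg.qr(K)
if V[:, 0] @ r0 < 0:
    V[:, 0] = -V[:, 0]                            # enforce v_1 = r0 / ||r0||_2

H = V.T @ A @ V                                   # symmetric positive definite (tridiagonal up to round-off)
G = np.linalg.cholesky(H)                         # H = G G^T
L = G / np.diag(G)                                # unit lower triangular factor
D_diag = np.diag(G) ** 2                          # diagonal of D, so that H = L D L^T
d = np.zeros(m); d[0] = 1.0

# x_m = x0 + beta * V T^{-1} D^{-1} L^{-1} d, with T = L^T.
z = solve_triangular(L, d, lower=True)            # L^{-1} d
z = z / D_diag                                    # D^{-1} (...)
z = solve_triangular(L.T, z, lower=False)         # T^{-1} (...)
x_m = x0 + beta * (V @ z)

# Compare with solving the projected system H y = beta * d directly.
y = np.linalg.solve(H, beta * d)
print(np.allclose(x_m, x0 + V @ y))               # True
```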

Similarly, performing an $(m+1)$-step Arnoldi decomposition of $\boldsymbol{A}$ and following the same reasoning yields the approximate solution

$$\boldsymbol{\tilde{x}}_{m+1}=\boldsymbol{x}_{0}+\beta\boldsymbol{V}_{m+1}\boldsymbol{T}_{m+1}^{-1}\boldsymbol{D}_{m+1}^{-1}\boldsymbol{L}_{m+1}^{-1}\boldsymbol{d}_{m+1},\quad \text{where } \boldsymbol{T}_{m+1}=\boldsymbol{L}_{m+1}^{T} \text{ and } \boldsymbol{d}_{m+1}^{T}=(1,0,\cdots,0)\in\mathbb{R}^{m+1}.$$

From the Arnoldi process it is clear that

$$\boldsymbol{V}_{m+1}=\left(\boldsymbol{V}_{m},\boldsymbol{v}_{m+1}\right),\qquad \boldsymbol{H}_{m+1}\left(1:m,1:m\right)=\boldsymbol{H}_{m},$$

$$\boldsymbol{H}_{m+1}=\begin{pmatrix} \boldsymbol{H}_{m} & \boldsymbol{H}_{m+1}\left(1:m,m+1\right) \\ \boldsymbol{H}_{m+1}\left(m+1,1:m\right) & \boldsymbol{H}_{m+1}\left(m+1,m+1\right) \end{pmatrix}.$$

Meanwhile, $\boldsymbol{H}_{m+1}=\boldsymbol{L}_{m+1}\boldsymbol{D}_{m+1}\boldsymbol{T}_{m+1}$. Partitioning $\boldsymbol{H}_{m+1}$, $\boldsymbol{L}_{m+1}$, $\boldsymbol{D}_{m+1}$ and $\boldsymbol{T}_{m+1}$ into blocks gives

$$\boldsymbol{H}_{m+1}=\begin{pmatrix} \boldsymbol{H}_{m+1}\left(1:m,1:m\right) & \boldsymbol{H}_{m+1}\left(1:m,m+1\right) \\ \boldsymbol{H}_{m+1}\left(m+1,1:m\right) & \boldsymbol{H}_{m+1}\left(m+1,m+1\right) \end{pmatrix},$$

$$\boldsymbol{L}_{m+1}=\begin{pmatrix} \boldsymbol{L}_{m+1}\left(1:m,1:m\right) & \boldsymbol{0} \\ \boldsymbol{L}_{m+1}\left(m+1,1:m\right) & \boldsymbol{L}_{m+1}\left(m+1,m+1\right) \end{pmatrix},$$

$$\boldsymbol{D}_{m+1}=\begin{pmatrix} \boldsymbol{D}_{m+1}\left(1:m,1:m\right) & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{D}_{m+1}\left(m+1,m+1\right) \end{pmatrix},$$

$$\boldsymbol{T}_{m+1}=\begin{pmatrix} \boldsymbol{T}_{m+1}\left(1:m,1:m\right) & \boldsymbol{T}_{m+1}\left(1:m,m+1\right) \\ \boldsymbol{0} & \boldsymbol{T}_{m+1}\left(m+1,m+1\right) \end{pmatrix}.$$

Then

$$\boldsymbol{L}_{m+1}\boldsymbol{D}_{m+1}=\begin{pmatrix} \boldsymbol{L}_{m+1}\left(1:m,1:m\right)\boldsymbol{D}_{m+1}\left(1:m,1:m\right) & \boldsymbol{0} \\ \boldsymbol{L}_{m+1}\left(m+1,1:m\right)\boldsymbol{D}_{m+1}\left(1:m,1:m\right) & \boldsymbol{L}_{m+1}\left(m+1,m+1\right)\boldsymbol{D}_{m+1}\left(m+1,m+1\right) \end{pmatrix},$$

and multiplying by $\boldsymbol{T}_{m+1}$ and comparing blocks of $\boldsymbol{H}_{m+1}$ yields

$$\boldsymbol{H}_{m+1}\left(1:m,1:m\right)=\boldsymbol{L}_{m+1}\left(1:m,1:m\right)\boldsymbol{D}_{m+1}\left(1:m,1:m\right)\boldsymbol{T}_{m+1}\left(1:m,1:m\right),$$

$$\boldsymbol{H}_{m+1}\left(1:m,m+1\right)=\boldsymbol{L}_{m+1}\left(1:m,1:m\right)\boldsymbol{D}_{m+1}\left(1:m,1:m\right)\boldsymbol{T}_{m+1}\left(1:m,m+1\right),$$

$$\boldsymbol{H}_{m+1}\left(m+1,1:m\right)=\boldsymbol{L}_{m+1}\left(m+1,1:m\right)\boldsymbol{D}_{m+1}\left(1:m,1:m\right)\boldsymbol{T}_{m+1}\left(1:m,1:m\right),$$

$$\boldsymbol{H}_{m+1}\left(m+1,m+1\right)=\boldsymbol{L}_{m+1}\left(m+1,1:m\right)\boldsymbol{D}_{m+1}\left(1:m,1:m\right)\boldsymbol{T}_{m+1}\left(1:m,m+1\right)+\boldsymbol{L}_{m+1}\left(m+1,m+1\right)\boldsymbol{D}_{m+1}\left(m+1,m+1\right)\boldsymbol{T}_{m+1}\left(m+1,m+1\right).$$

Since $\boldsymbol{H}_{m}$ and $\boldsymbol{H}_{m+1}$ are symmetric positive definite tridiagonal matrices, we have

$$\boldsymbol{H}_{m+1}\left(1:m,m+1\right)=\left(0,0,\cdots,0,\boldsymbol{H}_{m+1}\left(m,m+1\right)\right)^{T},\qquad \boldsymbol{H}_{m+1}\left(m+1,1:m\right)=\left(0,0,\cdots,0,\boldsymbol{H}_{m+1}\left(m+1,m\right)\right),$$

$$\boldsymbol{H}_{m+1}\left(m,m+1\right)=\sum_{k=1}^{m}\boldsymbol{L}_{m+1}\left(m,k\right)\boldsymbol{D}_{m+1}\left(k,k\right)\boldsymbol{L}_{m+1}\left(m+1,k\right),$$

$$\boldsymbol{H}_{m+1}\left(m+1,m\right)=\sum_{k=1}^{m}\boldsymbol{L}_{m+1}\left(m+1,k\right)\boldsymbol{D}_{m+1}\left(k,k\right)\boldsymbol{L}_{m+1}\left(m,k\right),$$

$$\boldsymbol{T}_{m+1}\left(1:m,m+1\right)=\left(0,0,\cdots,0,\tau_{m+1}\right)^{T},$$

where $\tau_{m+1}=\frac{\boldsymbol{H}_{m+1}\left(m,m+1\right)}{\boldsymbol{L}_{m+1}\left(m,m\right)\boldsymbol{D}_{m+1}\left(m,m\right)}$.

By the uniqueness of the Cholesky factorization, $\boldsymbol{L}_{m+1}\left(1:m,1:m\right)=\boldsymbol{L}_{m}$, $\boldsymbol{D}_{m+1}\left(1:m,1:m\right)=\boldsymbol{D}_{m}$ and $\boldsymbol{T}_{m+1}\left(1:m,1:m\right)=\boldsymbol{T}_{m}$, i.e.

$$\boldsymbol{L}_{m+1}=\begin{pmatrix} \boldsymbol{L}_{m} & \boldsymbol{0} \\ \boldsymbol{L}_{m+1}\left(m+1,1:m\right) & \boldsymbol{L}_{m+1}\left(m+1,m+1\right) \end{pmatrix},$$

$$\boldsymbol{D}_{m+1}=\begin{pmatrix} \boldsymbol{D}_{m} & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{D}_{m+1}\left(m+1,m+1\right) \end{pmatrix},$$

$$\boldsymbol{T}_{m+1}=\begin{pmatrix} \boldsymbol{T}_{m} & \boldsymbol{T}_{m+1}\left(1:m,m+1\right) \\ \boldsymbol{0} & \boldsymbol{T}_{m+1}\left(m+1,m+1\right) \end{pmatrix},$$

$$\boldsymbol{L}_{m+1}^{-1}=\begin{pmatrix} \boldsymbol{L}_{m}^{-1} & \boldsymbol{0} \\ -\frac{\boldsymbol{L}_{m+1}\left(m+1,1:m\right)}{\boldsymbol{L}_{m+1}\left(m+1,m+1\right)}\boldsymbol{L}_{m}^{-1} & \frac{1}{\boldsymbol{L}_{m+1}\left(m+1,m+1\right)} \end{pmatrix},$$

$$\boldsymbol{D}_{m+1}^{-1}=\begin{pmatrix} \boldsymbol{D}_{m}^{-1} & \boldsymbol{0} \\ \boldsymbol{0} & \frac{1}{\boldsymbol{D}_{m+1}\left(m+1,m+1\right)} \end{pmatrix},$$

$$\boldsymbol{T}_{m+1}^{-1}=\begin{pmatrix} \boldsymbol{T}_{m}^{-1} & -\boldsymbol{T}_{m}^{-1}\frac{\boldsymbol{T}_{m+1}\left(1:m,m+1\right)}{\boldsymbol{T}_{m+1}\left(m+1,m+1\right)} \\ \boldsymbol{0} & \frac{1}{\boldsymbol{T}_{m+1}\left(m+1,m+1\right)} \end{pmatrix},$$

$$\boldsymbol{V}_{m+1}\boldsymbol{T}_{m+1}^{-1}=\begin{pmatrix} \boldsymbol{V}_{m}\boldsymbol{T}_{m}^{-1} & \frac{\boldsymbol{V}_{m+1}\left(1:n,m+1\right)-\boldsymbol{V}_{m}\boldsymbol{T}_{m}^{-1}\boldsymbol{T}_{m+1}\left(1:m,m+1\right)}{\boldsymbol{T}_{m+1}\left(m+1,m+1\right)} \end{pmatrix},$$

$$\boldsymbol{D}_{m+1}^{-1}\boldsymbol{L}_{m+1}^{-1}\boldsymbol{d}_{m+1}=\begin{pmatrix} \boldsymbol{D}_{m}^{-1}\boldsymbol{L}_{m}^{-1}\boldsymbol{d}_{m} \\ \frac{\boldsymbol{d}_{m+1}\left(m+1\right)-\boldsymbol{L}_{m+1}\left(m+1,1:m\right)\boldsymbol{L}_{m}^{-1}\boldsymbol{d}_{m}}{\boldsymbol{L}_{m+1}\left(m+1,m+1\right)\boldsymbol{D}_{m+1}\left(m+1,m+1\right)} \end{pmatrix}.$$

        \boldsymbol{p}_{m+1}=\frac{\boldsymbol{V}_{m+1}\left ( 1:n,m+1 \right )-\boldsymbol{V}_{m}\boldsymbol{T}_{m}^{-1}\boldsymbol{T}_{m+1}\left ( 1:m,m+1 \right )}{\boldsymbol{T}_{m+1}\left ( m+1,m+1 \right )}\mu _{m+1}=\beta \frac{ \boldsymbol{d}_{m+1}\left ( m+1 \right )-\boldsymbol{L}_{m+1}\left ( m+1,1:m \right )\boldsymbol{L}_{m}^{-1}\boldsymbol{d}_{m}}{\boldsymbol{L}_{m+1}\left ( m+1,m+1 \right )\boldsymbol{D}_{m+1}\left ( m+1,m+1 \right )}

        则,\beta \boldsymbol{V}_{m+1}\boldsymbol{T}_{m+1}^{-1}\boldsymbol{D}_{m+1}^{-1}\boldsymbol{L}_{m+1}^{-1}\boldsymbol{d}_{m+1}=\beta \boldsymbol{V}_{m}\boldsymbol{T}_{m}^{-1}\boldsymbol{D}_{m}^{-1}\boldsymbol{L}_{m}^{-1}\boldsymbol{d}_{m}+\mu _{m+1}\boldsymbol{p}_{m+1}

From these expressions we obtain the recurrence for the approximate solution in the conjugate gradient method:

$$\boldsymbol{\tilde{x}}_{m+1}=\boldsymbol{\tilde{x}}_{m}+\mu_{m+1}\boldsymbol{p}_{m+1}.$$

Substituting the approximate solution $\boldsymbol{\tilde{x}}_{m+1}$ into the original system gives the recurrence for the residual vector: $\boldsymbol{r}_{m+1}=\boldsymbol{b}-\boldsymbol{A}\boldsymbol{\tilde{x}}_{m+1}=\boldsymbol{b}-\boldsymbol{A}\left(\boldsymbol{\tilde{x}}_{m}+\mu_{m+1}\boldsymbol{p}_{m+1}\right)=\boldsymbol{r}_{m}-\mu_{m+1}\boldsymbol{A}\boldsymbol{p}_{m+1}$.

Since $\boldsymbol{r}_{m}=\boldsymbol{b}-\boldsymbol{A}\left(\boldsymbol{x}_{0}+\boldsymbol{V}_{m}\boldsymbol{y}_{m}\right)=\boldsymbol{r}_{0}-\boldsymbol{A}\boldsymbol{V}_{m}\boldsymbol{y}_{m}$, substituting the $m$-step Arnoldi decomposition (writing $\boldsymbol{f}=h_{m+1,m}\boldsymbol{v}_{m+1}$, as follows from carrying the recurrence one step further) gives $\boldsymbol{r}_{m}=\boldsymbol{r}_{0}-\boldsymbol{V}_{m}\boldsymbol{H}_{m}\boldsymbol{y}_{m}-h_{m+1,m}\left(\boldsymbol{e}_{m}^{T}\boldsymbol{y}_{m}\right)\boldsymbol{v}_{m+1}$, where $\boldsymbol{e}_{m}^{T}=(0,0,\cdots,0,1)\in\mathbb{R}^{m}$. Since $\boldsymbol{H}_{m}\boldsymbol{y}_{m}=\boldsymbol{V}_{m}^{T}\boldsymbol{r}_{0}$, this becomes $\boldsymbol{r}_{m}=\boldsymbol{r}_{0}-\boldsymbol{V}_{m}\boldsymbol{V}_{m}^{T}\boldsymbol{r}_{0}-h_{m+1,m}\left(\boldsymbol{e}_{m}^{T}\boldsymbol{y}_{m}\right)\boldsymbol{v}_{m+1}$. As shown above, $\boldsymbol{V}_{m}^{T}\boldsymbol{r}_{0}=\left(\left\|\boldsymbol{r}_{0}\right\|_{2},0,\cdots,0\right)^{T}$, so $\boldsymbol{V}_{m}\boldsymbol{V}_{m}^{T}\boldsymbol{r}_{0}=\left\|\boldsymbol{r}_{0}\right\|_{2}\boldsymbol{v}_{1}=\boldsymbol{r}_{0}$. Therefore $\boldsymbol{r}_{m}=-h_{m+1,m}\left(\boldsymbol{e}_{m}^{T}\boldsymbol{y}_{m}\right)\boldsymbol{v}_{m+1}$; that is, $\boldsymbol{r}_{m}$ is parallel to $\boldsymbol{v}_{m+1}$, and consequently $\boldsymbol{r}_{i}^{T}\boldsymbol{r}_{j}=0$ for $i\neq j$.

Since $\boldsymbol{T}_{m+1}\left(1:m,m+1\right)=\left(0,0,\cdots,0,\tau_{m+1}\right)^{T}$, we have $\boldsymbol{V}_{m}\boldsymbol{T}_{m}^{-1}\boldsymbol{T}_{m+1}\left(1:m,m+1\right)=\tau_{m+1}\boldsymbol{p}_{m}$, and therefore $\boldsymbol{p}_{m+1}=\frac{\boldsymbol{V}_{m+1}\left(1:n,m+1\right)-\tau_{m+1}\boldsymbol{p}_{m}}{\boldsymbol{T}_{m+1}\left(m+1,m+1\right)}=\frac{\boldsymbol{v}_{m+1}-\tau_{m+1}\boldsymbol{p}_{m}}{\boldsymbol{T}_{m+1}\left(m+1,m+1\right)}$. In particular, $\boldsymbol{V}_{m}\boldsymbol{T}_{m}^{-1}\boldsymbol{T}_{m+1}\left(1:m,m+1\right)$ is a linear combination of the columns of $\boldsymbol{V}_{m}=\left(\boldsymbol{v}_{1},\boldsymbol{v}_{2},\cdots,\boldsymbol{v}_{m}\right)$, and $\boldsymbol{p}_{m+1}$ is a linear combination of the columns of $\boldsymbol{V}_{m+1}=\left(\boldsymbol{v}_{1},\boldsymbol{v}_{2},\cdots,\boldsymbol{v}_{m},\boldsymbol{v}_{m+1}\right)$. For $i<m+1$, $\mu_{m+1}\boldsymbol{p}_{i}^{T}\boldsymbol{A}\boldsymbol{p}_{m+1}=\boldsymbol{p}_{i}^{T}\left(\boldsymbol{r}_{m}-\boldsymbol{r}_{m+1}\right)$; since $\boldsymbol{r}_{m}$ is parallel to $\boldsymbol{v}_{m+1}$, $\boldsymbol{r}_{m+1}$ is parallel to $\boldsymbol{v}_{m+2}$, and $\boldsymbol{p}_{i}$ lies in the span of $\boldsymbol{v}_{1},\cdots,\boldsymbol{v}_{i}$, the orthonormality of the $\boldsymbol{v}_{k}$ gives $\boldsymbol{p}_{i}^{T}\boldsymbol{A}\boldsymbol{p}_{m+1}=0$. That is, for $i<m+1$ the vector $\boldsymbol{p}_{i}$ is conjugate to $\boldsymbol{p}_{m+1}$ with respect to $\boldsymbol{A}$. This is how the conjugate gradient method gets its name.

From the above, substituting $\boldsymbol{v}_{m+1}=-\frac{\boldsymbol{r}_{m}}{h_{m+1,m}\left(\boldsymbol{e}_{m}^{T}\boldsymbol{y}_{m}\right)}$ into the expression for $\boldsymbol{p}_{m+1}$ gives $\boldsymbol{p}_{m+1}=\frac{\xi_{m+1}}{\mu_{m+1}}\boldsymbol{r}_{m}+\frac{\eta_{m+1}}{\mu_{m+1}}\boldsymbol{p}_{m}$, where $\xi_{m+1}=-\frac{\mu_{m+1}}{h_{m+1,m}\left(\boldsymbol{e}_{m}^{T}\boldsymbol{y}_{m}\right)\boldsymbol{T}_{m+1}\left(m+1,m+1\right)}$ and $\eta_{m+1}=-\frac{\tau_{m+1}\mu_{m+1}}{\boldsymbol{T}_{m+1}\left(m+1,m+1\right)}$.

This yields the simplified expression for the approximate solution: $\boldsymbol{\tilde{x}}_{m+1}=\boldsymbol{\tilde{x}}_{m}+\xi_{m+1}\boldsymbol{r}_{m}+\eta_{m+1}\boldsymbol{p}_{m}$.

Correspondingly, the simplified expression for the residual vector is $\boldsymbol{r}_{m+1}=\boldsymbol{r}_{m}-\xi_{m+1}\boldsymbol{A}\boldsymbol{r}_{m}-\eta_{m+1}\boldsymbol{A}\boldsymbol{p}_{m}$.

        \boldsymbol{r}_{m+1}=\boldsymbol{r}_{m}-\mu _{m+1}\boldsymbol{A}\boldsymbol{p}_{m+1},可知\boldsymbol{r}_{m}^{T}\boldsymbol{r}_{m}=\mu _{m+1}\boldsymbol{r}_{m}^{T}\boldsymbol{A}\boldsymbol{p}_{m+1}。即\mu _{m+1}=\frac{\boldsymbol{r}_{m}^{T}\boldsymbol{r}_{m}}{\boldsymbol{r}_{m}^{T}\boldsymbol{A}\boldsymbol{p}_{m+1}}

Moreover, since $\boldsymbol{p}_{m+1}=\frac{\xi_{m+1}}{\mu_{m+1}}\boldsymbol{r}_{m}+\frac{\eta_{m+1}}{\mu_{m+1}}\boldsymbol{p}_{m}$ and $\boldsymbol{p}_{m}^{T}\boldsymbol{A}\boldsymbol{p}_{m+1}=0$, we have $\boldsymbol{p}_{m+1}^{T}\boldsymbol{A}\boldsymbol{p}_{m+1}=\frac{\xi_{m+1}}{\mu_{m+1}}\boldsymbol{r}_{m}^{T}\boldsymbol{A}\boldsymbol{p}_{m+1}$, and hence $\xi_{m+1}=\mu_{m+1}\frac{\boldsymbol{p}_{m+1}^{T}\boldsymbol{A}\boldsymbol{p}_{m+1}}{\boldsymbol{r}_{m}^{T}\boldsymbol{A}\boldsymbol{p}_{m+1}}=\frac{\boldsymbol{r}_{m}^{T}\boldsymbol{r}_{m}\,\boldsymbol{p}_{m+1}^{T}\boldsymbol{A}\boldsymbol{p}_{m+1}}{\left(\boldsymbol{r}_{m}^{T}\boldsymbol{A}\boldsymbol{p}_{m+1}\right)^{2}}$.

Similarly, expanding $0=\boldsymbol{p}_{m}^{T}\boldsymbol{A}\boldsymbol{p}_{m+1}=\frac{\xi_{m+1}}{\mu_{m+1}}\boldsymbol{p}_{m}^{T}\boldsymbol{A}\boldsymbol{r}_{m}+\frac{\eta_{m+1}}{\mu_{m+1}}\boldsymbol{p}_{m}^{T}\boldsymbol{A}\boldsymbol{p}_{m}$ gives $\eta_{m+1}=-\xi_{m+1}\frac{\boldsymbol{p}_{m}^{T}\boldsymbol{A}\boldsymbol{r}_{m}}{\boldsymbol{p}_{m}^{T}\boldsymbol{A}\boldsymbol{p}_{m}}$.

Correspondingly, the recurrence for the search direction is $\boldsymbol{p}_{m+1}=\frac{\xi_{m+1}}{\mu_{m+1}}\boldsymbol{r}_{m}+\frac{\eta_{m+1}}{\mu_{m+1}}\boldsymbol{p}_{m}$.

If the search direction is rescaled so that the coefficient of $\boldsymbol{r}_{m}$ equals one, i.e. $\hat{\boldsymbol{p}}_{m+1}=\frac{\mu_{m+1}}{\xi_{m+1}}\boldsymbol{p}_{m+1}=\boldsymbol{r}_{m}+\frac{\eta_{m+1}}{\xi_{m+1}}\boldsymbol{p}_{m}$, these recurrences reduce to the standard conjugate gradient iteration listed in Section 2, with $\alpha_{k}=\frac{\boldsymbol{r}_{k}^{T}\boldsymbol{r}_{k}}{\boldsymbol{p}_{k}^{T}\boldsymbol{A}\boldsymbol{p}_{k}}$ and $\beta_{k}=\frac{\boldsymbol{r}_{k+1}^{T}\boldsymbol{r}_{k+1}}{\boldsymbol{r}_{k}^{T}\boldsymbol{r}_{k}}$ computed from inner products. This completes the derivation of the conjugate gradient method.
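Putting the recurrences together, here is a minimal sketch of the resulting conjugate gradient iteration (the standard form with the search direction scaled so that the coefficient of the current residual is one, as in the table of Section 2; assuming NumPy, with a random symmetric positive definite test problem). The run also checks the two structural facts derived above: the residuals are mutually orthogonal and the search directions are $\boldsymbol{A}$-conjugate.

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    """Conjugate gradient for symmetric positive definite A (standard scaling p = r + beta * p)."""
    n = b.shape[0]
    max_iter = max_iter or n
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    residuals, directions = [r.copy()], [p.copy()]
    for _ in range(max_iter):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
        residuals.append(r.copy())
        directions.append(p.copy())
    return x, residuals, directions

rng = np.random.default_rng(8)
n = 40
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                          # symmetric positive definite
b = rng.standard_normal(n)

x, rs, ps = conjugate_gradient(A, b, np.zeros(n))
print(np.linalg.norm(A @ x - b))                     # small residual

R = np.column_stack(rs)
P = np.column_stack(ps)
print(np.max(np.abs(np.triu(R.T @ R, 1))))           # r_i^T r_j ~ 0 for i != j
print(np.max(np.abs(np.triu(P.T @ (A @ P), 1))))     # p_i^T A p_j ~ 0 for i != j
```

The two printed maxima are a numerical confirmation of the residual orthogonality and the $\boldsymbol{A}$-conjugacy of the directions derived above.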

4.3 Preconditioned Conjugate Gradient

References (books)

Golub G. H., Van Loan C. F. Matrix Computations. Johns Hopkins University Press, 1996.

Ford W. Numerical Linear Algebra with Applications Using MATLAB. 2014.

Xu S. Numerical Linear Algebra (2nd edition). Peking University Press, 2010 (in Chinese).

References (papers)

Hestenes M. R., Stiefel E. L. Methods of Conjugate Gradients for Solving Linear Systems. Journal of Research of the National Bureau of Standards, 1952.

Arnoldi W. E. The Principle of Minimized Iterations in the Solution of the Matrix Eigenvalue Problem. Quarterly of Applied Mathematics, 1951, 9(1).

Saad Y. Krylov Subspace Methods for Solving Large Unsymmetric Linear Systems. Mathematics of Computation, 1981, 37(155).