We've discussed how to compute a factorization
A = QR, where Q is orthogonal and R is upper-triangular. For our presentation of the QR algorithm, we assume that A = A^T, i.e., A is symmetric. The QR algorithm applies more generally, but we focus only on this case.
Because A is symmetric, we know that, by the Spectral Theorem, A = UDU^T, where D is a diagonal matrix (containing the eigenvalues) and U is an orthogonal matrix (containing the eigenvectors). We continue with the philosophy of taking powers of the matrix, to assist us in finding the eigenvalues:
A^k = U D^k U^T. We will consider the case where A is 2 \times 2 first. We assume |\lambda_1| > |\lambda_2|. If we look at the first column of A^k, we are computing the vector
r_1^{(k)} = A^k e_1, \quad e_1 = \begin{bmatrix} 1 & 0 \end{bmatrix}^T. This is very similar to the Power method with starting vector x^{(0)} = e_1. Indeed, let u_1 and u_2 be the two columns of U, so that
r_1^{(k)} = A^k e_1 = U D^k U^T e_1 = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \lambda_1^k & 0 \\ 0 & \lambda_2^k \end{bmatrix} \begin{bmatrix} u_1^T \\ u_2^T \end{bmatrix} e_1 = \lambda_1^k (u_1^T e_1) u_1 + \lambda_2^k (u_2^T e_1) u_2 = \lambda_1^k (u_1^T e_1) u_1 \left( 1 + O\left( \left| \frac{\lambda_2}{\lambda_1} \right|^k \right) \right), \quad k \to \infty. So, it follows that A^k e_1 aligns with the dominant eigenvector u_1 as k \to \infty. A nice way to mathematically capture this fact is:
\lim_{k \to \infty} \frac{|u_1^T r_1^{(k)}|}{|u_2^T r_1^{(k)}|} = \infty, i.e., the projection onto the first eigenvector is asymptotically larger than the projection onto the second. Typically, the same is true of A^k e_2, the second column of A^k:
A^k e_2 = U D^k U^T e_2 = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \lambda^k_1 & 0 \\ 0 & \lambda^k_2 \end{bmatrix} \begin{bmatrix} u_1^T \\ u_2^T \end{bmatrix} e_2 = \lambda_1^k (u_1^T e_2) u_1 + \lambda_2^k (u_2^T e_2) u_2. For notational convenience, define \beta_{ij} = u_i^T e_j, all of which we assume are non-zero. To try to isolate the second eigenvector, we can try Gram-Schmidt:
r_2^{(k)} = A^k e_2 - \left( \frac{(A^k e_1)^T A^k e_2}{(A^k e_1)^T (A^k e_1)} \right) A^k e_1. This creates a vector r_2^{(k)} that is orthogonal to the first column of A^k. So, we can hope that r_2^{(k)} will begin to look like the second eigenvector of A:
r_2^{(k)} = \lambda_1^k \beta_{12} u_1 + \lambda_2^k \beta_{22} u_2 - \frac{(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{12} u_1 + \lambda_2^k \beta_{22} u_2)}{(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)}(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2). We compute the products that occur in the fraction using the rule: if u and v are orthonormal, then (\alpha u + \beta v)^T(\gamma u + \delta v) = \alpha \gamma + \beta \delta for scalars \alpha,\beta,\gamma,\delta. We find
(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{12} u_1 + \lambda_2^k \beta_{22} u_2) = \lambda_1^{2k} \beta_{11}\beta_{12} + \lambda_2^{2k} \beta_{22} \beta_{21},
(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2) = \lambda_1^{2k} \beta_{11}^2 + \lambda_2^{2k}\beta_{21}^2.
The ratio is then
\frac{(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{12} u_1 + \lambda_2^k \beta_{22} u_2)}{(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)} = \frac{ \lambda_1^{2k} \beta_{11}\beta_{12} + \lambda_2^{2k} \beta_{22} \beta_{21}} {\lambda_1^{2k} \beta_{11}^2 + \lambda_2^{2k} \beta_{21}^2} = \frac{ \beta_{11}\beta_{12} + \delta^{2k} \beta_{22} \beta_{21}} {\beta_{11}^2 + \delta^{2k} \beta_{21}^2}, \quad \delta = \lambda_2/\lambda_1. To see if this vector r_2^{(k)} points in the direction of u_1 or u_2 we take the inner products, using orthogonality:
u_1^T r_2^{(k)} = \lambda_1^k \beta_{12} - \frac{ \beta_{11}\beta_{12} + \delta^{2k} \beta_{22} \beta_{21}} {\beta_{11}^2 + \delta^{2k} \beta_{21}^2} \lambda_1^k \beta_{11} = \lambda_2^k \delta^{-k} \left( \beta_{12} - \frac{ \beta_{11}\beta_{12} + \delta^{2k} \beta_{22} \beta_{21}} {\beta_{11}^2 + \delta^{2k} \beta_{21}^2} \beta_{11} \right). It can be shown that the quantity in parentheses is O(|\delta|^{2k}) as k \to \infty. In other words:
u_1^T r_2^{(k)} = \lambda_2^k O(|\delta|^k). We now compare this to
u_2^T r_2^{(k)} = \lambda_2^k \left( \beta_{22} - \frac{ \beta_{11}\beta_{12} + \delta^{2k} \beta_{22} \beta_{21}} {\beta_{11}^2 + \delta^{2k} \beta_{21}^2} \beta_{21} \right). This time the quantity in parentheses has no reason to vanish as k \to \infty. We see that (typically)
\lim_{k \to \infty} \frac{|u_2^T r_2^{(k)}|}{|u_1^T r_2^{(k)}|} = \infty. This is a long-winded way of saying that if you apply Gram-Schmidt to the columns of A^k, for a symmetric matrix A, then the orthogonal matrix that results will converge to the matrix U of eigenvectors of A. This means that:
Suppose A is a symmetric, non-singular matrix with eigenvalues that have distinct magnitudes
|\lambda_1| > |\lambda_2| > \cdots > |\lambda_n|. If A^k = Q^{(k)} R^{(k)} is the QR factorization of A^k, then as k \to \infty,
(Q^{(k)})^T A Q^{(k)} = (Q^{(k)})^T U D U^T Q^{(k)} \to D. Computational issue: We should NOT compute A^k for k large! This will typically result in overflow and underflow. We need another way to compute Q^{(k)}. Consider the sequence:
A = Q_1 R_1, \quad \text{ the QR factorization of $A$},
A_1 = R_1 Q_1, \quad \text{reverse the order of multiplication},
A_1 = Q_2 R_2, \quad \text{ the QR factorization of $A_1$},
A_2 = R_2 Q_2, \quad \text{reverse the order of multiplication},
A_2 = Q_3 R_3, \quad \text{ the QR factorization of $A_2$},
A_3 = R_3 Q_3, \quad \text{reverse the order of multiplication},
\vdots
For any k \geq 1, A_k has the same eigenvalues as A; moreover, Q_1 Q_{2} \cdots Q_k = Q^{(k)} \to U and A_k \to D as k \to \infty.
We first show that the eigenvalues never change. A = Q_1 R_1 = Q_1 R_1 Q_1 Q_1^T = Q_1 A_1 Q_1^T. So, A is similar to A_1. In general, it follows that A_{k-1} = Q_k A_k Q_{k}^T and therefore A_k is similar to A for every k, by induction:
A_k = Q^T_k A_{k-1} Q_{k} = (Q_1 Q_2 \cdots Q_k)^T A \, Q_1 Q_{2} \cdots Q_k. Next, we show that Q_1 Q_{2} \cdots Q_k = Q^{(k)} for every k. For k = 1, A = Q_1 R_1 and A^1 = Q^{(1)} R^{(1)}, and by the uniqueness of the QR factorization, Q_1 = Q^{(1)}. Now, assume Q_{1}Q_{2} \cdots Q_{k-1} = Q^{(k-1)}; we show the same holds for k (induction). So, we have
A^{k-1} = Q_{1} Q_{2} \cdots Q_{k-1} R^{(k-1)} = Q^{(k-1)} R^{(k-1)},
A^k = A \, Q_{1} Q_{2} \cdots Q_{k-1} R^{(k-1)},
A \, Q_{1} Q_{2} \cdots Q_{k-1} = Q_1 Q_2 \cdots Q_{k-1} A_{k-1},
A^k = Q_1 Q_2 \cdots Q_{k-1} A_{k-1} R^{(k-1)},
A_{k-1} = Q_k R_k,
A^k = Q_1 Q_2 \cdots Q_{k-1} Q_k R_k R^{(k-1)}.
By the uniqueness of the QR factorization (R_k R^{(k-1)} is upper-triangular with positive diagonal entries), we find Q_1 Q_2 \cdots Q_{k-1} Q_k = Q^{(k)}. The convergence then follows from Theorem 1.
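As a quick numerical sanity check of the two facts just proved, here is a short MATLAB sketch (not part of the algorithm itself). Note that MATLAB's qr does not force the diagonal of R to be positive, so the accumulated product and the Q factor of A^k agree only up to the signs of their columns, and a modest power is used so that A^k is safe to form directly.
A = randn(4); A = A + A';                   % random symmetric test matrix
Ak = A; Qprod = eye(4);
for k = 1:6
    [Q,R] = qr(Ak);                         % A_{k-1} = Q_k R_k
    Ak = R*Q;                               % A_k = R_k Q_k
    Qprod = Qprod*Q;                        % accumulate Q_1 Q_2 ... Q_k
end
[Qk,Rk] = qr(A^6);                          % QR factorization of A^6 computed directly
norm(abs(Qprod) - abs(Qk))                  % small: Q_1 ... Q_6 = Q^{(6)} up to column signs
norm(sort(eig((Ak+Ak')/2)) - sort(eig(A)))  % small: A_6 has the same eigenvalues as A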
We need a convergence criterion for the QR algorithm. From the Gershgorin Circle Theorem, it follows that if, in each row of A_k, the sum of the absolute values of the off-diagonal entries is less than \epsilon > 0, then the eigenvalues are all known to an accuracy of \epsilon.
for k = 1,2,\ldots,N_{\max}
Set [Q,R] = \text{QR}(A) % Do this with MGS (modified Gram-Schmidt)
Set A = RQ
If \|A - \mathrm{diag}(A)\|_\infty < \mathrm{TOL}, output \mathrm{diag}(A) and stop
endfor
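The examples below call a function QRalg(A, N_max, TOL). Its source is not listed in these notes, so here is a minimal MATLAB sketch consistent with the pseudocode above; the argument order and return value are inferred from the calls below, and the built-in qr is used in place of MGS for brevity.
function D = QRalg(A, Nmax, TOL)
% Minimal sketch of the QR algorithm: factor, reverse the order, repeat.
for k = 1:Nmax
    [Q,R] = qr(A);                          % QR factorization of the current iterate
    A = R*Q;                                % reverse the order of multiplication
    if norm(A - diag(diag(A)), inf) < TOL   % Gershgorin-based stopping criterion
        break
    end
end
D = diag(A);                                % approximate eigenvalues
end
Note that the ordering of the output may differ from the ordering returned by eigs, so the two outputs in the examples below may need reordering before they are compared.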
A = randn(5);
A = A + A'; % Convergence is slow on some matrices
D = QRalg(A,400,10e-10)'
flipud(eigs(A))' % compare with MATLAB's built-in eigenvalue routine
A = randn(5);
A = A*A'; % Convergence is much faster on others
D = QRalg(A,400,10e-10)'
flipud(eigs(A))' % compare with MATLAB's built-in eigenvalue routine
Some final remarks on eigenvalue algorithms:
This is not the QR algorithm that is used in practice. Typically, a symmetric matrix is first reduced to tridiagonal form using Householder transformations. Then a QR algorithm that is mathematically equivalent to the one we discussed, but that exploits the tridiagonal structure, is applied. This gives a much more efficient algorithm.
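For illustration, here is a sketch of that preprocessing step using MATLAB's built-in hess, which applies Householder transformations (for a symmetric matrix the resulting Hessenberg form is tridiagonal), followed by the same QR iteration as before. A practical implementation would also exploit the tridiagonal structure inside each QR step, which this sketch does not do.
A = randn(5); A = A + A';   % random symmetric matrix
[P,T] = hess(A);            % A = P*T*P' with T tridiagonal since A is symmetric
D = QRalg(T,400,1e-10)'     % T has the same eigenvalues as A
sort(eig(A))'               % built-in eigenvalues, for comparison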
To see how the Power method can fail, consider the matrix
A = \begin{bmatrix} -2 & 0 \\ 0 & 2 \end{bmatrix}.
We want to compute the eigenvalues with the symmetric Power method. Choose x = x^{(0)} = [x_1, x_2]^T. It follows that
\lambda^{(k)} = \frac{x^T A^{2k-1} x}{x^T A^{2k-2} x}.
And then
x^T A^{2k-1} x = (-2)^{2k-1} x_1^2 + 2^{2k-1} x_2^2 = 2^{2k-1}(x_2^2 - x_1^2),
x^T A^{2k-2} x = (-2)^{2k-2} x_1^2 + 2^{2k-2} x_2^2 = 2^{2k-2}(x_2^2 + x_1^2).
So, we find
\lambda^{(k)} = 2\frac{x_2^2 - x_1^2}{x_2^2 + x_1^2} \neq \pm 2,
for most choices of x_1 and x_2. This is the worst case for a numerical algorithm: it converges to the wrong answer! This can be fixed. If one considers A + I = \mathrm{diag}(-1,3), the power method will converge: \lambda^{(k)} \to 3 as k \to \infty. And because we added the identity matrix, we subtract one: \lambda^{(k)} - 1 \to 2, a true eigenvalue of A.
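A short MATLAB sketch of this failure and its remedy (the loop below is a standard symmetric power iteration with a Rayleigh-quotient estimate; the starting vector is an arbitrary choice with both components non-zero):
A = [-2 0; 0 2];
x = [1; 2]/norm([1; 2]);     % starting vector x^{(0)}
for k = 1:50
    lambda = x'*A*x;         % Rayleigh-quotient estimate lambda^{(k)}
    x = A*x/norm(A*x);
end
lambda                       % stuck at 2*(x2^2 - x1^2)/(x2^2 + x1^2) = 1.2, not an eigenvalue
B = A + eye(2);              % shift: B = diag(-1,3) has a dominant eigenvalue
x = [1; 2]/norm([1; 2]);
for k = 1:50
    lambda = x'*B*x;
    x = B*x/norm(B*x);
end
lambda - 1                   % converges to 2, a true eigenvalue of A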
This failure of the Power method, and its remedy, illustrates the idea of "shifting": knocking the matrix into a more convenient form where the method converges faster (or converges at all). To get the full power of the QR algorithm, one should use the so-called Wilkinson shift, which greatly accelerates convergence.