We've discussed how to compute a factorization
A = QR, where Q is orthogonal and R is upper-triangular. For our presentation of the QR algorithm, we assume that A = A^T, i.e., A is symmetric. The QR algorithm applies more generally, but we focus only on this case.
Because A is symmetric, we know that, by the Spectral Theorem, A = UDU^T, where D is a diagonal matrix (containing the eigenvalues) and U is an orthogonal matrix (containing the eigenvectors). We continue with the philosophy of taking powers of the matrix, to assist us in finding the eigenvalues:
A^k = U D^k U^T. We will consider the case where A is 2 \times 2 first. We assume |\lambda_1| > |\lambda_2|. If we look at the first column of A^k, we are computing the vector
r_1^{(k)} = A^k e_1, \quad e_1 = \begin{bmatrix} 1 & 0 \end{bmatrix}^T. This is very similar to the Power method with starting vector x^{(0)} = e_1. Indeed, let u_1 and u_2 be the two columns of U, so that
r_1^{(k)} = A^k e_1 = U D^k U^T e_1 = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \lambda_1^k & 0 \\ 0 & \lambda_2^k \end{bmatrix} \begin{bmatrix} u_1^T \\ u_2^T \end{bmatrix} e_1 = \lambda_1^k (u_1^T e_1) u_1 + \lambda_2^k (u_2^T e_1) u_2 = \lambda_1^k (u_1^T e_1) u_1 \left( 1 + O\left( \left| \frac{\lambda_2}{\lambda_1} \right|^k \right) \right), \quad k \to \infty. So, it follows that A^k e_1 aligns with the dominant eigenvector u_1 as k \to \infty. A nice way to mathematically capture this fact is:
\lim_{k \to \infty} \frac{|u_1^T r_1^{(k)}|}{|u_2^T r_1^{(k)}|} = \infty, i.e., the projection onto the first eigenvector is asymptotically larger than the projection onto the second. Typically, the same is true of A^k e_2, the second column of A^k:
A^k e_2 = U D^k U^T e_2 = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \lambda^k_1 & 0 \\ 0 & \lambda^k_2 \end{bmatrix} \begin{bmatrix} u_1^T \\ u_2^T \end{bmatrix} e_2 = \lambda_1^k (u_1^T e_2) u_1 + \lambda_2^k (u_2^T e_2) u_2. For notational convenience, define \beta_{ij} = u_i^T e_j, all of which we assume are non-zero. To try to isolate the second eigenvector, we can try Gram-Schmidt:
r_2^{(k)} = A^k e_2 - \left( \frac{(A^k e_1)^T A^k e_2}{(A^k e_1)^T (A^k e_1)} \right) A^k e_1. This creates a vector r_2^{(k)} that is orthogonal to the first column of A^k. So, we can hope that r_2^{(k)} will begin to look like the second eigenvector of A:
r_2^{(k)} = \lambda_1^k \beta_{12} u_1 + \lambda_2^k \beta_{22} u_2 - \frac{(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{12} u_1 + \lambda_2^k \beta_{22} u_2)}{(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)}(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2). We compute the products that occur in the fraction using the rule: if u and v are orthonormal, then (\alpha u + \beta v)^T(\gamma u + \delta v) = \alpha \gamma + \beta \delta for scalars \alpha,\beta,\gamma,\delta. We find
(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{12} u_1 + \lambda_2^k \beta_{22} u_2) = \lambda_1^{2k} \beta_{11}\beta_{12} + \lambda_2^{2k} \beta_{22} \beta_{21},
(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2) = \lambda_1^{2k} \beta_{11}^2 + \lambda_2^{2k}\beta_{21}^2.
The ratio is then
\frac{(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{12} u_1 + \lambda_2^k \beta_{22} u_2)}{(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)^T(\lambda_1^k \beta_{11} u_1 + \lambda_2^k \beta_{21} u_2)} = \frac{ \lambda_1^{2k} \beta_{11}\beta_{12} + \lambda_2^{2k} \beta_{22} \beta_{21}} {\lambda_1^{2k} \beta_{11}^2 + \lambda_2^{2k} \beta_{21}^2} = \frac{ \beta_{11}\beta_{12} + \delta^{2k} \beta_{22} \beta_{21}} {\beta_{11}^2 + \delta^{2k} \beta_{21}^2}, \quad \delta = \lambda_2/\lambda_1. To see if this vector r_2^{(k)} points in the direction of u_1 or u_2 we take the inner products, using orthogonality:
u_1^T r_2^{(k)} = \lambda_1^k \beta_{12} - \frac{ \beta_{11}\beta_{12} + \delta^{2k} \beta_{22} \beta_{21}} {\beta_{11}^2 + \delta^{2k} \beta_{21}^2} \lambda_1^k \beta_{11} = \lambda_2^k \delta^{-k} \left( \beta_{12} - \frac{ \beta_{11}\beta_{12} + \delta^{2k} \beta_{22} \beta_{21}} {\beta_{11}^2 + \delta^{2k} \beta_{21}^2} \beta_{11} \right). It can be shown that the quantity in parentheses is O(|\delta|^{2k}) as k \to \infty. In other words:
u_1^T r_2^{(k)} = \lambda_2^k O(|\delta|^k). We now compare this to
u_2^T r_2^{(k)} = \lambda_2^k \left( \beta_{22} - \frac{ \beta_{11}\beta_{12} + \delta^{2k} \beta_{22} \beta_{21}} {\beta_{11}^2 + \delta^{2k} \beta_{21}^2} \beta_{21} \right). This time the quantity in parentheses has no reason to vanish as k \to \infty. We see that (typically)
\lim_{k \to \infty} \frac{|u_2^T r_2^{(k)}|}{|u_1^T r_2^{(k)}|} = \infty. This is a long-winded way of saying that if you apply Gram-Schmidt to the columns of A^k, for a symmetric matrix A, then the orthogonal matrix that results will converge to the matrix U of eigenvectors of A. This means that:
Suppose A is a symmetric, non-singular matrix with eigenvalues that have distinct magnitudes
|\lambda_1| > |\lambda_2| > \cdots > |\lambda_n|. If A^k = Q^{(k)} R^{(k)} is the QR factorization of A^k, then as k \to \infty,
(Q^{(k)})^T A Q^{(k)} = (Q^{(k)})^T U D U^T Q^{(k)} \to D. Computational issue: We should NOT compute A^k for k large! This will typically result in overflow and underflow. We need another way to compute Q^{(k)}. Consider the sequence:
A = Q_1 R_1, \quad \text{ the QR factorization of $A$},
A_1 = R_1 Q_1, \quad \text{reverse the order of multiplication},
A_1 = Q_2 R_2, \quad \text{ the QR factorization of $A_1$},
A_2 = R_2 Q_2, \quad \text{reverse the order of multiplication},
A_2 = Q_3 R_3, \quad \text{ the QR factorization of $A_2$},
A_3 = R_3 Q_3, \quad \text{reverse the order of multiplication},
\vdots
For any k \geq 1, A_k has the same eigenvalues as A; moreover, Q_1 Q_{2} \cdots Q_k = Q^{(k)} \to U and A_k \to D as k \to \infty.
We first show that the eigenvalues never change. A = Q_1 R_1 = Q_1 R_1 Q_1 Q_1^T = Q_1 A_1 Q_1^T. So, A is similar to A_1. In general, it follows that A_{k-1} = Q_k A_k Q_{k}^T and therefore A_k is similar to A for every k, by induction:
A_k = Q^T_k A_{k-1} Q_{k} = (Q_1 Q_2 \cdots Q_k)^T A \, Q_1 Q_{2} \cdots Q_k. Next, we show that Q_1 Q_{2} \cdots Q_k = Q^{(k)} for every k. For k = 1, A = Q_1 R_1 and A^1 = Q^{(1)} R^{(1)}, and by the uniqueness of the QR factorization, Q_1 = Q^{(1)}. Now, assume Q_{1}Q_{2} \cdots Q_{k-1} = Q^{(k-1)}; we show the same holds for k (induction). So, we have
A^{k-1} = Q_{1} Q_{2} \cdots Q_{k-1} R^{(k-1)} = Q^{(k-1)} R^{(k-1)},
A^k = A \, Q_{1} Q_{2} \cdots Q_{k-1} R^{(k-1)},
A \, Q_{1} Q_{2} \cdots Q_{k-1} = Q_1 Q_2 \cdots Q_{k-1} A_{k-1},
A^k = Q_1 Q_2 \cdots Q_{k-1} A_{k-1} R^{(k-1)},
A_{k-1} = Q_k R_k,
A^k = Q_1 Q_2 \cdots Q_{k-1} Q_k R_k R^{(k-1)}.
By the uniqueness of the QR factorization (R_k R^{(k-1)} is upper-triangular with positive diagonal entries), we find Q_1 Q_2 \cdots Q_{k-1} Q_k = Q^{(k)}. The convergence then follows from Theorem 1.
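As a quick numerical sanity check of the two facts just proved, here is a short MATLAB sketch (not part of the algorithm itself). Note that MATLAB's qr does not force the diagonal of R to be positive, so the accumulated product and the Q factor of A^k agree only up to the signs of their columns, and a modest power is used so that A^k is safe to form directly.
A = randn(4); A = A + A';                   % random symmetric test matrix
Ak = A; Qprod = eye(4);
for k = 1:6
    [Q,R] = qr(Ak);                         % A_{k-1} = Q_k R_k
    Ak = R*Q;                               % A_k = R_k Q_k
    Qprod = Qprod*Q;                        % accumulate Q_1 Q_2 ... Q_k
end
[Qk,Rk] = qr(A^6);                          % QR factorization of A^6 computed directly
norm(abs(Qprod) - abs(Qk))                  % small: Q_1 ... Q_6 = Q^{(6)} up to column signs
norm(sort(eig((Ak+Ak')/2)) - sort(eig(A)))  % small: A_6 has the same eigenvalues as A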
We need a convergence criterion for the QR algorithm. From the Gershgorin Circle Theorem, it follows that if, in each row of A_k, the sum of the absolute values of the off-diagonal entries is less than \epsilon > 0, then the eigenvalues are all known to an accuracy of \epsilon.
for k = 1,2,\ldots,N_{\max}
Set [Q,R] = \text{QR}(A) % Do this with MGS (modified Gram-Schmidt)
Set A = RQ
If \|A - \mathrm{diag}(A)\|_\infty < \mathrm{TOL}, output \mathrm{diag}(A) and stop
endfor
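The examples below call a function QRalg(A, N_max, TOL). Its source is not listed in these notes, so here is a minimal MATLAB sketch consistent with the pseudocode above; the argument order and return value are inferred from the calls below, and the built-in qr is used in place of MGS for brevity.
function D = QRalg(A, Nmax, TOL)
% Minimal sketch of the QR algorithm: factor, reverse the order, repeat.
for k = 1:Nmax
    [Q,R] = qr(A);                          % QR factorization of the current iterate
    A = R*Q;                                % reverse the order of multiplication
    if norm(A - diag(diag(A)), inf) < TOL   % Gershgorin-based stopping criterion
        break
    end
end
D = diag(A);                                % approximate eigenvalues
end
Note that the ordering of the output may differ from the ordering returned by eigs, so the two outputs in the examples below may need reordering before they are compared.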
A = randn(5);
A = A + A'; % Convergence is slow on some matrices
D = QRalg(A,400,10e-10)'
flipud(eigs(A))' % compare with MATLAB's built-in eigenvalue routine
A = randn(5);
A = A*A'; % Convergence is much faster on others
D = QRalg(A,400,10e-10)'
flipud(eigs(A))' % compare with MATLAB's built-in eigenvalue routine
Some final remarks on eigenvalue algorithms:
This is not the QR algorithm that is used in practice. Typically, a symmetric matrix is first reduced to tridiagonal form using Householder transformations. Then a QR algorithm that is mathematically equivalent to the one we discussed, but that exploits the tridiagonal structure, is applied. This gives a much more efficient algorithm.
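For illustration, here is a sketch of that preprocessing step using MATLAB's built-in hess, which applies Householder transformations (for a symmetric matrix the resulting Hessenberg form is tridiagonal), followed by the same QR iteration as before. A practical implementation would also exploit the tridiagonal structure inside each QR step, which this sketch does not do.
A = randn(5); A = A + A';   % random symmetric matrix
[P,T] = hess(A);            % A = P*T*P' with T tridiagonal since A is symmetric
D = QRalg(T,400,1e-10)'     % T has the same eigenvalues as A
sort(eig(A))'               % built-in eigenvalues, for comparison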
To see how the Power method can fail, consider the matrix
A = \begin{bmatrix} -2 & 0 \\ 0 & 2 \end{bmatrix}.
We want to compute the eigenvalues with the symmetric Power method. Choose x = x^{(0)} = [x_1, x_2]^T. It follows that
\lambda^{(k)} = \frac{x^T A^{2k-1} x}{x^T A^{2k-2} x}.
And then
x^T A^{2k-1} x = (-2)^{2k-1} x_1^2 + 2^{2k-1} x_2^2 = 2^{2k-1}(x_2^2 - x_1^2),
x^T A^{2k-2} x = (-2)^{2k-2} x_1^2 + 2^{2k-2} x_2^2 = 2^{2k-2}(x_2^2 + x_1^2).
So, we find
\lambda^{(k)} = 2\frac{x_2^2 - x_1^2}{x_2^2 + x_1^2} \neq \pm 2,
for most choices of x_1 and x_2. This is the worst case for a numerical algorithm: it converges to the wrong answer! This can be fixed. If one considers A + I = \mathrm{diag}(-1,3), the power method will converge: \lambda^{(k)} \to 3 as k \to \infty. And because we added the identity matrix, we subtract one: \lambda^{(k)} - 1 \to 2, a true eigenvalue of A.
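A short MATLAB sketch of this failure and its remedy (the loop below is a standard symmetric power iteration with a Rayleigh-quotient estimate; the starting vector is an arbitrary choice with both components non-zero):
A = [-2 0; 0 2];
x = [1; 2]/norm([1; 2]);     % starting vector x^{(0)}
for k = 1:50
    lambda = x'*A*x;         % Rayleigh-quotient estimate lambda^{(k)}
    x = A*x/norm(A*x);
end
lambda                       % stuck at 2*(x2^2 - x1^2)/(x2^2 + x1^2) = 1.2, not an eigenvalue
B = A + eye(2);              % shift: B = diag(-1,3) has a dominant eigenvalue
x = [1; 2]/norm([1; 2]);
for k = 1:50
    lambda = x'*B*x;
    x = B*x/norm(B*x);
end
lambda - 1                   % converges to 2, a true eigenvalue of A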
This failure of the Power method, and its remedy, illustrates the idea of "shifting": knocking the matrix into a more convenient form where the method converges faster (or converges at all). To get the full power of the QR algorithm, one should use the so-called Wilkinson shift, which greatly accelerates convergence.