We now discuss iterative methods in linear algebra. The methods we will discuss come from two problems:
To discuss what it means to perform linear algebraic operations in an iterative manner we need a mechanism to measure the difference between two vectors and the difference between two matrices.
A norm \|\cdot\| for a vector x \in \mathbb R^{n} must satisfy the following properties:
There are many different norms. One important class is the \ell_p norms: For 1 \leq p < \infty and x = (x_1,x_2,\ldots,x_n)^T define
\begin{align} \|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}. \end{align}
We will use this with p = 1, 2, and if one sends p \to \infty we have
\begin{align} \|x\|_\infty = \max_{1 \leq i \leq n} |x_i|. \end{align}
The \ell_2 norm is commonly referred to as the Euclidean norm because the norm of x - y for two vectors x,y \in \mathbb R^3 gives the straight-line distance between the two points x and y in three-dimensional space.
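As a quick numerical illustration (a minimal sketch, assuming NumPy is available), the sums and maxima in these definitions can be computed directly and compared against numpy.linalg.norm:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

# l1, l2, and l-infinity norms computed straight from the definitions
l1 = np.sum(np.abs(x))                # |3| + |-4| + |1| = 8
l2 = np.sqrt(np.sum(np.abs(x)**2))    # sqrt(9 + 16 + 1) = sqrt(26)
linf = np.max(np.abs(x))              # 4

# numpy.linalg.norm implements the same formulas
assert np.isclose(l1, np.linalg.norm(x, 1))
assert np.isclose(l2, np.linalg.norm(x, 2))
assert np.isclose(linf, np.linalg.norm(x, np.inf))
```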
Let us now check the four properties of a norm for the \ell_2 norm
\begin{align} \|x\|_2 = \left( \sum_{i=1}^n |x_i|^2 \right)^{1/2}. \end{align}
That \|x\|_2 \geq 0 is clear from the definition.
If x = 0 then \|x\|_2 = 0. If \|x\|_2 = 0 then \sum_{i} |x_i|^2 = 0. This implies |x_i| = 0 for each i and therefore x =0.
The ith entry of \alpha x is \alpha x_i and so
\|\alpha x\|_2 = \left( \sum_{i=1}^n |\alpha x_i|^2 \right)^{1/2} = \left( |\alpha|^2 \sum_{i=1}^n |x_i|^2 \right)^{1/2} = |\alpha| \left( \sum_{i=1}^n |x_i|^2 \right)^{1/2}= |\alpha| \|x\|_2.
Showing the triangle inequality \|x + y\|_2 \leq \|x\|_2 + \|y\|_2 is much more involved. We need an intermediate result.
For x,y \in \mathbb R^n
\left| x^T y \right| = \left| \sum_{i=1}^n x_i y_i\right| \leq \|x\|_2 \|y\|_2.
Before we prove this in general, let's verify it for \mathbb R^2:
|x_1y_1 + x_2 y_2| \overset{\mathrm{?}}{\leq} \sqrt{x_1^2 + x_2^2} \sqrt{y_1^2 + y_2^2}
(x_1y_1 + x_2 y_2)^2 \overset{\mathrm{?}}{\leq} (x_1^2 + x_2^2) (y_1^2 + y_2^2)
x^2_1y^2_1 + x^2_2 y^2_2 + 2 x_1x_2y_1y_2 \overset{\mathrm{?}}{\leq} x_1^2y_1^2 + x_2^2 y_2^2 + x_1^2y_2^2 + x_2^2 y_1^2
2 x_1x_2y_1y_2 \overset{\mathrm{?}}{\leq} x_1^2y_2^2 + x_2^2 y_1^2
0 \overset{\mathrm{?}}{\leq} x_1^2y_2^2 + x_2^2 y_1^2 - 2 x_1x_2y_1y_2
0 \overset{\mathrm{?}}{\leq} (x_1y_2 - x_2 y_1)^2
This last inequality is true, and the Cauchy-Schwarz inequality follows in \mathbb R^2.
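The final step rests on the identity (x_1^2 + x_2^2)(y_1^2 + y_2^2) - (x_1y_1 + x_2y_2)^2 = (x_1y_2 - x_2y_1)^2, which is easy to spot-check on random entries (a minimal sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
x1, x2, y1, y2 = rng.standard_normal(4)

lhs = (x1*y1 + x2*y2)**2
rhs = (x1**2 + x2**2) * (y1**2 + y2**2)

# rhs - lhs collapses to the perfect square (x1*y2 - x2*y1)^2, hence is >= 0
assert np.isclose(rhs - lhs, (x1*y2 - x2*y1)**2)
assert lhs <= rhs + 1e-12
```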
From the \mathbb R^2 calculation above, it is clear that performing this type of calculation for general n is going to be difficult, so we need a new strategy. For x,y \in \mathbb R^n and \lambda \in \mathbb R consider the squared norm
\|x-\lambda y\|_2^2 = \sum_{i=1}^n (x_i-\lambda y_i)^2 = \sum_{i=1}^n x_i^2 - 2 \lambda \sum_{i=1}^n x_i y_i + \lambda^2 \sum_{i=1}^n y_i^2 = \|x\|_2^2 - 2 \lambda x^T y + \lambda^2 \|y\|_2^2.
Notice that the right-hand side has all the terms we encounter in the Cauchy-Schwarz inequality.
If we think about this just as a function of \lambda, keeping x,y \in \mathbb R^n fixed, we have a parabola. We look at the minimum of the parabola: if f(\lambda) = a + b \lambda + c \lambda^2 with c > 0, then f attains its minimum at \lambda = -b/(2c) and f(\lambda) \geq a - b^2/(4c).
(If y = 0 the Cauchy-Schwarz inequality holds trivially, so we may assume y \neq 0.) From this with a = \|x\|_2^2, b = -2 x^T y and c = \|y\|_2^2 > 0 we have
\|x-\lambda y\|_2^2 \geq \|x\|_2^2 - \frac{(x^T y)^2}{\|y\|_2^2} \geq 0.
Note that the minimum value has to be non-negative because the minimum of a non-negative function is also non-negative. Rearranging this last inequality, we have
(x^Ty)^2 \leq \|x\|_2^2 \|y\|_2^2
which is just the square of the Cauchy-Schwarz inequality.
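The minimizing value is \lambda = x^Ty/\|y\|_2^2, and the whole argument can be checked numerically. The sketch below (assuming NumPy; the random data is only for illustration) evaluates the minimum of the parabola and the resulting Cauchy-Schwarz bound:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(10)
y = rng.standard_normal(10)

# The minimizing lambda from the parabola argument
lam = (x @ y) / np.linalg.norm(y, 2)**2

# The minimum value ||x||_2^2 - (x^T y)^2 / ||y||_2^2 is attained and is nonnegative
min_val = np.linalg.norm(x - lam * y, 2)**2
assert np.isclose(min_val,
                  np.linalg.norm(x, 2)**2 - (x @ y)**2 / np.linalg.norm(y, 2)**2)
assert min_val >= 0

# Rearranging gives the Cauchy-Schwarz inequality
assert abs(x @ y) <= np.linalg.norm(x, 2) * np.linalg.norm(y, 2)
```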
To return to the triangle inequality, we compute (set \lambda = -1 in the previous calculation)
\|x + y\|_2^2 = \|x\|_2^2 + 2 x^T y + \|y\|_2^2 \leq \|x\|_2^2 + 2 |x^T y| + \|y\|_2^2
and, by the Cauchy-Schwarz inequality,
\|x\|_2^2 + 2 |x^T y| + \|y\|_2^2 \leq \|x\|_2^2 + 2 \|x\|_2 \|y\|_2 + \|y\|_2^2.
But
\|x\|_2^2 + 2 \|x\|_2 \|y\|_2 + \|y\|_2^2 = (\|x\|_2 + \|y\|_2)^2
and summarizing we have
\|x + y\|_2^2 \leq (\|x\|_2 + \|y\|_2)^2.
Upon taking a square root we see that \|x + y\|_2 \leq \|x\|_2 + \|y\|_2 as desired.
This actually follows for any \ell_p norm but we will not prove it here.
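The triangle inequality is easy to spot-check on random vectors (a minimal sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(8)
y = rng.standard_normal(8)

# ||x + y||_2 <= ||x||_2 + ||y||_2
assert np.linalg.norm(x + y, 2) <= np.linalg.norm(x, 2) + np.linalg.norm(y, 2)
```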
The distance between two vectors x,y \in \mathbb R^n, given a norm \|\cdot \| is defined to be
\|x - y\|.
A sequence of vectors \{x^{(k)}\}_{k=1}^\infty, x^{(k)} \in \mathbb R^n, is said to converge to x with respect to the norm \|\cdot \| if given any \epsilon > 0 there exists N(\epsilon) > 0 such that
\|x^{(k)} - x\| < \epsilon, \quad \text{ for all } \quad k \geq N(\epsilon).
Equivalently, \lim_{k \to \infty} \|x^{(k)} - x\| = 0.
A sequence of vectors \{x^{(k)}\}_{k=1}^\infty, x^{(k)} \in \mathbb R^n, converges to x with respect to the norm \|\cdot \|_p, 1 \leq p \leq \infty, if and only if the components converge:
x_i^{(k)} \to x_i, \quad \text{as} \quad k \to \infty
for all 1 \leq i \leq n.
We first prove it for p = \infty. We have
|x_i^{(k)} -x_i| \leq \max_{1 \leq j \leq n} |x_j^{(k)}-x_j| = \|x^{(k)} - x\|_\infty.
And so, convergence with respect to \|\cdot \|_\infty (the right-hand side tends to zero as k \to \infty) implies that each of the individual components converges.
Now, assume that each of the individual components converges. For every \epsilon > 0 there exists N_i(\epsilon) such that k \geq N_{i}(\epsilon) implies that
|x_i^{(k)} -x_i| < \epsilon.
Given \epsilon > 0, let k \geq \max_{1\leq i \leq n} N_{i} (\epsilon). Then
|x_i^{(k)} -x_i| < \epsilon, \quad \text{for every } 1 \leq i \leq n,
and hence \|x^{(k)} - x\|_\infty < \epsilon.
To prove the theorem for general 1 \leq p < \infty we show x^{(k)} converges to x with respect to \|\cdot\|_\infty if and only if it converges to x with respect to \|\cdot \|_p. First, for any x \in \mathbb R^n we have
\|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p} \leq \left( \sum_{i=1}^n \max_{1 \leq j \leq n} |x_j|^p \right)^{1/p} = \left( n \max_{1 \leq j \leq n} |x_j|^p \right)^{1/p} = n^{1/p} \|x\|_\infty.
Replacing x with x^{(k)} - x we have that \|x^{(k)} - x\|_p \leq n^{1/p} \|x^{(k)} - x\|_\infty. Thus convergence with respect to \|\cdot \|_\infty implies convergence with respect to \|\cdot\|_p.
For the reverse inequality, note that for any 1 \leq j \leq n
|x_j| \leq \left(\sum_{i = 1}^n |x_i|^p \right)^{1/p}.
Indeed, if x_j = 0, this follows immediately. If x_j \neq 0 then
\left(\sum_{i = 1}^n |x_i|^p \right)^{1/p} = |x_j| \underbrace{\left( 1 + \sum_{i \neq j} \frac{|x_i|^p}{|x_j|^p} \right)^{1/p}}_{\geq 1} \geq |x_j|.
Taking the maximum over j then gives
\|x\|_\infty = \max_{1 \leq j \leq n} |x_j| \leq \left(\sum_{i = 1}^n |x_i|^p \right)^{1/p} = \|x\|_p.
Replacing x with x^{(k)} - x we have that \|x^{(k)} - x\|_\infty \leq \|x^{(k)} - x\|_p. Thus convergence with respect to \|\cdot \|_p implies convergence with respect to \|\cdot\|_\infty.
The final logic is the following:
If a sequence converges with respect to \|\cdot\|_p, it converges with respect to \|\cdot\|_\infty and therefore the individual components converge.
If the individual components converge, the sequence converges with respect to \|\cdot\|_\infty and it then converges with respect to \|\cdot\|_p.
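The two inequalities \|x\|_\infty \leq \|x\|_p \leq n^{1/p}\|x\|_\infty used above, and the resulting equivalence of convergence, can be illustrated numerically (a sketch, assuming NumPy; the sequence x^{(k)} = x + v/k is just an example):

```python
import numpy as np

n, p = 5, 3
rng = np.random.default_rng(3)
x = rng.standard_normal(n)

# ||x||_inf <= ||x||_p <= n^(1/p) * ||x||_inf
norm_inf = np.linalg.norm(x, np.inf)
norm_p = np.sum(np.abs(x)**p)**(1.0/p)
assert norm_inf <= norm_p <= n**(1.0/p) * norm_inf

# The example sequence x^(k) = x + v/k converges to x componentwise,
# so its l_p distance to x shrinks as k grows
v = rng.standard_normal(n)
errors = [np.linalg.norm((x + v/k) - x, p) for k in (1, 10, 100, 1000)]
assert errors == sorted(errors, reverse=True)
```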
We can also define norms on the set of all n \times m matrices. We will just concentrate on norms for square n \times n matrices. A matrix norm \|\cdot \| should satisfy the following for all n \times n matrices A and B and real numbers \alpha
Note the last condition: it requires something beyond what is required of a vector norm.
The distance between two matrices is then defined as \|A - B\|, as in the case of vectors.
We can construct a matrix norm from a vector norm.
Let \|\cdot\| be a norm on vectors in \mathbb R^n. Then
\|A\| = \max_{\|x\| = 1} \|Ax\|
gives a matrix norm.
This is called the induced matrix norm.
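One way to build intuition for the definition is to sample unit vectors and take the largest value of \|Ax\|: this only produces a lower bound on the induced norm, but it can be compared against the exact values numpy.linalg.norm returns for p = 1, 2, \infty. A sketch (the helper name induced_norm_estimate is ours, not a library routine):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))

def induced_norm_estimate(A, p, samples=5000):
    """Lower-bound estimate of max_{||x||_p = 1} ||A x||_p by random sampling."""
    best = 0.0
    for _ in range(samples):
        x = rng.standard_normal(A.shape[1])
        x = x / np.linalg.norm(x, p)       # normalize so ||x||_p = 1
        best = max(best, np.linalg.norm(A @ x, p))
    return best

# The sampled maximum never exceeds the true induced norm
for p in (1, 2, np.inf):
    assert induced_norm_estimate(A, p) <= np.linalg.norm(A, p) + 1e-12
```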
For \|x\| = 1, the triangle inequality for the vector norm gives \|(A+B)x\| \leq \|Ax\| + \|Bx\| \leq \|A\| + \|B\|. Taking the maximum over all \|x\| = 1, we find \|A + B\| \leq \|A\| + \|B\|.
Let b(x) = \|Bx\| and suppose b(x) \neq 0 for at least one x with \|x\| = 1. For such an x, \|Bx/b(x)\| = 1. Then
\|AB\| = \max_{\|x\| = 1} \|ABx\| = \max_{\|x\| = 1,\ b(x) \neq 0} b(x) \left\|A \left( \frac{Bx}{b(x)} \right)\right\| \leq \left[ \max_{\|x\| = 1} b(x) \right] \max_{\|y\| = 1} \|A y\| = \|B\| \|A\|.
If b(x) = 0 for all x, then B = 0 and AB = 0, and we find \|AB\| = 0 from the definition.
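The submultiplicative property \|AB\| \leq \|A\|\,\|B\| can likewise be spot-checked for the induced norms NumPy computes exactly (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# ||A B|| <= ||A|| ||B|| for each induced norm numpy evaluates
for p in (1, 2, np.inf):
    assert np.linalg.norm(A @ B, p) <= np.linalg.norm(A, p) * np.linalg.norm(B, p) + 1e-12
```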
Given a vector norm, it is important to find a formula (if possible) for the induced matrix norm.
The induced \ell^\infty matrix norm is the "maximum absolute row sum."
First, recall the definition
\|A\|_\infty = \max_{\|x\|_\infty = 1} \|A x\|_\infty.
If A = 0, the formula is correct: both sides are 0. Assume A \neq 0. Given a vector x \in \mathbb R^n, using the formula for the matrix-vector product,
\|Ax\|_\infty = \max_{1 \leq i \leq n} \left| \sum_{j=1}^n a_{ij} x_j \right|.
Now find the first row i such that
\max_{1 \leq k \leq n} \sum_{j=1}^n |a_{kj}| = \sum_{j=1}^n |a_{ij}|.
Define the \mathrm{sign}(x) function by \mathrm{sign}(x) = 1 if x > 0, \mathrm{sign}(x) = -1 if x < 0 and \mathrm{sign}(0) = 0. Choose the vector x by the rule x_j = \mathrm{sign}(a_{ij}), 1 \leq j\leq n. Because A \neq 0, at least one such a_{ij} is nonzero, and so \|x\|_\infty = 1. For this choice of x it follows that
\|Ax\|_\infty = \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|,
and therefore
\|A\|_\infty \geq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|.
Now, we must show the reverse inequality which is much easier: For any x \in \mathbb R^n, \|x\|_\infty = 1, by the triangle inequality
\|Ax\|_\infty = \max_{1 \leq i \leq n} \left| \sum_{j=1}^n a_{ij} x_j \right| \leq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}| |x_j| \leq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|.
The last inequality follows because |x_j| \leq 1 for each j. Taking the maximum of this expression over all \|x\|_\infty = 1 we find
\|A\|_\infty \leq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|,
which shows
\|A\|_\infty = \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|.
It is important to note that the matrix norm \|A\|_\infty is NOT the largest entry, in absolute value, of the matrix.
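A short check of the row-sum formula (a sketch, assuming NumPy; numpy.linalg.norm(A, np.inf) returns the maximum absolute row sum), including the sign vector constructed in the proof:

```python
import numpy as np

A = np.array([[ 1.0, -2.0,  3.0],
              [ 4.0,  0.0, -1.0],
              [-2.0,  2.0,  2.0]])

# Maximum absolute row sum: the rows sum (in absolute value) to 6, 5, 6
row_sums = np.sum(np.abs(A), axis=1)
assert np.isclose(row_sums.max(), np.linalg.norm(A, np.inf))

# The vector of signs constructed in the proof attains the maximum
i = np.argmax(row_sums)
x = np.sign(A[i])                      # ||x||_inf = 1
assert np.isclose(np.linalg.norm(A @ x, np.inf), row_sums.max())
```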
The induced \ell^1 matrix norm is the "maximum absolute column sum."
First, recall the definition
\|A\|_1 = \max_{\|x\|_1 = 1} \|A x\|_1.
Given a vector x \in \mathbb R^n, using the formula for the matrix-vector product,
\|Ax\|_1 = \sum_{i=1}^n \left| \sum_{j=1}^n a_{ij} x_j \right|.
Now find the first column j such that
\max_{1 \leq k \leq n} \sum_{i=1}^n |a_{ik}| = \sum_{i=1}^n |a_{ij}|.
Choose the vector x by the rule x_j = 1, x_i = 0 for i \neq j. For this choice of x it follows that
\|Ax\|_1 = \sum_{i=1}^n |a_{ij}| = \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|,
and therefore
\|A\|_1 \geq \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|.
Now, we must show the reverse inequality which is, again, much easier: For any x \in \mathbb R^n, \|x\|_1 = 1, by the triangle inequality
\|Ax\|_1 = \sum_{i=1}^n \left| \sum_{j=1}^n a_{ij} x_j \right| \leq \sum_{j=1}^n \sum_{i=1}^n |a_{ij} x_j| = \sum_{j=1}^n |x_j| \sum_{i=1}^n |a_{ij}| \leq \sum_{j=1}^n |x_j| \max_{1 \leq k \leq n} \sum_{i=1}^n |a_{ik}| = \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|.
The last equality follows because \|x\|_1 = 1. Taking the maximum of this expression over all \|x\|_1 = 1 we find
\|A\|_1 \leq \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|,
which shows
\|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|.
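And the analogous check of the column-sum formula (a sketch, assuming NumPy; numpy.linalg.norm(A, 1) returns the maximum absolute column sum), including the standard basis vector from the proof:

```python
import numpy as np

A = np.array([[ 1.0, -2.0,  3.0],
              [ 4.0,  0.0, -1.0],
              [-2.0,  2.0,  2.0]])

# Maximum absolute column sum: the columns sum (in absolute value) to 7, 4, 6
col_sums = np.sum(np.abs(A), axis=0)
assert np.isclose(col_sums.max(), np.linalg.norm(A, 1))

# The standard basis vector from the proof attains the maximum
j = np.argmax(col_sums)
e_j = np.zeros(A.shape[1])
e_j[j] = 1.0                           # ||e_j||_1 = 1
assert np.isclose(np.linalg.norm(A @ e_j, 1), col_sums.max())
```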