
Iterative methods in linear algebra

We now discuss methods in linear algebra that are iterative. The methods we will discuss come from two problems:

  1. Solving linear systems Ax=b
  2. Computing eigenvalues: (A - \lambda I)v = 0, ~ v \neq 0.

To discuss what it means to perform linear algebraic operations in an iterative manner we need a mechanism to measure the difference between two vectors and the difference between two matrices.

Norms of vectors and matrices

A norm for a vector x \in \mathbb R^{n} must satisfy the following properties:

  • \|x\| \geq 0 for all x \in \mathbb R^n
  • \|x\| = 0 if and only if x = 0 (x is the zero vector)
  • \|\alpha x\| = |\alpha| \|x\| for any x \in \mathbb R^n, ~ \alpha \in \mathbb R
  • \|x + y\| \leq \|x\| + \|y\| for any x,y \in \mathbb R^n (triangle inequality)

There are many different norms. One important class is the \ell_p norms: For 1 \leq p < \infty and x = (x_1,x_2,\ldots,x_n)^T define

\begin{align} \|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}. \end{align}

We will use this with p = 1, 2, and if one sends p \to \infty we have

\begin{align} \|x\|_\infty = \max_{1 \leq i \leq n} |x_i|. \end{align}

The \ell_2 norm is commonly referred to as the Euclidean norm because the norm of x -y for two vectors x,y \in \mathbb R^3 gives the straight-line distance between the two points x and y in three-dimensional space.
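To make these definitions concrete, here is a minimal Python sketch (using NumPy, which is an assumption of this illustration rather than anything required by the notes) that evaluates the \ell_1, \ell_2 and \ell_\infty norms of a vector directly from the formulas above and compares them with numpy.linalg.norm.

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

# l_p norms computed directly from the definitions above
norm_1 = np.sum(np.abs(x))                # sum of absolute values
norm_2 = np.sqrt(np.sum(np.abs(x) ** 2))  # Euclidean norm
norm_inf = np.max(np.abs(x))              # largest absolute entry

# The same quantities via NumPy's built-in routine
print(norm_1, np.linalg.norm(x, 1))         # 8.0 8.0
print(norm_2, np.linalg.norm(x, 2))         # 5.0990... 5.0990...
print(norm_inf, np.linalg.norm(x, np.inf))  # 4.0 4.0
```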

Let us now check the four properties of a norm for the \ell_2 norm

\begin{align} \|x\|_2 = \left( \sum_{i=1}^n |x_i|^2 \right)^{1/2}. \end{align}
  • \|x\| \geq 0 for all x \in \mathbb R^n

This is clear from the definition.

  • \|x\| = 0 if and only if x = 0 (x is the zero vector)

If x = 0 then \|x\|_2 = 0. If \|x\|_2 = 0 then \sum_{i} |x_i|^2 = 0. This implies |x_i| = 0 for each i and therefore x =0.

  • \|\alpha x\| = |\alpha| \|x\| for any x \in \mathbb R^n, ~ \alpha \in \mathbb R

The ith entry of \alpha x is \alpha x_i and so

\|\alpha x\|_2 = \left( \sum_{i=1}^n |\alpha x_i|^2 \right)^{1/2} = \left( |\alpha|^2 \sum_{i=1}^n |x_i|^2 \right)^{1/2} = |\alpha| \left( \sum_{i=1}^n |x_i|^2 \right)^{1/2}= |\alpha| \|x\|_2.

Showing

  • \|x + y\| \leq \|x\| + \|y\| (triangle inequality)

is much more involved. We need an intermediate result.

The Cauchy-Schwarz inequality

For x,y \in \mathbb R^n

\left| x^T y \right| = \left| \sum_{i=1}^n x_i y_i\right| \leq \|x\|_2 \|y\|_2.

Before we prove this in general, let's verify it for \mathbb R^2

\begin{align} |x_1y_1 + x_2 y_2| &\overset{\mathrm{?}}{\leq} \sqrt{x_1^2 + x_2^2} \sqrt{y_1^2 + y_2^2} \\ (x_1y_1 + x_2 y_2)^2 &\overset{\mathrm{?}}{\leq} (x_1^2 + x_2^2) (y_1^2 + y_2^2) \\ x^2_1y^2_1 + x^2_2 y^2_2 + 2 x_1x_2y_1y_2 &\overset{\mathrm{?}}{\leq} x_1^2y_1^2 + x_2^2 y_2^2 + x_1^2y_2^2 + x_2^2 y_1^2 \\ 2 x_1x_2y_1y_2 &\overset{\mathrm{?}}{\leq} x_1^2y_2^2 + x_2^2 y_1^2 \\ 0 &\overset{\mathrm{?}}{\leq} x_1^2y_2^2 + x_2^2 y_1^2 - 2 x_1x_2y_1y_2 \\ 0 &\overset{\mathrm{?}}{\leq} (x_1y_2 - x_2 y_1)^2 \end{align}

This last inequality is true, and the Cauchy-Schwarz inequality follows in \mathbb R^2.

From this last calculation, it is clear that performing this type of calculation for general n is going to be difficult, so we need a new strategy. For x,y \in \mathbb R^n and \lambda \in \mathbb R consider the squared norm

\|x-\lambda y\|_2^2 = \sum_{i=1}^n (x_i-\lambda y_i)^2 = \sum_{i=1}^n x_i^2 - 2 \lambda \sum_{i=1}^n x_i y_i + \lambda^2 \sum_{i=1}^n y_i^2 = \|x\|_2^2 - 2 \lambda x^T y + \lambda^2 \|y\|_2^2.

Notice that the right-hand side has all the terms we encounter in the Cauchy-Schwarz inequality.

If we think about this just as a function of \lambda, keeping x,y \in \mathbb R^n fixed, we have a parabola. We look at the minimum of the parabola: if f(\lambda) = a + b \lambda + c \lambda^2 with c > 0, then f attains its minimum at \lambda = -b/(2c) and f(\lambda) \geq a - b^2/(4c).

From this with a = \|x\|_2^2, b = -2 x^T y and c = \|y\|_2^2 (assuming y \neq 0; the inequality below is trivial when y = 0) we have

\|x-\lambda y\|_2^2 \geq \|x\|_2^2 - \frac{(x^T y)^2}{\|y\|_2^2} \geq 0.

Note that this has to be non-negative because the minimum of a non-negative function is also non-negative. Rearranging this last inequality, we have

(x^Ty)^2 \leq \|x\|_2^2 \|y\|_2^2

which is just the square of the Cauchy-Schwarz inequality.

To return to the triangle inequality, we compute (set \lambda = -1 in the previous calculation)

\begin{align} \|x + y\|_2^2 &= \|x\|_2^2 + 2 x^T y + \|y\|_2^2 \leq \|x\|_2^2 + 2 |x^T y| + \|y\|_2^2 \\ &\leq \|x\|_2^2 + 2 \|x\|_2 \|y\|_2 + \|y\|_2^2 \end{align}

by the Cauchy-Schwarz inequality. But

\|x\|_2^2 + 2 \|x\|_2 \|y\|_2 + \|y\|_2^2 = (\|x\|_2 + \|y\|_2)^2

and summarizing we have

\|x + y\|_2^2 \leq (\|x\|_2 + \|y\|_2)^2.

Upon taking a square-root we see that \|x + y\|_2 \leq \|x\|_2 + \|y\|_2 as desired.

The triangle inequality actually holds for any \ell_p norm, but we will not prove that here.
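As a quick numerical sanity check of both inequalities, the following sketch (random test vectors and a fixed seed are choices of this illustration, not part of the notes) verifies |x^T y| \leq \|x\|_2 \|y\|_2 and \|x+y\|_2 \leq \|x\|_2 + \|y\|_2 for a batch of random vectors.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, purely for reproducibility

for _ in range(1000):
    n = rng.integers(1, 20)      # random dimension
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)

    # Cauchy-Schwarz: |x^T y| <= ||x||_2 ||y||_2
    assert abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y) + 1e-12

    # Triangle inequality: ||x + y||_2 <= ||x||_2 + ||y||_2
    assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y) + 1e-12

print("Cauchy-Schwarz and the triangle inequality held in every trial.")
```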

Distance

The distance between two vectors x,y \in \mathbb R^n, given a norm \|\cdot \| is defined to be

\|x - y\|.

Convergence

A sequence of vectors \{x^{(k)}\}_{k=1}^\infty, x^{(k)} \in \mathbb R^n is said to converge to x with respect to the norm \|\cdot \| if given any \epsilon > 0 there exists N(\epsilon) > 0 such that

\|x^{(k)} - x\| < \epsilon, \quad \text{ for all } \quad k \geq N(\epsilon).

Equivalently, \lim_{k \to \infty} \|x^{(k)} - x\| = 0.
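For a concrete (made-up) example, take x^{(k)} = (1 + 1/k, 2^{-k})^T, which converges to x = (1, 0)^T. The sketch below prints \|x^{(k)} - x\|_\infty for a few values of k; the distances decrease toward zero.

```python
import numpy as np

x_limit = np.array([1.0, 0.0])

for k in [1, 2, 5, 10, 50, 100]:
    x_k = np.array([1.0 + 1.0 / k, 2.0 ** (-k)])
    # distance to the limit, measured in the infinity norm
    print(k, np.linalg.norm(x_k - x_limit, np.inf))
```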

Theorem

A sequence of vectors \{x^{(k)}\}_{k=1}^\infty, x^{(k)} \in \mathbb R^n converges to x with respect to the norm \|\cdot \|_p, 1 \leq p \leq \infty, if and only if the components converge

x_i^{(k)} \to x_i, \quad \text{as} \quad k \to \infty

for all 1 \leq i \leq n.

Proof

We first prove it for p = \infty. We have

|x_i^{(k)} -x_i| \leq \max_{1 \leq j \leq n} |x_j^{(k)}-x_j| = \|x^{(k)} - x\|_\infty

And so, convergence with respect to \|\cdot \|_\infty (the right-hand side tends to zero as k \to \infty) implies that each of the individual components converges.

Now, assume that each of the individual components converge. For every \epsilon > 0 there exists N_i(\epsilon) such that k \geq N_{i}(\epsilon) implies that

|x_i^{(k)} -x_i| < \epsilon.

Given \epsilon > 0, let k \geq \max_{1\leq i \leq n} N_{i} (\epsilon). Then

|x_i^{(k)} -x_i| < \epsilon, \quad \text{for every } 1 \leq i \leq n

and hence \|x^{(k)} - x\|_\infty < \epsilon.

To prove the theorem for general 1 \leq p < \infty we show x^{(k)} converges to x with respect to \|\cdot\|_\infty if and only if it converges to x with respect to \|\cdot \|_p. First, for any x \in \mathbb R^n we have

\|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p} \leq \left( \sum_{i=1}^n \max_{1 \leq j \leq n} |x_j|^p \right)^{1/p} = \left( n \max_{1 \leq j \leq n} |x_j|^p \right)^{1/p} = n^{1/p} \|x\|_\infty.

Replacing x with x^{(k)} - x we have that \|x^{(k)} - x\|_p \leq n^{1/p} \|x^{(k)} - x\|_\infty. Thus convergence with respect to \|\cdot \|_\infty implies convergence with respect to \|\cdot\|_p.

For the reverse inequality, note that for any 1 \leq j \leq n

|x_j| \leq \left(\sum_{i = 1}^n |x_i|^p \right)^{1/p}

Indeed, if x_j = 0, this follows immediately. If x_j \neq 0 then

\left(\sum_{i = 1}^n |x_i|^p \right)^{1/p} = |x_j| \underbrace{\left( 1 + \sum_{i \neq j} \frac{|x_i|^p}{|x_j|^p} \right)^{1/p}}_{\geq 1} \geq |x_j|.

Since this holds for every 1 \leq j \leq n,

\|x\|_\infty = \max_{1 \leq j \leq n} |x_j| \leq \left(\sum_{i = 1}^n |x_i|^p \right)^{1/p} = \|x\|_p.

Replacing x with x^{(k)} - x we have that \|x^{(k)} - x\|_\infty \leq \|x^{(k)} - x\|_p. Thus convergence with respect to \|\cdot \|_p implies convergence with respect to \|\cdot\|_\infty.

The final logic is the following:

  • If a sequence converges with respect to \|\cdot\|_p, it converges with respect to \|\cdot\|_\infty and therefore the individual components converge.

  • If the individual components converge, the sequence converges with respect to \|\cdot\|_\infty and it then converges with respect to \|\cdot\|_p.
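The two inequalities used in this proof, \|x\|_\infty \leq \|x\|_p \leq n^{1/p} \|x\|_\infty, can also be checked numerically. A minimal sketch, with randomly generated vectors as test data (a choice of this illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10

for p in [1, 2, 3, 7]:
    for _ in range(1000):
        x = rng.standard_normal(n)
        norm_p = np.linalg.norm(x, p)
        norm_inf = np.linalg.norm(x, np.inf)
        # ||x||_inf <= ||x||_p <= n^(1/p) ||x||_inf
        assert norm_inf <= norm_p + 1e-12
        assert norm_p <= n ** (1.0 / p) * norm_inf + 1e-12

print("Both bounds held for every sampled vector.")
```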

Matrix norms

We can also define norms on the set of all n \times m matrices. We will just concentrate on norms for square n \times n matrices. A matrix norm \|\cdot \| should satisfy the following for all n \times n matrices A and B and real numbers \alpha

  • \|A\| \geq 0
  • \|A\| = 0 if and only if A = 0 is the zero matrix
  • \|\alpha A\| = |\alpha| \|A\|
  • \|A + B \| \leq \|A\| + \|B\|
  • \|AB\| \leq \|A\|\|B\|

Note the last condition (submultiplicativity): it requires more than the corresponding properties of vector norms.

The distance between two matrices is then defined as \|A - B\|, as in the case of vectors.

We can construct a matrix norm from a vector norm.

Theorem

Let \|\cdot\| be a norm on vectors in \mathbb R^n. Then

\|A\| = \max_{\|x\| = 1} \|Ax\|

gives a matrix norm.

This is called the induced matrix norm.

Proof

  • Because \|Ax\| \geq 0 it follows that \|A\| \geq 0.
  • If A = 0 then \|Ax\| = 0 for every x and \|A\| = 0. If \|A\| = 0 then \|Ax\| = 0 for every x with \|x\| = 1; applying this to e_j/\|e_j\| shows Ae_j = 0. Since Ae_j gives the jth column of A, where e_j is the jth standard basis vector, every column of A is zero and thus A = 0.
  • \|\alpha A\| = \max_{\|x\| = 1} \|\alpha A x\| = |\alpha| \max_{\|x\| = 1} \| A x\| from the analogous property for the vector norm.
  • For the triangle inequality
\|A + B\| = \max_{\|x\| = 1} \|(A + B)x\| \leq \max_{\|x\| = 1} \left( \|Ax\| + \|Bx\| \right) \leq \max_{\|x\| = 1,\, \|y\| = 1} \left( \|Ax\| + \|By\| \right) = \max_{\|x\| = 1} \|Ax\| + \max_{\|y\| = 1} \|By\| = \|A\| + \|B\|.

Therefore \|A + B\| \leq \|A\| + \|B\|.

  • Let b(x) = \|Bx\| and suppose b(x) \neq 0 for at least one x with \|x\| = 1. For such an x we have \|Bx/b(x)\| = 1, and therefore

    \|AB\| = \max_{\|x\| = 1,~~ b(x) \neq 0} b(x) \|A (B x/b(x)) \| \leq \left[ \max_{\|x\| = 1} b(x) \right] \left[ \max_{\|z\| = 1} \|Az\| \right] = \|B\| \|A\|.

    If b(x) = 0 for all x with \|x\| = 1, then B = 0 and AB = 0, and we find \|AB\| = 0 from the definition.

Given a vector norm, it is important to find a formula (if possible) for the induced matrix norm.
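Before deriving such formulas, one can approximate the induced norm directly from the definition by maximizing \|Ax\| over randomly sampled unit vectors. The helper sampled_induced_norm below is hypothetical and only produces a lower-bound estimate; for p = 1, 2, \infty NumPy's numpy.linalg.norm returns the exact induced matrix norm, which we use for comparison and to check submultiplicativity.

```python
import numpy as np

def sampled_induced_norm(A, p, trials=5000, rng=None):
    """Lower-bound estimate of max_{||x||_p = 1} ||A x||_p by random sampling."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = A.shape[1]
    best = 0.0
    for _ in range(trials):
        x = rng.standard_normal(n)
        x /= np.linalg.norm(x, p)      # normalize so that ||x||_p = 1
        best = max(best, np.linalg.norm(A @ x, p))
    return best

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

for p in [1, 2, np.inf]:
    estimate = sampled_induced_norm(A, p)
    exact = np.linalg.norm(A, p)       # exact induced norm for p = 1, 2, inf
    print(p, estimate, exact)          # the estimate approaches the exact value from below

# Submultiplicativity: ||AB|| <= ||A|| ||B|| for each induced norm
for p in [1, 2, np.inf]:
    assert np.linalg.norm(A @ B, p) <= np.linalg.norm(A, p) * np.linalg.norm(B, p) + 1e-12
```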

Theorem

\|A\|_\infty = \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|.

The induced \ell_\infty matrix norm is the "maximum absolute row sum."

Proof

First, recall the definition

\|A\|_\infty = \max_{\|x\|_\infty = 1} \|A x\|_\infty.

If A = 0, the formula is correct: it gives 0. Assume A \neq 0. Given a vector x \in \mathbb R^n, the formula for the matrix-vector product gives

\|Ax\|_\infty = \max_{1 \leq i \leq n} \left| \sum_{j=1}^n a_{ij} x_j \right|.

Now find the first row index i such that

\sum_{j=1}^n |a_{ij}| = \max_{1 \leq k \leq n} \sum_{j=1}^n |a_{kj}|.

Define the \mathrm{sign}(x) function by \mathrm{sign}(x) = 1 if x > 0, \mathrm{sign}(x) = -1 if x < 0 and \mathrm{sign}(0) = 0. Choose the vector x by the rule x_j = \mathrm{sign}(a_{ij}), 1 \leq j\leq n. Then because A \neq 0 one such a_{ij} must be nonzero and \|x\|_\infty = 1. For this choice of x it follows that

\|Ax\|_\infty = \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|,

and therefore

\|A\|_\infty \geq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|.

Now, we must show the reverse inequality which is much easier: For any x \in \mathbb R^n, \|x\|_\infty = 1, by the triangle inequality

\|Ax\|_\infty = \max_{1 \leq i \leq n} \left| \sum_{j=1}^n a_{ij} x_j \right| \leq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}| |x_j| \leq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|.

The last inequality follows because |x_j| \leq 1 for each j. Taking the maximum of this expression over all \|x\|_\infty = 1 we find

\|A\|_\infty \leq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|

which shows

\|A\|_\infty = \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|.

It is important to note that the matrix norm \|A\|_\infty is NOT the largest entry, in absolute value, of the matrix.
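Here is a small numerical check of the maximum-absolute-row-sum formula (the specific matrix is arbitrary, chosen for this illustration), including the sign vector constructed in the proof:

```python
import numpy as np

A = np.array([[ 1.0, -7.0,  2.0],
              [ 3.0,  0.5, -1.0],
              [-2.0,  4.0,  3.0]])

row_sums = np.sum(np.abs(A), axis=1)   # absolute row sums: [10, 4.5, 9]
i = np.argmax(row_sums)                # row attaining the maximum

x = np.sign(A[i, :])                   # the sign vector from the proof; ||x||_inf = 1
print(np.max(np.abs(A @ x)))           # ||Ax||_inf = 10.0, the maximum absolute row sum
print(np.linalg.norm(A, np.inf))       # 10.0 as well; note this is NOT the largest entry, 7
```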

Theorem

\|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|.

The induced \ell_1 matrix norm is the "maximum absolute column sum."

Proof

First, recall the definition

\|A\|_1 = \max_{\|x\|_1 = 1} \|A x\|_1.

Given a vector x \in \mathbb R^n, the formula for the matrix-vector product gives

\|Ax\|_1 = \sum_{i=1}^n \left| \sum_{j=1}^n a_{ij} x_j \right|.

Now find the first column index j such that

\sum_{i=1}^n |a_{ij}| = \max_{1 \leq k \leq n} \sum_{i=1}^n |a_{ik}|.

Choose the vector x by the rule x_j = 1, x_i = 0 for i \neq j. For this choice of x it follows that

\|Ax\|_1 = \sum_{i=1}^n |a_{ij}| = \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|

and therefore

\|A\|_1 \geq \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|.

Now, we must show the reverse inequality which is, again, much easier: For any x \in \mathbb R^n, \|x\|_1 = 1, by the triangle inequality

\|Ax\|_1 = \sum_{i=1}^n \left| \sum_{j=1}^n a_{ij} x_j \right| \leq \sum_{j=1}^n \sum_{i=1}^n |a_{ij} x_j| = \sum_{j=1}^n |x_j| \sum_{i=1}^n |a_{ij}| \leq \sum_{j=1}^n |x_j| \max_{1 \leq k \leq n} \sum_{i=1}^n |a_{ik}| = \max_{1 \leq k \leq n} \sum_{i=1}^n |a_{ik}|.

The last inequality follows because \|x\|_1 = 1. Taking the maximum of this expression over all \|x\|_1 = 1 we find

\|A\|_1 \leq \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|

which shows

\|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|.
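The analogous check for the induced \ell_1 norm (again with an arbitrary matrix chosen for this illustration): the maximum absolute column sum is attained at a standard basis vector, exactly as in the proof.

```python
import numpy as np

A = np.array([[ 1.0, -7.0,  2.0],
              [ 3.0,  0.5, -1.0],
              [-2.0,  4.0,  3.0]])

col_sums = np.sum(np.abs(A), axis=0)   # absolute column sums: [6, 11.5, 6]
j = np.argmax(col_sums)                # column attaining the maximum

e_j = np.zeros(A.shape[1])
e_j[j] = 1.0                           # standard basis vector with ||e_j||_1 = 1
print(np.sum(np.abs(A @ e_j)))         # ||A e_j||_1 = 11.5, the maximum absolute column sum
print(np.linalg.norm(A, 1))            # 11.5
```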