We now discuss iterative methods in linear algebra. The methods we will discuss come from two problems:
To discuss what it means to perform linear algebraic operations in an iterative manner we need a mechanism to measure the difference between two vectors and the difference between two matrices.
A norm \|\cdot\| for a vector x \in \mathbb R^{n} must satisfy the following properties:
There are many different norms. One important class is the \ell_p norms: For 1 \leq p < \infty and x = (x_1,x_2,\ldots,x_n)^T define
\begin{align} \|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}. \end{align}
We will use this with p = 1, 2, and if one sends p \to \infty we have
\begin{align} \|x\|_\infty = \max_{1 \leq i \leq n} |x_i|. \end{align}
The \ell_2 norm is commonly referred to as the Euclidean norm because the norm of x - y for two vectors x,y \in \mathbb R^3 gives the straight-line distance between the two points x and y in three-dimensional space.
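As a quick numerical illustration (a minimal sketch, assuming NumPy is available), the sums and maxima in these definitions can be computed directly and compared against numpy.linalg.norm:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

# l1, l2, and l-infinity norms computed straight from the definitions
l1 = np.sum(np.abs(x))                # |3| + |-4| + |1| = 8
l2 = np.sqrt(np.sum(np.abs(x)**2))    # sqrt(9 + 16 + 1) = sqrt(26)
linf = np.max(np.abs(x))              # 4

# numpy.linalg.norm implements the same formulas
assert np.isclose(l1, np.linalg.norm(x, 1))
assert np.isclose(l2, np.linalg.norm(x, 2))
assert np.isclose(linf, np.linalg.norm(x, np.inf))
```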
Let us now check the four properties of a norm for the \ell_2 norm
\begin{align} \|x\|_2 = \left( \sum_{i=1}^n |x_i|^2 \right)^{1/2}. \end{align}
That \|x\|_2 \geq 0 is clear from the definition.
If x = 0 then \|x\|_2 = 0. If \|x\|_2 = 0 then \sum_{i} |x_i|^2 = 0. This implies |x_i| = 0 for each i and therefore x =0.
The ith entry of \alpha x is \alpha x_i and so
\|\alpha x\|_2 = \left( \sum_{i=1}^n |\alpha x_i|^2 \right)^{1/2} = \left( |\alpha|^2 \sum_{i=1}^n |x_i|^2 \right)^{1/2} = |\alpha| \left( \sum_{i=1}^n |x_i|^2 \right)^{1/2}= |\alpha| \|x\|_2.
Showing the triangle inequality \|x + y\|_2 \leq \|x\|_2 + \|y\|_2 is much more involved. We need an intermediate result.
For x,y \in \mathbb R^n
\left| x^T y \right| = \left| \sum_{i=1}^n x_i y_i\right| \leq \|x\|_2 \|y\|_2.
Before we prove this in general, let's verify it for \mathbb R^2:
|x_1y_1 + x_2 y_2| \overset{\mathrm{?}}{\leq} \sqrt{x_1^2 + x_2^2} \sqrt{y_1^2 + y_2^2}
(x_1y_1 + x_2 y_2)^2 \overset{\mathrm{?}}{\leq} (x_1^2 + x_2^2) (y_1^2 + y_2^2)
x^2_1y^2_1 + x^2_2 y^2_2 + 2 x_1x_2y_1y_2 \overset{\mathrm{?}}{\leq} x_1^2y_1^2 + x_2^2 y_2^2 + x_1^2y_2^2 + x_2^2 y_1^2
2 x_1x_2y_1y_2 \overset{\mathrm{?}}{\leq} x_1^2y_2^2 + x_2^2 y_1^2
0 \overset{\mathrm{?}}{\leq} x_1^2y_2^2 + x_2^2 y_1^2 - 2 x_1x_2y_1y_2
0 \overset{\mathrm{?}}{\leq} (x_1y_2 - x_2 y_1)^2
This last inequality is true, and the Cauchy-Schwarz inequality follows in \mathbb R^2.
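The final step rests on the identity (x_1^2 + x_2^2)(y_1^2 + y_2^2) - (x_1y_1 + x_2y_2)^2 = (x_1y_2 - x_2y_1)^2, which is easy to spot-check on random entries (a minimal sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
x1, x2, y1, y2 = rng.standard_normal(4)

lhs = (x1*y1 + x2*y2)**2
rhs = (x1**2 + x2**2) * (y1**2 + y2**2)

# rhs - lhs collapses to the perfect square (x1*y2 - x2*y1)^2, hence is >= 0
assert np.isclose(rhs - lhs, (x1*y2 - x2*y1)**2)
assert lhs <= rhs + 1e-12
```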
From the \mathbb R^2 calculation above, it is clear that performing this type of calculation for general n is going to be difficult, so we need a new strategy. For x,y \in \mathbb R^n and \lambda \in \mathbb R consider the squared norm
\|x-\lambda y\|_2^2 = \sum_{i=1}^n (x_i-\lambda y_i)^2 = \sum_{i=1}^n x_i^2 - 2 \lambda \sum_{i=1}^n x_i y_i + \lambda^2 \sum_{i=1}^n y_i^2 = \|x\|_2^2 - 2 \lambda x^T y + \lambda^2 \|y\|_2^2.
Notice that the right-hand side has all the terms we encounter in the Cauchy-Schwarz inequality.
If we think about this just as a function of \lambda, keeping x,y \in \mathbb R^n fixed, we have a parabola. We look at the minimum of the parabola: if f(\lambda) = a + b \lambda + c \lambda^2 with c > 0, then f attains its minimum at \lambda = -b/(2c) and f(\lambda) \geq a - b^2/(4c).
(If y = 0 the Cauchy-Schwarz inequality holds trivially, so we may assume y \neq 0.) From this with a = \|x\|_2^2, b = -2 x^T y and c = \|y\|_2^2 > 0 we have
\|x-\lambda y\|_2^2 \geq \|x\|_2^2 - \frac{(x^T y)^2}{\|y\|_2^2} \geq 0.
Note that the minimum value has to be non-negative because the minimum of a non-negative function is also non-negative. Rearranging this last inequality, we have
(x^Ty)^2 \leq \|x\|_2^2 \|y\|_2^2
which is just the square of the Cauchy-Schwarz inequality.
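The minimizing value is \lambda = x^Ty/\|y\|_2^2, and the whole argument can be checked numerically. The sketch below (assuming NumPy; the random data is only for illustration) evaluates the minimum of the parabola and the resulting Cauchy-Schwarz bound:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(10)
y = rng.standard_normal(10)

# The minimizing lambda from the parabola argument
lam = (x @ y) / np.linalg.norm(y, 2)**2

# The minimum value ||x||_2^2 - (x^T y)^2 / ||y||_2^2 is attained and is nonnegative
min_val = np.linalg.norm(x - lam * y, 2)**2
assert np.isclose(min_val,
                  np.linalg.norm(x, 2)**2 - (x @ y)**2 / np.linalg.norm(y, 2)**2)
assert min_val >= 0

# Rearranging gives the Cauchy-Schwarz inequality
assert abs(x @ y) <= np.linalg.norm(x, 2) * np.linalg.norm(y, 2)
```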
To return to the triangle inequality, we compute (set \lambda = -1 in the previous calculation)
\|x + y\|_2^2 = \|x\|_2^2 + 2 x^T y + \|y\|_2^2 \leq \|x\|_2^2 + 2 |x^T y| + \|y\|_2^2
and, by the Cauchy-Schwarz inequality,
\|x\|_2^2 + 2 |x^T y| + \|y\|_2^2 \leq \|x\|_2^2 + 2 \|x\|_2 \|y\|_2 + \|y\|_2^2.
But
\|x\|_2^2 + 2 \|x\|_2 \|y\|_2 + \|y\|_2^2 = (\|x\|_2 + \|y\|_2)^2
and summarizing we have
\|x + y\|_2^2 \leq (\|x\|_2 + \|y\|_2)^2.
Upon taking a square root we see that \|x + y\|_2 \leq \|x\|_2 + \|y\|_2 as desired.
This actually follows for any \ell_p norm but we will not prove it here.
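The triangle inequality is easy to spot-check on random vectors (a minimal sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(8)
y = rng.standard_normal(8)

# ||x + y||_2 <= ||x||_2 + ||y||_2
assert np.linalg.norm(x + y, 2) <= np.linalg.norm(x, 2) + np.linalg.norm(y, 2)
```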
The distance between two vectors x,y \in \mathbb R^n, given a norm \|\cdot \| is defined to be
\|x - y\|.
A sequence of vectors \{x^{(k)}\}_{k=1}^\infty, x^{(k)} \in \mathbb R^n, is said to converge to x with respect to the norm \|\cdot \| if given any \epsilon > 0 there exists N(\epsilon) > 0 such that
\|x^{(k)} - x\| < \epsilon, \quad \text{ for all } \quad k \geq N(\epsilon).
Equivalently, \lim_{k \to \infty} \|x^{(k)} - x\| = 0.
A sequence of vectors \{x^{(k)}\}_{k=1}^\infty, x^{(k)} \in \mathbb R^n, converges to x with respect to the norm \|\cdot \|_p, 1 \leq p \leq \infty, if and only if the components converge:
x_i^{(k)} \to x_i, \quad \text{as} \quad k \to \infty
for all 1 \leq i \leq n.
We first prove it for p = \infty. We have
|x_i^{(k)} -x_i| \leq \max_{1 \leq j \leq n} |x_j^{(k)}-x_j| = \|x^{(k)} - x\|_\infty.
And so, convergence with respect to \|\cdot \|_\infty (the right-hand side tends to zero as k \to \infty) implies that each of the individual components converges.
Now, assume that each of the individual components converges. For every \epsilon > 0 there exists N_i(\epsilon) such that k \geq N_{i}(\epsilon) implies that
|x_i^{(k)} -x_i| < \epsilon.
Given \epsilon > 0, let k \geq \max_{1\leq i \leq n} N_{i} (\epsilon). Then
|x_i^{(k)} -x_i| < \epsilon, \quad \text{for every } 1 \leq i \leq n,
and hence \|x^{(k)} - x\|_\infty < \epsilon.
To prove the theorem for general 1 \leq p < \infty we show x^{(k)} converges to x with respect to \|\cdot\|_\infty if and only if it converges to x with respect to \|\cdot \|_p. First, for any x \in \mathbb R^n we have
\|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p} \leq \left( \sum_{i=1}^n \max_{1 \leq j \leq n} |x_j|^p \right)^{1/p} = \left( n \max_{1 \leq j \leq n} |x_j|^p \right)^{1/p} = n^{1/p} \|x\|_\infty.
Replacing x with x^{(k)} - x we have that \|x^{(k)} - x\|_p \leq n^{1/p} \|x^{(k)} - x\|_\infty. Thus convergence with respect to \|\cdot \|_\infty implies convergence with respect to \|\cdot\|_p.
For the reverse inequality, note that for any 1 \leq j \leq n
|x_j| \leq \left(\sum_{i = 1}^n |x_i|^p \right)^{1/p}.
Indeed, if x_j = 0, this follows immediately. If x_j \neq 0 then
\left(\sum_{i = 1}^n |x_i|^p \right)^{1/p} = |x_j| \underbrace{\left( 1 + \sum_{i \neq j} \frac{|x_i|^p}{|x_j|^p} \right)^{1/p}}_{\geq 1} \geq |x_j|.
Taking the maximum over j then gives
\|x\|_\infty = \max_{1 \leq j \leq n} |x_j| \leq \left(\sum_{i = 1}^n |x_i|^p \right)^{1/p} = \|x\|_p.
Replacing x with x^{(k)} - x we have that \|x^{(k)} - x\|_\infty \leq \|x^{(k)} - x\|_p. Thus convergence with respect to \|\cdot \|_p implies convergence with respect to \|\cdot\|_\infty.
The final logic is the following:
If a sequence converges with respect to \|\cdot\|_p, it converges with respect to \|\cdot\|_\infty and therefore the individual components converge.
If the individual components converge, the sequence converges with respect to \|\cdot\|_\infty and it then converges with respect to \|\cdot\|_p.
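The two inequalities \|x\|_\infty \leq \|x\|_p \leq n^{1/p}\|x\|_\infty used above, and the resulting equivalence of convergence, can be illustrated numerically (a sketch, assuming NumPy; the sequence x^{(k)} = x + v/k is just an example):

```python
import numpy as np

n, p = 5, 3
rng = np.random.default_rng(3)
x = rng.standard_normal(n)

# ||x||_inf <= ||x||_p <= n^(1/p) * ||x||_inf
norm_inf = np.linalg.norm(x, np.inf)
norm_p = np.sum(np.abs(x)**p)**(1.0/p)
assert norm_inf <= norm_p <= n**(1.0/p) * norm_inf

# The example sequence x^(k) = x + v/k converges to x componentwise,
# so its l_p distance to x shrinks as k grows
v = rng.standard_normal(n)
errors = [np.linalg.norm((x + v/k) - x, p) for k in (1, 10, 100, 1000)]
assert errors == sorted(errors, reverse=True)
```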
We can also define norms on the set of all n \times m matrices. We will just concentrate on norms for square n \times n matrices. A matrix norm \|\cdot \| should satisfy the following for all n \times n matrices A and B and real numbers \alpha
Note the last condition: it requires something beyond what is required of a vector norm.
The distance between two matrices is then defined as \|A - B\|, as in the case of vectors.
We can construct a matrix norm from a vector norm.
Let \|\cdot\| be a norm on vectors in \mathbb R^n. Then
\|A\| = \max_{\|x\| = 1} \|Ax\|
gives a matrix norm.
This is called the induced matrix norm.
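One way to build intuition for the definition is to sample unit vectors and take the largest value of \|Ax\|: this only produces a lower bound on the induced norm, but it can be compared against the exact values numpy.linalg.norm returns for p = 1, 2, \infty. A sketch (the helper name induced_norm_estimate is ours, not a library routine):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))

def induced_norm_estimate(A, p, samples=5000):
    """Lower-bound estimate of max_{||x||_p = 1} ||A x||_p by random sampling."""
    best = 0.0
    for _ in range(samples):
        x = rng.standard_normal(A.shape[1])
        x = x / np.linalg.norm(x, p)       # normalize so ||x||_p = 1
        best = max(best, np.linalg.norm(A @ x, p))
    return best

# The sampled maximum never exceeds the true induced norm
for p in (1, 2, np.inf):
    assert induced_norm_estimate(A, p) <= np.linalg.norm(A, p) + 1e-12
```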
For \|x\| = 1, the triangle inequality for the vector norm gives \|(A+B)x\| \leq \|Ax\| + \|Bx\| \leq \|A\| + \|B\|. Taking the maximum over all \|x\| = 1, we find \|A + B\| \leq \|A\| + \|B\|.
Let b(x) = \|Bx\| and suppose b(x) \neq 0 for at least one x with \|x\| = 1. For such an x, \|Bx/b(x)\| = 1. Then
\|AB\| = \max_{\|x\| = 1} \|ABx\| = \max_{\|x\| = 1,\ b(x) \neq 0} b(x) \left\|A \left( \frac{Bx}{b(x)} \right)\right\| \leq \left[ \max_{\|x\| = 1} b(x) \right] \max_{\|y\| = 1} \|A y\| = \|B\| \|A\|.
If b(x) = 0 for all x, then B = 0 and AB = 0, and we find \|AB\| = 0 from the definition.
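The submultiplicative property \|AB\| \leq \|A\|\,\|B\| can likewise be spot-checked for the induced norms NumPy computes exactly (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# ||A B|| <= ||A|| ||B|| for each induced norm numpy evaluates
for p in (1, 2, np.inf):
    assert np.linalg.norm(A @ B, p) <= np.linalg.norm(A, p) * np.linalg.norm(B, p) + 1e-12
```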
Given a vector norm, it is important to find a formula (if possible) for the induced matrix norm.
The induced \ell^\infty matrix norm is the "maximum absolute row sum."
First, recall the definition
\|A\|_\infty = \max_{\|x\|_\infty = 1} \|A x\|_\infty.
If A = 0, the formula is correct: both sides are 0. Assume A \neq 0. Given a vector x \in \mathbb R^n, using the formula for the matrix-vector product,
\|Ax\|_\infty = \max_{1 \leq i \leq n} \left| \sum_{j=1}^n a_{ij} x_j \right|.
Now find the first row i such that
\max_{1 \leq k \leq n} \sum_{j=1}^n |a_{kj}| = \sum_{j=1}^n |a_{ij}|.
Define the \mathrm{sign}(x) function by \mathrm{sign}(x) = 1 if x > 0, \mathrm{sign}(x) = -1 if x < 0 and \mathrm{sign}(0) = 0. Choose the vector x by the rule x_j = \mathrm{sign}(a_{ij}), 1 \leq j\leq n. Because A \neq 0, at least one such a_{ij} is nonzero, and so \|x\|_\infty = 1. For this choice of x it follows that
\|Ax\|_\infty = \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|,
and therefore
\|A\|_\infty \geq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|.
Now, we must show the reverse inequality which is much easier: For any x \in \mathbb R^n, \|x\|_\infty = 1, by the triangle inequality
\|Ax\|_\infty = \max_{1 \leq i \leq n} \left| \sum_{j=1}^n a_{ij} x_j \right| \leq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}| |x_j| \leq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|.
The last inequality follows because |x_j| \leq 1 for each j. Taking the maximum of this expression over all \|x\|_\infty = 1 we find
\|A\|_\infty \leq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|,
which shows
\|A\|_\infty = \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|.
It is important to note that the matrix norm \|A\|_\infty is NOT the largest entry, in absolute value, of the matrix.
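A short check of the row-sum formula (a sketch, assuming NumPy; numpy.linalg.norm(A, np.inf) returns the maximum absolute row sum), including the sign vector constructed in the proof:

```python
import numpy as np

A = np.array([[ 1.0, -2.0,  3.0],
              [ 4.0,  0.0, -1.0],
              [-2.0,  2.0,  2.0]])

# Maximum absolute row sum: the rows sum (in absolute value) to 6, 5, 6
row_sums = np.sum(np.abs(A), axis=1)
assert np.isclose(row_sums.max(), np.linalg.norm(A, np.inf))

# The vector of signs constructed in the proof attains the maximum
i = np.argmax(row_sums)
x = np.sign(A[i])                      # ||x||_inf = 1
assert np.isclose(np.linalg.norm(A @ x, np.inf), row_sums.max())
```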
The induced \ell^1 matrix norm is the "maximum absolute column sum."
First, recall the definition
\|A\|_1 = \max_{\|x\|_1 = 1} \|A x\|_1.
Given a vector x \in \mathbb R^n, using the formula for the matrix-vector product,
\|Ax\|_1 = \sum_{i=1}^n \left| \sum_{j=1}^n a_{ij} x_j \right|.
Now find the first column j such that
\max_{1 \leq k \leq n} \sum_{i=1}^n |a_{ik}| = \sum_{i=1}^n |a_{ij}|.
Choose the vector x by the rule x_j = 1, x_i = 0 for i \neq j. For this choice of x it follows that
\|Ax\|_1 = \sum_{i=1}^n |a_{ij}| = \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|,
and therefore
\|A\|_1 \geq \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|.
Now, we must show the reverse inequality which is, again, much easier: For any x \in \mathbb R^n, \|x\|_1 = 1, by the triangle inequality
\|Ax\|_1 = \sum_{i=1}^n \left| \sum_{j=1}^n a_{ij} x_j \right| \leq \sum_{j=1}^n \sum_{i=1}^n |a_{ij} x_j| = \sum_{j=1}^n |x_j| \sum_{i=1}^n |a_{ij}| \leq \sum_{j=1}^n |x_j| \max_{1 \leq k \leq n} \sum_{i=1}^n |a_{ik}| = \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|.
The last equality follows because \|x\|_1 = 1. Taking the maximum of this expression over all \|x\|_1 = 1 we find
\|A\|_1 \leq \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|,
which shows
\|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|.
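And the analogous check of the column-sum formula (a sketch, assuming NumPy; numpy.linalg.norm(A, 1) returns the maximum absolute column sum), including the standard basis vector from the proof:

```python
import numpy as np

A = np.array([[ 1.0, -2.0,  3.0],
              [ 4.0,  0.0, -1.0],
              [-2.0,  2.0,  2.0]])

# Maximum absolute column sum: the columns sum (in absolute value) to 7, 4, 6
col_sums = np.sum(np.abs(A), axis=0)
assert np.isclose(col_sums.max(), np.linalg.norm(A, 1))

# The standard basis vector from the proof attains the maximum
j = np.argmax(col_sums)
e_j = np.zeros(A.shape[1])
e_j[j] = 1.0                           # ||e_j||_1 = 1
assert np.isclose(np.linalg.norm(A @ e_j, 1), col_sums.max())
```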