Matrices

Reading time: ~60 min

A matrix is a rectangular array of numbers:

\begin{align*}\begin{bmatrix} 4 & 2 & -3 & 1 \\ 2 & -1 & 5 & 9 \\ -2 & 8 & 6 & 2 \\ 4 & -1 & 5 & 4 \\ 2 & -1 & 2 & -5 \end{bmatrix}\end{align*}

We report the size of a matrix using the convention number of rows by number of columns. In other words, a matrix with m rows and n columns is said to be an m\times n matrix. The matrix above is 5\times 4.

We refer to the entry in the i th row and j th column of a matrix A as A's (i,j)th entry, and we denote it as A_{i,j}. In Julia, the (i,j)th entry may be referenced as A[i,j]; in Python, where indexing is zero-based, it is A[i-1,j-1].
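For instance, here is a minimal sketch in Python with NumPy (keeping the zero-based indexing in mind):

```python
import numpy as np

# The 5x4 matrix from the start of the section
A = np.array([[ 4,  2, -3,  1],
              [ 2, -1,  5,  9],
              [-2,  8,  6,  2],
              [ 4, -1,  5,  4],
              [ 2, -1,  2, -5]])

print(A.shape)   # (5, 4): 5 rows, 4 columns
print(A[0, 3])   # the (1,4) entry in the notation above, which is 1
```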

Matrices are versatile structures with a variety of problem-solving uses. For example,

  • A matrix can be thought of as a list of column vectors, so we can use a matrix to package many column vectors into a single mathematical object.

  • An m\times n matrix can be thought of as a linear transformation from \mathbb{R}^n to \mathbb{R}^m.

In this section, we will develop both of these perspectives and define some operations which facilitate common manipulations that arise when handling matrices.

Definition (Matrix addition and scalar multiplication)
We define matrix addition for two m\times n matrices A and B entrywise: the sum A+B is m\times n, and each entry is defined to be the sum of the corresponding entries in A and B.

Likewise, the product of a number c and an m\times n matrix A is defined to be the m\times n matrix each of whose entries is c times the corresponding entry of A.
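As a quick illustration of both operations, here is a sketch in Python with NumPy; the specific matrices are made up for demonstration:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[10, 20, 30],
              [40, 50, 60]])

print(A + B)   # entrywise sum: [[11 22 33], [44 55 66]]
print(2 * A)   # each entry doubled: [[2 4 6], [8 10 12]]
```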

Exercise
Find the value of c such that

\begin{align*}\begin{bmatrix} 6 & 7 & -1 \\ 1 & 3 & 5 \end{bmatrix} + (1-c) \begin{bmatrix} 4 & -4 & 2 \\ -2 & 0 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 15 & -5 \\ 5 & 3 & 3 \end{bmatrix}\end{align*}

Note that two matrices are considered equal if each pair of corresponding entries is equal.

The solution is c = 3.

Solution. If we look at the middle entry of the bottom row on each side of the equation, we get

\begin{align*}3 + (1-c)0 = 3\end{align*}

We can see that this equation will hold regardless of the value of c. The equation corresponding to the top-left corner is

\begin{align*}6 + (1-c)4 = -2\end{align*}

Solving this equation, we find that c=3. Therefore, if there is a solution to the original matrix equation, it must be c=3. We can then check the remaining four equations to see that c=3 is indeed a solution.
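We can also confirm the value c = 3 numerically; the following sketch in Python with NumPy simply re-checks the matrix equation:

```python
import numpy as np

A = np.array([[6, 7, -1],
              [1, 3,  5]])
B = np.array([[ 4, -4, 2],
              [-2,  0, 1]])
target = np.array([[-2, 15, -5],
                   [ 5,  3,  3]])

c = 3
print(np.array_equal(A + (1 - c) * B, target))  # True
```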

Matrices as linear transformations

One of the most useful ways to think of a matrix is as a concrete representation of a linear transformation. The following definition provides the connection between matrices and maps between vector spaces.

Definition (Matrix-vector multiplication)
If A is an m\times n matrix and \mathbf{x} is a column vector in \mathbb{R}^n, then A\mathbf{x} is defined to be the linear combination of the columns of A with weights given by the entries of \mathbf{x}.

Example
If A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} and \mathbf{x} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, then A\mathbf{x} = a \begin{bmatrix} 1 \\ 0 \end{bmatrix} + b \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} c \\ d \end{bmatrix} where a = 2, b = 3, c = 5, and d = 3.
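The same computation in code: the linear-combination definition agrees with NumPy's built-in matrix-vector product (a sketch):

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
x = np.array([2, 3])

# A @ x as a weighted sum of the columns of A
combo = x[0] * A[:, 0] + x[1] * A[:, 1]
print(combo)   # [5 3]
print(A @ x)   # [5 3], the same vector
```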

As advertised, the transformations described in the definition of matrix-vector multiplication are linear:

Exercise
Suppose that A is an m \times n matrix. Show that \mathbf{x} \mapsto A\mathbf{x} is a linear transformation.

Solution. Suppose A has columns \mathbf{a}_1, \cdots, \mathbf{a}_n and \mathbf{x} = [x_1, \cdots, x_n]. By definition,

\begin{align*}A\mathbf{x}= x_1 \mathbf{a}_1 + \cdots + x_n \mathbf{a}_n\end{align*}

Consider a second vector \mathbf{y} = [y_1, \cdots, y_n]. We have

\begin{align*}A(\mathbf{x} + \mathbf{y}) & = (x_1 + y_1)\mathbf{a}_1 + \cdots + (x_n + y_n)\mathbf{a}_n\\ & = (x_1 \mathbf{a}_1 + \cdots + x_n \mathbf{a}_n) + (y_1 \mathbf{a}_1 + \cdots + y_n \mathbf{a}_n) \\ & = A\mathbf{x} + A\mathbf{y}.\end{align*}

Next, let c \in \mathbb{R} be a constant.

\begin{align*}A(c\mathbf{x}) &= (cx_1)\mathbf{a}_1 + \cdots + (cx_n)\mathbf{a}_n\\ &= c(x_1 \mathbf{a}_1 + \cdots + x_n \mathbf{a}_n)\\ &= c(A\mathbf{x})\end{align*}

These are the two requirements for a transformation to be considered linear, so \mathbf{x} \mapsto A\mathbf{x} is indeed linear.
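A quick numerical sanity check of these two properties (a sketch; the matrix and vectors are random and chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
x = rng.standard_normal(4)
y = rng.standard_normal(4)
c = 2.5

print(np.allclose(A @ (x + y), A @ x + A @ y))  # True: additivity
print(np.allclose(A @ (c * x), c * (A @ x)))    # True: homogeneity
```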

It turns out that every linear transformation L from \mathbb{R}^n to \mathbb{R}^m can be represented as \mathbf{x}\mapsto A\mathbf{x} for some matrix A. The entries of A may be obtained from L by placing the components of L(\mathbf{e}_1) in the first column of A, the components of L(\mathbf{e}_2) in the second column, and so on.

With this definition of A, we have A\mathbf{e}_1 = L(\mathbf{e}_1), and similarly for the other standard basis vectors. Since the equation L(\mathbf{x}) = A\mathbf{x} holds for all \mathbf{x} in a basis of \mathbb{R}^n, we conclude that it holds for all \mathbf{x} \in \mathbb{R}^n (by the basis equality theorem).

Exercise
Find the matrix corresponding to the linear transformation T([x,y,z]) = [z,x,y].

Solution. Based on the first component of the expression for T([x,y,z]), we find that the first column of the matrix representing T is [0, 1, 0]. Similarly, the next two columns are [0,0,1], and [1,0,0]. Altogether, we find that the matrix is

\begin{align*}\begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.\end{align*}
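We can reproduce this matrix in code by applying T to the standard basis vectors and stacking the results as columns (a sketch; the function T below is just this exercise's transformation written in Python):

```python
import numpy as np

def T(v):
    """The transformation T([x, y, z]) = [z, x, y]."""
    x, y, z = v
    return np.array([z, x, y])

# The columns of the matrix are T(e1), T(e2), T(e3)
M = np.column_stack([T(e) for e in np.eye(3)])
print(M)
# [[0. 0. 1.]
#  [1. 0. 0.]
#  [0. 1. 0.]]

v = np.array([4, 5, 6])
print(np.allclose(M @ v, T(v)))  # True
```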

Exercise
Suppose that A is an m\times n matrix and \mathbf{b} is a vector in \mathbb{R}^m with the property that the equation A\mathbf{x} = \mathbf{b} has at least one solution \mathbf{x} \in \mathbb{R}^n. Show that the solution is unique if and only if the columns of A are linearly independent.

The intuition is that \mathbf{x} provides a recipe for how much of each column of A to use to get \mathbf{b}. If the columns of A are linearly dependent, then we can swap out a unit of one of the vectors for some combination of others. This swappability shows that the solution is nonunique.

Solution. If the columns \mathbf{a}_1, \ldots, \mathbf{a}_n of A are not linearly independent, then one of the columns is a linear combination of the columns to its left, say

\begin{align*}\mathbf{a}_k = c_1\mathbf{a}_1 + \cdots + c_{k-1}\mathbf{a}_{k-1}.\end{align*}

Therefore, given any solution of A\mathbf{x} = \mathbf{b}, we can obtain another solution by increasing the kth component of \mathbf{x} by 1 and decreasing the first component by c_1, the second by c_2, and so on, up to the (k-1)th component by c_{k-1}.

Conversely, if there are distinct solutions \mathbf{x}_1 and \mathbf{x}_2, then A(\mathbf{x}_2 - \mathbf{x}_1) = \boldsymbol{0}. Therefore, the components of \mathbf{x}_2 - \mathbf{x}_1 provide the weights for a linear combination of the columns of A which is equal to the zero vector. Since \mathbf{x}_2 - \mathbf{x}_1 is not the zero vector, these weights are not all zero, so the columns of A are linearly dependent.
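To see the "swappability" concretely, here is a sketch with a made-up matrix whose third column equals the sum of its first two:

```python
import numpy as np

# The third column is the sum of the first two, so the columns are dependent
A = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 2]])

x1 = np.array([2.0, 3.0, 0.0])
b = A @ x1

# Trade one unit of the third column for one unit each of the first two
x2 = x1 + np.array([-1.0, -1.0, 1.0])
print(np.allclose(A @ x2, b))   # True: a second, distinct solution
print(np.array_equal(x1, x2))   # False: the solutions differ
```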

Matrix multiplication

With the perspective that matrices should represent linear transformations, it makes sense to define matrix multiplication so that it corresponds to composition of the corresponding linear transformations.

Definition (matrix multiplication)
If A is an m\times n matrix and B is an n\times p matrix, then AB is defined to be the matrix for which (AB)(\mathbf{x}) = A(B\mathbf{x}) for all \mathbf{x} \in \mathbb{R}^p.

This definition specifies the matrix product of two matrices, but it doesn't give us an algorithm for calculating it. Let's work that out in the context of a specific example:

Exercise (matrix product) Suppose that A = \begin{bmatrix} 3 & -1 & 2 \\ 4 & 2 & 0 \end{bmatrix} and B = \begin{bmatrix} 4 & -5 & 0 & 1 \\ 2 & 8 & 0 & 0 \\ -1 & 5 & 3 & 2 \end{bmatrix}. Consider the matrix C defined so that, for all 1 \leq k \leq 4, the kth column of C is defined to be the product of A and the kth column of B. Show that C = AB according to the definition of matrix multiplication.

Solution. Let \mathbf{x} = [x_1,x_2,x_3,x_4] be an arbitrary vector in \mathbb{R}^4. By definition,

\begin{align*}(AB)\mathbf{x} = A(B\mathbf{x})\end{align*}

Let's compute the expression A(B\mathbf{x}) on the right-hand side. Firstly, we have

\begin{align*}B\mathbf{x} = x_1 \begin{bmatrix} 4 \\ 2 \\ -1 \end{bmatrix} + x_2 \begin{bmatrix} -5 \\ 8 \\ 5 \end{bmatrix} + x_3 \begin{bmatrix} 0 \\ 0 \\ 3 \end{bmatrix} + x_4 \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}.\end{align*}

Then, by linearity, we have

\begin{align*}A(B\mathbf{x}) &= A \left(x_1 \begin{bmatrix} 4 \\ 2 \\ -1 \end{bmatrix} + x_2 \begin{bmatrix} -5 \\ 8 \\ 5 \end{bmatrix} + x_3 \begin{bmatrix} 0 \\ 0 \\ 3 \end{bmatrix} + x_4 \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}\right)\\ &= x_1 A \left( \begin{bmatrix} 4 \\ 2 \\ -1 \end{bmatrix} \right) + x_2 A \left( \begin{bmatrix} -5 \\ 8 \\ 5 \end{bmatrix} \right) + x_3 A \left( \begin{bmatrix} 0 \\ 0 \\ 3 \end{bmatrix} \right) + x_4 A \left( \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} \right)\\ & = C\mathbf{x}\end{align*}

This demonstrates that (AB)\mathbf{x} is equal to C\mathbf{x} for the matrix C described in the problem.

The principle worked out in this exercise is general: the kth column of AB is the product of A and the kth column of B, for each column index k. In other words,

\begin{align*}AB = A[\begin{array}{cccc}\mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_n\end{array}] = [\begin{array}{cccc} A\mathbf{b}_1 & A\mathbf{b_2} & \cdots & A\mathbf{b}_n\end{array}],\end{align*}

where the notation B = [\begin{array}{cccc}\mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_n\end{array}] means that \mathbf{b}_1, \ldots, \mathbf{b}_n are the columns of B. We call this observation the product column rule.
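The product column rule translates directly into code; this sketch rebuilds AB column by column using the matrices from the exercise above and compares against NumPy's matrix product:

```python
import numpy as np

A = np.array([[3, -1, 2],
              [4,  2, 0]])
B = np.array([[ 4, -5, 0, 1],
              [ 2,  8, 0, 0],
              [-1,  5, 3, 2]])

# The kth column of AB is A times the kth column of B
C = np.column_stack([A @ B[:, k] for k in range(B.shape[1])])
print(np.array_equal(C, A @ B))  # True
```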

The invertible matrix theorem

The range or null space of a matrix A is defined to be the range or null space of the corresponding linear transformation \mathbf{x}\mapsto A\mathbf{x}.

Exercise
Show that a matrix represents an injective transformation if and only if its null space is \{\boldsymbol{0}\}.

Solution. A linear transformation always maps the zero vector to the zero vector, so an injective linear transformation cannot map any other vector to the zero vector. Therefore, the null space of an injective transformation is the set containing only the zero vector.

If a transformation is not injective, then there are two distinct vectors \mathbf{v}_1 and \mathbf{v}_2 which map to the same vector \mathbf{b}. By linearity, the transformation maps \mathbf{v}_1 - \mathbf{v}_2 to \mathbf{b} - \mathbf{b} = \boldsymbol{0}. Since \mathbf{v}_1 - \mathbf{v}_2 is not equal to the zero vector, this implies that the null space contains more than just the zero vector. It follows that a matrix whose null space contains only the zero vector is indeed injective.

The rank of A is defined to be the dimension of its range.

Example
The matrix A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} has rank 2, because the span of its columns is all of \mathbb{R}^2. The null space has dimension 1, since the solution set of A \mathbf{x} = \boldsymbol{0} is the span of \{[1,0,0]\}.
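NumPy can compute the rank of this example directly (a sketch):

```python
import numpy as np

A = np.array([[0, 1, 0],
              [0, 0, 2]])

print(np.linalg.matrix_rank(A))    # 2
print(A @ np.array([1, 0, 0]))     # [0 0]: [1,0,0] spans the null space
```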

For a matrix which is square (meaning that it represents a transformation from some space \mathbb{R}^n to itself), injectivity, surjectivity, and bijectivity are all equivalent:

Theorem (Invertible matrix theorem)
Suppose that A is an n\times n matrix. Then the following are equivalent (that is, for a given matrix they are either all true or all false).

  1. The transformation \mathbf{x}\mapsto A\mathbf{x} from \mathbb{R}^n to \mathbb{R}^n is bijective.
  2. The range of A is \mathbb{R}^n.
  3. The null space of A is \{\boldsymbol{0}\}.

In other words, for a linear transformation from \mathbb{R}^n to \mathbb{R}^n, bijectivity, surjectivity, and injectivity are equivalent.

Proof. We begin by showing that (2) and (3) are equivalent. If the columns of A are linearly dependent, then the range of A is spanned by fewer than n vectors. Therefore, if the rank of A is equal to n, then the columns of A are linearly independent. This implies that a linear combination of the columns is equal to the zero vector only if the weights are all zero. In other words, the only solution of the equation A\mathbf{x} = \boldsymbol{0} is the zero vector; that is, the null space of A is \{\boldsymbol{0}\}.

Conversely, if the null space of A is \{\boldsymbol{0}\}, then the columns of A are linearly independent, and the rank of A is therefore equal to n.

By definition of bijectivity, (2) and (3) together imply (1), and (1) implies (2) and (3). Therefore, the three statements are equivalent.

The inverse matrix

If A is invertible, then the inverse of the transformation \mathbf{x}\mapsto A\mathbf{x} is also a linear transformation:

Exercise
Show that if L is a bijective linear transformation, then the inverse function L^{-1} is also linear.

Solution. Consider the linearity equation L(a\mathbf{x} + b\mathbf{y}) = aL(\mathbf{x}) + bL(\mathbf{y}) and two vectors \mathbf{v} = L(\mathbf{x}) and \mathbf{w} = L(\mathbf{y}) in the range of L. Substitute \mathbf{x} = L^{-1}(\mathbf{v}) and \mathbf{y} = L^{-1}(\mathbf{w}) into the linearity equation for L to obtain

\begin{align*}L(a L^{-1}(\mathbf{v}) + b L^{-1}(\mathbf{w})) = a\mathbf{v} + b\mathbf{w},\end{align*}

which implies that

\begin{align*}a L^{-1}(\mathbf{v}) + b L^{-1}(\mathbf{w}) = L^{-1}(a\mathbf{v} + b\mathbf{w}).\end{align*}

If \mathbf{x}\mapsto A\mathbf{x} is invertible, then the matrix of the inverse of \mathbf{x}\mapsto A\mathbf{x} is called the inverse of A and is denoted A^{-1}. The matrices A and A^{-1} satisfy the equations AA^{-1} = A^{-1} A = I, where I denotes the n\times n identity matrix, which has ones along the diagonal starting at the top left entry and zeros elsewhere.

Example
If A = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} and B = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}, then

\begin{align*}BA = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.\end{align*}

Therefore B(A\mathbf{x}) = (BA)\mathbf{x} = \mathbf{x} for all \mathbf{x} \in \mathbb{R}^2. So B = A^{-1}.
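The same check in code, along with NumPy's built-in inverse (a sketch):

```python
import numpy as np

A = np.array([[2, 1],
              [1, 1]])
B = np.array([[ 1, -1],
              [-1,  2]])

print(B @ A)                             # the 2x2 identity matrix
print(np.allclose(np.linalg.inv(A), B))  # True: B is indeed the inverse of A
```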

Exercise
Let T : \mathbb{R}^2 \to \mathbb{R}^2 be a linear transformation defined to be a reflection across the y-axis followed by a 15-degree clockwise rotation about the origin. Which of the following maps T\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) back to \begin{bmatrix} 1 \\ 0 \end{bmatrix}?

(a) Reflection across the y-axis followed by a 15-degree counterclockwise rotation about the origin.
(b) A 15-degree counterclockwise rotation about the origin followed by a reflection across the y-axis.

Use the above example to write (AB)^{-1} in terms of A and B, when A and B are invertible matrices.

Solution. The correct answer is (b). The transformation in (a) maps T\left( \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right) to \frac{1}{2} \begin{bmatrix} \sqrt{3} \\ 1 \end{bmatrix}.

The example above illustrates geometrically that to invert a transformation represented by AB, we may multiply the inverses of the two matrices and reverse the order: (AB)^{-1} = B^{-1}A^{-1}. This is a general fact about function inverses and composition. We can see this algebraically by writing B^{-1}A^{-1}AB = B^{-1}B = I.
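A quick numerical check of the identity (AB)^{-1} = B^{-1}A^{-1} (a sketch; the random matrices below are invertible with probability one):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
print(np.allclose(lhs, rhs))  # True
```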

Exercise

Let A be a non-zero n \times n matrix whose rank is k.

  • If k = n and \mathbf{b} \in \mathbb{R}^n, explain why there exists only one vector \mathbf{x} such that A\mathbf{x} = \mathbf{b}.
  • Suppose k < n and show that there are vectors in \mathbb{R}^n for which the equation A \mathbf{x} = \mathbf{b} has no solution.
  • If \mathbf{x} \in \mathbb{R}^n and \mathbf{y} \in \mathbb{R}^n both satisfy A\mathbf{x} = \mathbf{b} and A\mathbf{y} = \mathbf{b} for some fixed vector \mathbf{b} \in \mathbb{R}^n, describe geometrically the set of points (c_1, c_2) \in \mathbb{R}^2 such that A(c_1\mathbf{x} + c_2\mathbf{y}) = \mathbf{b}.

Based on the above observations, can the equation A\mathbf{x} = \mathbf{b} have exactly 10 solutions?

Solution.

  1. If k = n, then the columns of A form a basis for \mathbb{R}^n and so the range of A is \mathbb{R}^n. Therefore the corresponding linear transformation is invertible and the only vector that satisfies A\mathbf{x} = \mathbf{b} is given by \mathbf{x} = A^{-1}\mathbf{b}.

  2. By definition, if the range of A is not all of \mathbb{R}^n, then there exists a vector \mathbf{b} in \mathbb{R}^n which is not in the range of A. In other words, there exists \mathbf{b}\in \mathbb{R}^n such that A\mathbf{x} = \mathbf{b} has no solution.

  3. From

\begin{align*}A(c_1\mathbf{x} + c_2\mathbf{y}) &= c_1A\mathbf{x} + c_2 A\mathbf{y} \\ &= c_1\mathbf{b} + c_2\mathbf{b} \\ &= (c_1 + c_2)\mathbf{b},\end{align*}

we see that the set of valid pairs (c_1, c_2) \in \mathbb{R}^2 is the line c_1 + c_2 = 1 in \mathbb{R}^2.

From (1) and (2), we see that the equation A\mathbf{x} = \mathbf{b} can have exactly one solution or no solutions. From (3), we see that if there are at least two distinct solutions, then there are in fact infinitely many solutions. So 10 is not a possibility.
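As a concrete illustration of parts (2) and (3), here is a sketch with a rank-deficient matrix (its third column is the sum of the first two); every point on the line c_1 + c_2 = 1 yields another solution:

```python
import numpy as np

A = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 2]])   # rank 2 < 3, columns are dependent

x = np.array([1.0, 1.0, 0.0])
y = np.array([0.0, 0.0, 1.0])
b = A @ x                   # A @ y gives the same vector

for c1 in [0.25, 0.5, 2.0]:
    c2 = 1 - c1
    print(np.allclose(A @ (c1 * x + c2 * y), b))  # True each time
```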
