A Overview of matrix facts

In this appendix we provide an overview of vectors, matrices, and matrix operations that are useful for data modeling (though their application will not be discussed here).

A matrix is a two-dimensional array of values, symbols, or other objects (depending on the context). We will assume that our matrices contain numbers or random variables. Context will make it clear which is being represented.

A.1 Notation

Matrices are commonly denoted by bold capital letters like \(\mathbf{A}\) or \(\mathbf{B}\), but this will sometimes be simplified to capital letters like \(A\) or \(B\). A matrix \(\mathbf{A}\) with \(m\) rows and \(n\) columns (an \(m\times n\) matrix) will be denoted as \[\mathbf{A} = \begin{bmatrix} \mathbf{A}_{1,1} & \mathbf{A}_{1,2} & \cdots & \mathbf{A}_{1,n} \\ \mathbf{A}_{2,1} & \mathbf{A}_{2,2} & \cdots & \mathbf{A}_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{A}_{m,1} & \mathbf{A}_{m,2} & \cdots & \mathbf{A}_{m,n} \\ \end{bmatrix}, \]

where \(\mathbf{A}_{i,j}\) denotes the element in row \(i\) and column \(j\) of matrix \(\mathbf{A}\).

A column vector is a matrix with a single column. A row vector is a matrix with a single row.

  • Vectors are commonly denoted with bold lowercase letters such as \(\mathbf{a}\) or \(\mathbf{b}\), but this may be simplified to lowercase letters such as \(a\) or \(b\).

A \(p\times 1\) column vector \(\mathbf{a}\) may be constructed as \[ \mathbf{a} = [a_1, a_2, \ldots, a_p] = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix}. \]

A vector is assumed to be a column vector unless otherwise indicated.
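The text itself is language-agnostic, but as a minimal sketch, here is how a matrix and a column vector might be built with Python's NumPy (our choice for the illustrative code in this appendix; note that NumPy indexes from 0 rather than 1):

```python
import numpy as np

# A 2x3 matrix: each inner list is one row.
A = np.array([[1, 2, 3],
              [4, 5, 6]])

# A 3x1 column vector.
a = np.array([[1], [2], [3]])

print(A.shape)  # (2, 3)
print(a.shape)  # (3, 1)
print(A[0, 1])  # 2, the element A_{1,2} in the text's 1-based notation
```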

A.2 Basic mathematical operations

A.2.1 Addition and subtraction

Consider matrices \(\mathbf{A}\) and \(\mathbf{B}\) with identical sizes \(m\times n\).

We add \(\mathbf{A}\) and \(\mathbf{B}\) by adding the element in position \(i,j\) of \(\mathbf{A}\) to the element in position \(i,j\) of \(\mathbf{B}\), i.e.,

\[(\mathbf{A} + \mathbf{B})_{i,j} = \mathbf{A}_{i,j} + \mathbf{B}_{i,j}.\]

Similarly, if we subtract \(\mathbf{B}\) from matrix \(\mathbf{A}\), then we subtract the element in position \(i,j\) of \(\mathbf{B}\) from the element in position \(i,j\) of \(\mathbf{A}\), i.e.,

\[(\mathbf{A} - \mathbf{B})_{i,j} = \mathbf{A}_{i,j} - \mathbf{B}_{i,j}.\]

Example:

\[ \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ \end{bmatrix} + \begin{bmatrix} 2 & 9 & 1 \\ 1 & 3 & 1 \\ \end{bmatrix} = \begin{bmatrix} 3 & 11 & 4 \\ 5 & 8 & 7 \\ \end{bmatrix}. \]

\[ \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ \end{bmatrix} - \begin{bmatrix} 2 & 9 & 1 \\ 1 & 3 & 1 \\ \end{bmatrix} = \begin{bmatrix} -1 & -7 & 2 \\ 3 & 2 & 5 \\ \end{bmatrix}. \]
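A quick NumPy check of the two examples above (continuing the sketch from Section A.1):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[2, 9, 1],
              [1, 3, 1]])

# + and - act elementwise on arrays of matching size.
print(A + B)   # [[ 3 11  4]
               #  [ 5  8  7]]
print(A - B)   # [[-1 -7  2]
               #  [ 3  2  5]]
```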

A.2.2 Scalar multiplication

A matrix multiplied by a scalar value \(c\in\mathbb{R}\) is the matrix obtained by multiplying each element of the matrix by \(c\). If \(\mathbf{A}\) is a matrix and \(c\in \mathbb{R}\), then \[(c\mathbf{A})_{i,j} = c\mathbf{A}_{i,j}.\] Example: \[ 3\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ \end{bmatrix}= \begin{bmatrix} 3\cdot 1 & 3\cdot 2 & 3\cdot 3 \\ 3\cdot 4 & 3\cdot 5 & 3\cdot 6 \\ \end{bmatrix}= \begin{bmatrix} 3 & 6 & 9 \\ 12 & 15 & 18 \\ \end{bmatrix}. \]
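And similarly for scalar multiplication:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

# Multiplying an array by a scalar scales every element.
print(3 * A)   # [[ 3  6  9]
               #  [12 15 18]]
```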

A.2.3 Matrix multiplication

Consider two matrices \(\mathbf{A}\) and \(\mathbf{B}\). The matrix product \(\mathbf{AB}\) is only defined if the number of columns in \(\mathbf{A}\) matches the number of rows in \(\mathbf{B}\).

Assume \(\mathbf{A}\) is an \(m\times n\) matrix and \(\mathbf{B}\) is an \(n\times p\) matrix. \(\mathbf{AB}\) will be an \(m\times p\) matrix and \[(\mathbf{AB})_{i,j} = \sum_{k=1}^{n} \mathbf{A}_{i,k}\mathbf{B}_{k,j}.\]

Example: \[ \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 1 & 4\\ 2 & 5\\ 3 & 6 \end{bmatrix}= \begin{bmatrix} 1\cdot 1 + 2 \cdot 2 + 3 \cdot 3 & 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6\\ 4\cdot 1 + 5 \cdot 2 + 6 \cdot 3 & 4 \cdot 4 + 5 \cdot 5 + 6 \cdot 6\\ \end{bmatrix}= \begin{bmatrix} 14 & 32\\ 32 & 77\\ \end{bmatrix}. \]
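The same product, computed with NumPy's @ operator (note that * would instead multiply elementwise):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # 2x3
B = np.array([[1, 4],
              [2, 5],
              [3, 6]])      # 3x2, so AB is defined and is 2x2

print(A @ B)
# [[14 32]
#  [32 77]]
```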

A.2.4 Transpose

The transpose of a matrix \(\mathbf{A}\), denoted \(\mathbf{A}^T\), exchanges the rows and columns of the matrix. More formally, the \(i,j\) element of \(\mathbf{A}^T\) is the \(j,i\) element of \(\mathbf{A}\), i.e., \((\mathbf{A}^T)_{i,j} = \mathbf{A}_{j,i}\).

Example: \[ \begin{bmatrix} 2 & 9 & 3 \\ 4 & 5 & 6 \end{bmatrix}^T = \begin{bmatrix} 2 & 4\\ 9 & 5\\ 3 & 6 \end{bmatrix}. \]
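In NumPy the transpose is available as the .T attribute:

```python
import numpy as np

A = np.array([[2, 9, 3],
              [4, 5, 6]])

# (A.T)[i, j] equals A[j, i].
print(A.T)
# [[2 4]
#  [9 5]
#  [3 6]]
```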

A.3 Basic mathematical properties

A.3.1 Associative property

Addition and multiplication satisfy the associative property for matrices. Assuming that the matrices \(\mathbf{A}\), \(\mathbf{B}\), and \(\mathbf{C}\) have the sizes required to do the operations below, then

\[(\mathbf{A} + \mathbf{B}) + \mathbf{C} = \mathbf{A} + (\mathbf{B} + \mathbf{C})\] and \[(\mathbf{AB})\mathbf{C}=\mathbf{A}(\mathbf{BC}).\]

A.3.2 Distributive property

Matrix operations satisfy the distributive property. Assuming that the matrices \(\mathbf{A}\), \(\mathbf{B}\), and \(\mathbf{C}\) have the sizes required to do the operations below, then

\[\mathbf{A}(\mathbf{B}+\mathbf{C})=\mathbf{AB} + \mathbf{AC}\quad\mathrm{and}\quad (\mathbf{A}+\mathbf{B})\mathbf{C} = \mathbf{AC} + \mathbf{BC}.\]

A.3.3 No commutative property

In general, matrix multiplication does not satisfy the commutative property, i.e., \[\mathbf{AB} \neq \mathbf{BA},\] even when the matrix sizes allow both products to be computed.

Example:

\[ \begin{bmatrix} 1 & 2 \end{bmatrix} \begin{bmatrix} 1\\ 2 \end{bmatrix} = \begin{bmatrix} 5 \end{bmatrix} \] while \[ \begin{bmatrix} 1\\ 2 \end{bmatrix} \begin{bmatrix} 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 2\\ 2 & 4 \end{bmatrix}. \]
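Reproducing this example in NumPy:

```python
import numpy as np

a = np.array([[1, 2]])     # a 1x2 row vector
b = np.array([[1],
              [2]])        # a 2x1 column vector

print(a @ b)   # [[5]]     (a 1x1 matrix)
print(b @ a)   # [[1 2]
               #  [2 4]]   (a 2x2 matrix)
```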

A.4 Special matrices

A.4.1 Square matrices

A matrix is square if the number of rows equals the number of columns.

The diagonal elements of an \(n\times n\) square matrix \(\mathbf{A}\) are the elements \(\mathbf{A}_{i,i}\) for \(i = 1, 2, \ldots, n\). Any non-diagonal elements of \(\mathbf{A}\) are called off-diagonal elements.

A.4.2 Identity matrix

The \(n\times n\) identity matrix \(\mathbf{I}_{n\times n}\) has 1 for each of its diagonal elements and 0 for each of its off-diagonal elements. Multiplying a matrix by a conformable identity matrix leaves the matrix unchanged, i.e., \(\mathbf{AI} = \mathbf{A}\) and \(\mathbf{IA} = \mathbf{A}\). Context often makes it clear what the dimensions of an identity matrix are, so \(\mathbf{I}_{n\times n}\) is often simplified to \(\mathbf{I}\) or \(I\).

Example:

\[ \mathbf{I}_{3\times 3} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]

A.4.3 Diagonal matrices

A square matrix \(\mathbf{A}\) is diagonal if all its off-diagonal elements are zero. A \(3\times 3\) diagonal matrix will look something like \[ \begin{bmatrix} 1 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & 5 \end{bmatrix}, \] where the diagonal elements may be any real numbers.
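NumPy has helpers for both of these special matrices (np.eye and np.diag); the sketch below also illustrates the defining property of the identity:

```python
import numpy as np

I = np.eye(3)             # the 3x3 identity matrix
D = np.diag([1, -1, 5])   # a diagonal matrix built from its diagonal elements

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])

# Multiplying by the identity leaves A unchanged.
print(np.allclose(A @ I, A), np.allclose(I @ A, A))  # True True
```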

A.4.4 Symmetric matrices

A matrix \(\mathbf{A}\) is symmetric if \(\mathbf{A} = \mathbf{A}^T\), i.e., \(\mathbf{A}_{i,j} = \mathbf{A}_{j,i}\) for all \(i,j\).

A symmetric matrix must be square.

A.4.5 Idempotent matrices

A matrix \(\mathbf{A}\) is idempotent if \(\mathbf{AA} = \mathbf{A}\).

An idempotent matrix must be square.
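As a concrete check, the \(n\times n\) matrix whose entries all equal \(1/n\) is a standard example of an idempotent matrix:

```python
import numpy as np

n = 4
A = np.full((n, n), 1 / n)   # every entry is 1/n

# AA = A, so A is idempotent.
print(np.allclose(A @ A, A))   # True
```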

A.4.6 Positive definite matrices

An \(n\times n\) matrix \(\mathbf{A}\) is positive definite if \[\mathbf{a}^T \mathbf{A}\mathbf{a} > 0\] for every \(n\times 1\) vector of real values \(\mathbf{a}\) that is not the zero vector.
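A standard fact (not derived here) is that a symmetric matrix is positive definite exactly when all of its eigenvalues are positive, which gives a practical numerical check:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])   # symmetric

# eigvalsh computes the eigenvalues of a symmetric matrix.
eigenvalues = np.linalg.eigvalsh(A)
print(eigenvalues)               # [1. 3.]
print(np.all(eigenvalues > 0))   # True: A is positive definite
```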

A.4.7 Inverse matrix

An \(n\times n\) matrix \(\mathbf{A}\) is invertible if there exists a matrix \(\mathbf{B}\) such that \(\mathbf{AB}=\mathbf{BA}=\mathbf{I}_{n\times n}\). The inverse of \(\mathbf{A}\) is denoted \(\mathbf{A}^{-1}\).

Only square matrices can have inverses, though not every square matrix is invertible.

Some other properties related to the inverse operator:

  • If \(n\times n\) matrices \(\mathbf{A}\) and \(\mathbf{B}\) are invertible then \((\mathbf{AB})^{-1} = \mathbf{B}^{-1} \mathbf{A}^{-1}\).
  • If \(\mathbf{A}\) is invertible then \((\mathbf{A}^{-1})^T = (\mathbf{A}^T)^{-1}\).
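A quick numerical check of these properties (np.linalg.inv computes the inverse; exact equality is replaced by np.allclose because of floating-point error):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 1.]])
B = np.array([[1., 2.],
              [0., 1.]])

A_inv = np.linalg.inv(A)

# A A^{-1} = A^{-1} A = I.
print(np.allclose(A @ A_inv, np.eye(2)))  # True
print(np.allclose(A_inv @ A, np.eye(2)))  # True

# (AB)^{-1} = B^{-1} A^{-1}.
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))  # True
```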

A.5 Matrix derivatives

We start with some basic calculus results.

Let \(f(y)\) be a function of a scalar value \(y\) and \(\frac{df(y)}{dy}\) denote the derivative of the function with respect to \(y\). Assume \(c \in \mathbb{R}\) is a constant. Then the results in Table A.1 are true.

Table A.1: Some basic calculus results for scalar functions taking scalar inputs.
\[
\begin{array}{cc}
f(y) & \frac{df(y)}{dy} \\
\hline
cy & c \\
y^2 & 2y \\
cy^2 & 2cy
\end{array}
\]

Now let’s look at the derivative of a scalar function \(f\) with respect to a vector (i.e., the function takes a vector of values and produces a single real number).

Let \(f(\mathbf{y})\) be a function of a \(p\times 1\) column vector \(\mathbf{y}=[y_1, y_2, \ldots, y_p]^T\). The derivative of \(f(\mathbf{y})\) with respect to \(\mathbf{y}\) is denoted \(\frac{\partial f(\mathbf{y})}{\partial \mathbf{y}}\) and \[ \frac{\partial f(\mathbf{y})}{\partial \mathbf{y}} = \begin{bmatrix} \frac{\partial f(\mathbf{y})}{\partial y_1}\\ \frac{\partial f(\mathbf{y})}{\partial y_2}\\ \vdots \\ \frac{\partial f(\mathbf{y})}{\partial y_p} \end{bmatrix}. \]

In words, the derivative of a scalar function with respect to its input vector is the vector of partial derivatives with respect to the elements of the input vector.

Assume \(\mathbf{A}\) is a matrix of constant values whose size makes each product below well defined: \(p\times m\) in the first row and \(p\times p\) in the last row. Then the results in Table A.2 are true, where the last result assumes \(\mathbf{A}\) is symmetric (for a general square \(\mathbf{A}\), the derivative of \(\mathbf{y}^T\mathbf{A}\mathbf{y}\) is \((\mathbf{A}+\mathbf{A}^T)\mathbf{y}\)).

Table A.2: Some basic calculus results for scalar functions taking vector inputs.
\[
\begin{array}{cc}
f(\mathbf{y}) & \frac{\partial f(\mathbf{y})}{\partial \mathbf{y}} \\
\hline
\mathbf{y}^T \mathbf{A} & \mathbf{A} \\
\mathbf{y}^T \mathbf{y} & 2\mathbf{y} \\
\mathbf{y}^T \mathbf{A} \mathbf{y} & 2\mathbf{A}\mathbf{y}
\end{array}
\]

Comparing Tables A.1 and A.2, we can see the parallels between the derivative results in the two contexts: the matrix \(\mathbf{A}\) plays the role of the constant \(c\), and \(\mathbf{y}^T\mathbf{y}\) plays the role of \(y^2\).
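As a sketch of how these results can be verified numerically, the code below compares the formula \(2\mathbf{A}\mathbf{y}\) for a symmetric \(\mathbf{A}\) against a finite-difference approximation of the gradient of \(\mathbf{y}^T\mathbf{A}\mathbf{y}\):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3
M = rng.normal(size=(p, p))
A = (M + M.T) / 2                  # symmetric, so the gradient is 2 A y
y = rng.normal(size=p)

def f(y):
    return y @ A @ y               # the scalar function y^T A y

# Central-difference approximation of each partial derivative.
eps = 1e-6
grad_fd = np.array([(f(y + eps * np.eye(p)[i]) - f(y - eps * np.eye(p)[i]))
                    / (2 * eps) for i in range(p)])

print(np.allclose(grad_fd, 2 * A @ y, atol=1e-5))   # True
```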

A.6 Additional topics

A.6.1 Determinant

The determinant of a matrix is a special function that is applied to a square matrix and returns a scalar value. The determinant of a matrix \(\mathbf{A}\) is usually denoted \(|\mathbf{A}|\) or \(\mathrm{det}(\mathbf{A})\). We do not discuss how to compute the determinant of a matrix, as this is not needed for our purposes. You can find out more about matrix determinants at https://en.wikipedia.org/wiki/Determinant.
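Although we skip the computation by hand, a determinant is easy to obtain numerically (np.linalg.det in NumPy):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 1.]])

print(np.linalg.det(A))   # 1.0 (up to floating-point error)
```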

A.6.2 Linearly independent vectors

Let \(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\) be a set of \(n\) vectors of size \(p\times 1\).

Then \(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\) are linearly dependent if there exists \(\mathbf{a}=[a_1, a_2, \ldots, a_n]\neq \boldsymbol{0}_{n\times 1}\) such that \[ a_1 \mathbf{x}_1 + a_2 \mathbf{x}_2 + \cdots + a_n \mathbf{x}_n = \boldsymbol{0}_{p\times 1}, \] where \(\boldsymbol{0}_{p\times 1}\) denotes a \(p\times 1\) column vector of zeros. The vectors are linearly independent if no such \(\mathbf{a}\) exists.

Let \(\mathbf{X}\) be an \(n\times p\) matrix such that \(\mathbf{x}_1^T, \mathbf{x}_2^T, \ldots, \mathbf{x}_n^T\) make up its \(n\) rows and vectors \(\mathbf{X}_{[1]}, \mathbf{X}_{[2]}, \ldots, \mathbf{X}_{[p]}\) make up its columns, so that \[ \mathbf{X}= \begin{bmatrix} \mathbf{x}_1^T\\ \mathbf{x}_2^T\\ \vdots\\ \mathbf{x}_n^T \end{bmatrix}= \begin{bmatrix} \mathbf{X}_{[1]} & \mathbf{X}_{[2]} & \cdots & \mathbf{X}_{[p]} \end{bmatrix}. \]

The column vectors of \(\mathbf{X}\) are linearly independent if there is no \(\mathbf{a}=[a_1,a_2,\ldots,a_p]\neq \boldsymbol{0}_{p\times 1}\) such that \[ a_1 \mathbf{X}_{[1]} + a_2 \mathbf{X}_{[2]} + \cdots + a_p \mathbf{X}_{[p]} = \boldsymbol{0}_{n\times 1}. \]

The rows of \(\mathbf{X}\) are linearly independent if there is no \(\mathbf{a}=[a_1,a_2,\ldots,a_n]\neq \boldsymbol{0}_{n\times 1}\) such that \[ a_1 \mathbf{x}_{1} + a_2 \mathbf{x}_{2} + \cdots + a_n \mathbf{x}_{n} = \boldsymbol{0}_{p\times 1}. \]

You can learn more about linear independence at https://en.wikipedia.org/wiki/Linear_independence.

A.6.3 Rank

The rank of a matrix is the maximum number of linearly independent columns of the matrix (which can be shown to equal the maximum number of linearly independent rows).

If \(\mathbf{X}\) is an \(n\times p\) matrix with linearly dependent columns, but removing a single column produces a matrix whose columns are linearly independent, then the rank of \(\mathbf{X}\) is \(p-1\).

An \(n\times p\) matrix has full rank if its rank equals \(\min(n, p)\), i.e., the smaller of its number of rows and columns.
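Numerically, the rank (and hence the linear independence of the columns) can be estimated with np.linalg.matrix_rank; a small sketch with a deliberately dependent column:

```python
import numpy as np

X = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 0., 2.]])   # third column = first column + second column

print(np.linalg.matrix_rank(X))         # 2: not full rank, since min(4, 3) = 3

# Dropping the dependent column leaves linearly independent columns.
print(np.linalg.matrix_rank(X[:, :2]))  # 2 = min(4, 2): full rank
```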