Linear algebra cheat-sheet

“The matrix” — Dan LeFebvre

Ok, this one is about vectors and matrices, well, not the matrix from the picture. Not yet…

It is a cheat-sheet, not a course. It is intended as a reference for those who already know the material or want to refresh their memory. It can also be a good test: do you remember these concepts?

It starts from basic high-school concepts and goes up to master's-level ones. I started from my notes from the DSTI classes and I will add new concepts over time. Feel free to propose additions in the comments.

Unless otherwise specified:

  • We are in ℝⁿ

Vectors

\textbf{x} = (x_1,x_2,\cdots ,x_n)

Tip: create a vector with code

# R
c(1, 2, 3)
# Python
import numpy as np
np.array([1,2,3])
# Julia
[1, 2, 3]  # commas build a Vector; [1 2 3] would be a 1×3 row matrix

Linear vector space

Additivity:

\begin{pmatrix}
 x_1\\ 
 \vdots \\ 
 x_n
 \end{pmatrix}
 +
 \begin{pmatrix}
 y_1\\ 
 \vdots \\ 
 y_n
 \end{pmatrix}
 =
 \begin{pmatrix}
 x_1 + y_1\\ 
 \vdots \\ 
 x_n + y_n
 \end{pmatrix}

Homogeneity of degree 1:

a \begin{pmatrix}
 x_1\\ 
 \vdots \\ 
 x_n
 \end{pmatrix}
 =
 \begin{pmatrix}
 a x_1\\ 
 \vdots \\ 
 a x_n
 \end{pmatrix}

Dot product:

\mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n} x_i \, y_i

With θ the angle between the vectors: x ⋅ y = ‖x‖ ‖y‖ cos(θ)
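For example, a minimal sketch (NumPy assumed, illustrative vectors) recovering the angle from the dot product:

# Python
import numpy as np
x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])
cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.arccos(cos_theta)   # ≈ 0.785 rad, i.e. 45°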

Properties of the dot product:

  • Commutative: x ⋅ y = y ⋅ x
  • Additive: ( x + y ) ⋅ z = x ⋅ z + y ⋅ z
  • Homogeneous: (a x) ⋅ y = a (x ⋅ y)

Tip: compute the dot product

# R
v %*% w
# Python
np.dot(v,w)
# Julia
using LinearAlgebra
dot(v, w)  # v' * w also works; v * w is not defined for two vectors

Orthogonal vectors

\mathbf{x} \cdot \mathbf{y} = 0 \Leftrightarrow \mathbf{x} \perp \mathbf{y}

Euclidean norm of a vector

This is the “length” of the vector. We can calculate it as the square root of the dot product of the vector with itself: ‖x‖ = √(x ⋅ x). A norm is strictly positive when x ≠ 0 and zero otherwise.

Tip: compute the norm

# R
sqrt(sum(v^2))  # base norm() expects a matrix, so compute it directly
# Python
np.linalg.norm(v)
# Julia
using LinearAlgebra
norm(v)

Unit vector

Same direction, norm equal to 1:

\mathbf{\widehat{x}} = \frac{\mathbf{x}}{\left \| \mathbf{x} \right \|}
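A minimal sketch (NumPy assumed, illustrative vector) to build the unit vector:

# Python
import numpy as np
x = np.array([3.0, 4.0])
x_hat = x / np.linalg.norm(x)   # [0.6, 0.8]: same direction, norm 1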

Relations between the dot product and matrices

With X and Y the column matrices (n × 1) representing the vectors x and y, and Xᵀ the transpose of X:

\mathbf{x} \cdot \mathbf{y} = X^T \, Y

We can move a matrix to the other side of the dot product by transposing it:

(A\mathbf{x}) \cdot \mathbf{y} = \mathbf{x} \cdot (A^T\mathbf{y})

Indeed, using x ⋅ y = Xᵀ Y and (AB)ᵀ = Bᵀ Aᵀ:

(Ax) ⋅ y = (AX)ᵀ Y = (Xᵀ Aᵀ) Y = Xᵀ (Aᵀ Y) = x ⋅ (Aᵀ y)
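A quick numerical sanity check of this identity, with illustrative values (NumPy assumed):

# Python
import numpy as np
A = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([1.0, -1.0])
y = np.array([2.0, 5.0])
left = np.dot(A @ x, y)        # (Ax) · y
right = np.dot(x, A.T @ y)     # x · (Aᵀ y)
assert np.isclose(left, right)  # both equal -7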

Derivative of a dot product

If x(t) and y(t) are differentiable functions of t:

\frac{\mathrm{d} }{\mathrm{d} t}(\mathbf{x} \cdot \mathbf{y}) = \mathbf{x} \cdot \frac{\mathrm{d} \mathbf{y}}{\mathrm{d} t} + \frac{\mathrm{d} \mathbf{x}}{\mathrm{d} t} \cdot \mathbf{y}

Linear independence

Vectors x₁ … xₙ are linearly independent if the only scalars αᵢ that satisfy the equation below are all zero:

α₁ x₁ + … + αₙ xₙ = 0 ⇒ α₁ = . . . = αₙ = 0

We will see later that an easy way to verify programmatically that a set of n vectors of ℝⁿ is linearly independent is to concatenate the vectors into a square matrix and check that its determinant is not zero.
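As a preview, here is a minimal sketch of that determinant check (NumPy assumed, illustrative vectors):

# Python
import numpy as np
v1, v2, v3 = np.array([1, 0, 0]), np.array([0, 1, 0]), np.array([1, 1, 0])
M = np.column_stack([v1, v2, v3])              # vectors as the columns of a square matrix
independent = not np.isclose(np.linalg.det(M), 0)
print(independent)                             # False: v3 = v1 + v2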

Matrix types

General form

Indexes n and m being integers:

Tip: the LaTeX code for a general m × n matrix is:

\begin{bmatrix}
m_{1,1} & m_{1,2} & \cdots & m_{1,n} \\
m_{2,1} & m_{2,2} & \cdots & m_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
m_{m,1} & m_{m,2} & \cdots & m_{m,n}
\end{bmatrix}

Tip: create a matrix with code

# R
matrix(c(1,2,3,4,5,6), 3, 2)            # filled column by column
v <- c(1,2,3,4,5,6); dim(v) <- c(3, 2)  # alternative: reshape a vector
# Python
import numpy as np
np.array([[1,4],[2,5],[3,6]])
# Julia
[1 4; 2 5; 3 6]                         # semicolons separate rows; same 3×2 matrix as above

Triangular matrix ◰⬔

Either the part above the diagonal or the part below it is filled with zeros (lower and upper triangular, respectively).

Diagonal matrix ◰⧅

The parts above and below the diagonal are both filled with zeros. Diagonal matrices tend to make calculations much easier.

Diagonal matrices have interesting properties: sums, products and powers of diagonal matrices are computed entry by entry on the diagonal, and the determinant is the product of the diagonal entries.

Identity matrix ◰⧅

This is a diagonal matrix with ones on the diagonal; multiplying by it leaves a matrix unchanged (IM = MI = M).

Symmetric matrix ◰▧

The entries are mirrored across the diagonal: mᵢⱼ = mⱼᵢ, i.e. Mᵀ = M.

Anti-symmetric matrix ◰

Same, but with opposite signs across the diagonal: mᵢⱼ = −mⱼᵢ, i.e. Mᵀ = −M (so the diagonal entries are zero).

Idempotent and periodic matrices ◰

A periodic matrix has a positive integer k so that:

A^{k+1} = A

If k = 1, the matrix is said to be idempotent:

A^{2} = A

More special matrices on MathWorld.
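For instance, projection matrices are idempotent; a minimal check (NumPy assumed, illustrative matrix):

# Python
import numpy as np
P = np.array([[1.0, 0.0], [0.0, 0.0]])   # projection onto the x-axis
print(np.allclose(P @ P, P))             # True: P² = P, so P is idempotent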

Matrices of derivatives

The gradient

The gradient (symbol “nabla”: ∇) is the vector of the first-order partial derivatives of a multivariable function:

\nabla f = \left( \frac{\partial f}{\partial x_1}, \cdots, \frac{\partial f}{\partial x_n} \right)

The gradient is pretty useful, since it points in the direction of steepest increase.

Tip: compute the gradient on one defined vector/point v of a function f

# R (numDeriv package)
library(numDeriv)
grad(f, v)
# Python
import numdifftools as nd
nd.Gradient(f)(v)
# Julia
import ForwardDiff: gradient
gradient(f, v)

The Hessian matrix

The Hessian matrix (symbol H or ∇²) is the matrix of the second-order partial derivatives of a multivariable function:

H_{ij}(f) = \frac{\partial^{2} f}{\partial x_i \, \partial x_j}

It can also be noted that the Hessian is the Jacobian matrix of the gradient of f.

Tip: compute the Hessian matrix on one defined vector/point v of a function f

# R (numDeriv package)
library(numDeriv)
hessian(f, v)
# Python
import numdifftools as nd
nd.Hessian(f)(v)
# Julia
import ForwardDiff: hessian
hessian(f, v)

The Jacobian matrix

If f is a vector of m functions fⱼ(x₁, …, xₙ) with j in [1, …, m], the Jacobian matrix is the m × n matrix of all first-order partial derivatives: Jᵢⱼ = ∂fᵢ / ∂xⱼ.

Tip: compute the Jacobian matrix at a defined vector/point v of a vector-valued function f

# R (numDeriv package)
library(numDeriv)
jacobian(f, v)
# Python
import numdifftools as nd
nd.Jacobian(f)(v)
# Julia
import ForwardDiff: jacobian
jacobian(f, v)

Matrix algebra

Addition

Matrix addition is done simply by adding the values at the same (i, j) index. The matrices have to have the same shape. The same goes for the difference.

This implies that the following properties hold: A + B = B + A, (A + B) + C = A + (B + C) and A + 0 = A.

Multiplication by a scalar

In the same way, just multiply each value by the scalar λ.

Matrices multiplication

Two matrices can be multiplied if their shapes chain:

For AB to be possible we need shape(A) = (l, m) and shape(B) = (m, n); the result then has shape (l, n).

The formula may look a bit strange at first:

(AB)_{ij} = \sum_{k=1}^{m} a_{ik} \, b_{kj}

We actually take the dot product of each row of A with the corresponding column of B.

A nice illustration from Wikimedia Commons:

The following properties are valid: (AB)C = A(BC), A(B + C) = AB + AC, (A + B)C = AC + BC and IA = AI = A.

Beware, multiplication is not commutative: AB ≠ BA in general (BA may not even be defined unless the matrices are square).

Tip: compute the matrix product

# R
A %*% B
# Python
np.dot(A,B)
# Julia
A * B

Matrix transposition

Mᵀ is the transpose of M, such that rows become columns and columns become rows: (Mᵀ)ᵢⱼ = mⱼᵢ.

The following properties are valid: (Aᵀ)ᵀ = A, (A + B)ᵀ = Aᵀ + Bᵀ, (λA)ᵀ = λAᵀ and (AB)ᵀ = Bᵀ Aᵀ.

Tip: compute the transpose

# R
t(A)
# Python
A.T
# Julia
A'

Property of the transpose and the dot product: x ⋅ y = Xᵀ Y and (Ax) ⋅ y = x ⋅ (Aᵀ y), as seen above.

Trace of a matrix ◰

The trace is simply the sum of the values on the diagonal.

With the properties: tr(A + B) = tr(A) + tr(B), tr(λA) = λ tr(A), tr(Aᵀ) = tr(A) and tr(AB) = tr(BA).

Tip: compute the trace

# R
sum(diag(A))
# Python
np.trace(A)
# Julia
using LinearAlgebra
tr(A)

Determinant of a matrix ◰

Matrix of order 2

\begin{align*}|A| = \begin{vmatrix} a_{11} & a_{12}\\a_{21} & a_{22} \end{vmatrix}=a_{11}a_{22}-a_{12}a_{21}.\end{align*}

Matrix of order 3

We take a row or a column (here the first row); for each of its entries, we delete the row and the column of that entry, compute the determinant of what remains, and multiply it by the entry. Each term then gets the sign given by the matrix of signs.

Here is the 3x3 matrix of signs:

\begin{bmatrix} + & - & + \\ - & + & - \\ + & - & + \end{bmatrix}

Matrix of order n

At order n, we can use the same method, calculating the cofactors by deleting a row and a column in the same way. The determinant of the remaining submatrix is called a minor Mᵢⱼ; when the sign is added, it is called a cofactor Cᵢⱼ. The process is repeated until we reach order-2 determinants that can be computed directly. Since this method is extremely costly at high orders, numerical methods are usually preferred.

C_{ij} = (-1)^{i+j}M_{ij}

\det(A) = \sum_{j=1}^{n} a_{ij}C_{ij}
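For illustration, here is a minimal (and deliberately naive) sketch of this cofactor expansion along the first row (NumPy assumed); in practice np.linalg.det is far more efficient:

# Python
import numpy as np

def det_cofactor(A):
    # determinant by cofactor expansion along the first row (O(n!), illustration only)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # remove row 0 and column j
        total += (-1) ** j * A[0, j] * det_cofactor(minor)       # cofactor times entry
    return total

A = np.array([[2.0, 1.0], [1.0, 3.0]])
print(det_cofactor(A), np.linalg.det(A))   # both ≈ 5.0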

Graphical interpretation of the determinant

For a 2x2 matrix, the determinant is the factor by which the area of the square formed by the 2 basis vectors changes when the transformation is applied. Similarly, for 3x3 matrices, it is the change in the volume of the initial cube drawn by the 3 basis vectors.

Special values of the determinant

  • When det A = 0, the matrix is singular. Its rank is < n and it has no inverse. As the matrix of a linear system, it corresponds to a system with no unique solution (either none or infinitely many).

Properties of determinants

Calculus rules

Beware, the determinant is not linear.

  • Multiplicative: det(AB) = det(A) ⋅ det(B)

Determinants after operations

  • det Aᵀ = det A

Determinants of special matrices

  • det I = 1

Tip: compute the determinant

# R
det(A)
# Python
np.linalg.det(A)
# Julia
det(A)

Inverse of a Matrix ◰

2x2 matrix

A^{-1} = \frac{1}{a_{11}a_{22} - a_{12}a_{21}} \begin{bmatrix} a_{22} & -a_{12}\\ -a_{21} & a_{11} \end{bmatrix}

3x3 matrix

A^{-1} = \frac{1}{\det(A)} \, C^{T}

C is the matrix of cofactors. Its transpose Cᵀ is also called the adjugate (or classical adjoint) matrix.

Properties of inverses

  • (A⁻¹)⁻¹ = A
  • (AB)⁻¹ = B⁻¹ A⁻¹
  • (Aᵀ)⁻¹ = (A⁻¹)ᵀ
  • det(A⁻¹) = 1 / det(A)

Solving a system of equations with A⁻¹

A system of equations can be written in matrix form. In short:

A\textbf{x} = \textbf{b}

Then, we can solve it using the inverse matrix:

\textbf{x} = A^{-1}\textbf{b}

Computing the inverse is more costly than solving the system directly once, so this is mostly useful when we need to solve the same system (same A, different right-hand sides) several times.
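A minimal sketch (NumPy assumed, illustrative system) contrasting the two approaches:

# Python
import numpy as np
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x1 = np.linalg.solve(A, b)      # solves the system directly (preferred for a single solve)
x2 = np.linalg.inv(A) @ b       # via the inverse, reusable for several right-hand sides
assert np.allclose(x1, x2)      # both give [2., 3.]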

Tip: code

# R
solve(A)  # base R; library(matlib); inv(A) also works
# Python
np.linalg.inv(A)
# Julia
inv(A)

Matrix diagonalisation ▧ ⇨ ⧅

Diagonal matrices are much easier to work with, since many operations (powers, inverses, determinants, …) can be computed directly from the diagonal entries.

Eigenvalues and eigenvectors are used for many purposes, including matrix diagonalisation. For a symmetric matrix, the eigenvectors can be chosen to form an orthonormal basis, which becomes the new basis after the change of coordinates.

General process

Eigenvectors are non-zero vectors q for which there are eigenvalues λ that are solutions of:

A\textbf{q} = \lambda \textbf{q}

Since this is equivalent to (A − λI)q = 0 and q is not zero, the determinant of A − λI is 0. We can determine the possible values of the eigenvalues λ by solving the characteristic equation:

\det(A - \lambda I) = 0

Then we come back to the original equation, solve the system for each possible value of λ and keep the non-zero solutions. This gives us the eigenvectors. Note that there are infinitely many solutions: the norm and sign of each qᵢ can be chosen freely. Another interesting property is that, for a symmetric matrix, eigenvectors associated with distinct eigenvalues are orthogonal to each other. So, now that we know the eigenvalues, let's solve:

(A - \lambda_{i} I)\,\textbf{q}_{i} = \textbf{0}

At the end, we normalize the eigenvectors q with p = q/‖q‖ to create an orthonormal basis, and put them together as the columns of a matrix P:

P = 
 \begin{bmatrix}
 \begin{bmatrix} \\ \widehat{\textbf{p}}_{1} \\ \\ \end{bmatrix} & … & \begin{bmatrix} \\ \widehat{\textbf{p}}_{n} \\ \\ \end{bmatrix}\\ 
 \end{bmatrix}

Since P is orthogonal (its columns are orthonormal), we have:

P^{-1} = P^{T}

Also, we can put all the eigenvalues in a diagonal matrix:

\Lambda = \begin{bmatrix}
 \lambda_{1} & \cdots & 0 \\ 
 \vdots & \ddots & \vdots \\ 
 0 & \cdots & \lambda_{n}
 \end{bmatrix}

The matrix P can be used to change the coordinates and get into a basis where the original matrix A is equivalent to the diagonal matrix Λ:

A = P \Lambda P^{-1} \quad \Leftrightarrow \quad \Lambda = P^{-1} A P

The matrix P applies a change of coordinates that turns A into a diagonal matrix. When P is orthogonal, this change of basis is simply a rotation (possibly combined with a reflection).

We can use this to solve linear systems, defining a new variable y = P⁻¹x that lives in this new basis.

A\textbf{x} = P\Lambda P^{-1}\textbf{x}

\textbf{y} = P^{-1}\textbf{x}

A\textbf{x} = P\Lambda \textbf{y}

It also appears as a kind of principal component analysis: we rotate the system and keep only the large eigenvalues, the others being close to zero and essentially representing noise.
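A small numerical illustration of the decomposition for a symmetric matrix (NumPy assumed), where P is orthogonal so P⁻¹ = Pᵀ:

# Python
import numpy as np
A = np.array([[2.0, 1.0], [1.0, 2.0]])
eigvals, P = np.linalg.eigh(A)          # eigh: symmetric case, columns of P are orthonormal
Lam = np.diag(eigvals)
assert np.allclose(A, P @ Lam @ P.T)    # A = P Λ P⁻¹ with P⁻¹ = Pᵀ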

Calculus for order 2

For a 2×2 matrix, the characteristic equation det(A − λI) = 0 can be written:

\lambda^{2} - \operatorname{tr}(A)\,\lambda + \operatorname{det}(A) = 0

This solves as:

\lambda_{1,2} = \frac{\operatorname{tr}(A) \pm \sqrt{\operatorname{tr}(A)^{2}-4\operatorname{det}(A)}}{2}

To ease the process in some “nice” situations, it is worth mentioning that:

\operatorname{tr}(A) = \lambda_{1} + \lambda_{2} \qquad \operatorname{det}(A) = \lambda_{1}\lambda_{2}

We can now substitute the eigenvalues in the equation to find the eigenvectors.

Which means:

(A - \lambda_{1} I)\,\textbf{q}_{1} = \textbf{0}

We repeat for q₂, normalize, and we have the P = [p₁ p₂] matrix.
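A quick check of the order-2 formula against a numerical solver (NumPy assumed, illustrative matrix):

# Python
import numpy as np
A = np.array([[2.0, 1.0], [1.0, 2.0]])
tr, det = np.trace(A), np.linalg.det(A)
lambdas = (tr + np.array([1.0, -1.0]) * np.sqrt(tr**2 - 4 * det)) / 2   # λ = (tr ± √(tr² − 4·det)) / 2
print(np.sort(lambdas), np.sort(np.linalg.eigvals(A)))                  # both ≈ [1., 3.]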

Tip for order 3

At order 3, the characteristic equation becomes a cubic polynomial in λ; the same approach applies once its roots are found.

Properties of eigenvalues and eigenvectors

  • The vectors pᵢ of P are linearly independent.

Tip: computing eigenvalues & eigenvectors

# R
eigen(A)
# Python
import numpy as np
np.linalg.eig(A)
# Julia
using LinearAlgebra
eigvals(A)
eigvecs(A)

Quadratic forms

Quadratic forms are homogeneous degree-2 polynomial functions of n variables.

This can be written as a matrix multiplication, with A the matrix of values aᵢⱼ and x the vector of values xᵢ:

\sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}{x_i}{x_j} = \textbf{x}^{T}A\textbf{x}

Since xᵢxⱼ = xⱼxᵢ for i ≠ j, the cross terms are usually written βᵢⱼxᵢxⱼ with βᵢⱼ = aᵢⱼ + aⱼᵢ. From the quadratic form alone we therefore cannot recover aᵢⱼ and aⱼᵢ individually when i ≠ j; to build the matrix, we can actually choose any values whose sum is βᵢⱼ. For ease of calculation, we make the matrix symmetric by putting βᵢⱼ/2 = (aᵢⱼ + aⱼᵢ)/2 on each side of the diagonal.

\begin{bmatrix}
 a_{11} & \frac{\beta_{12}}{2} & \frac{\beta_{13}}{2}\\ 
 \frac{\beta_{12}}{2} & a_{22} & \frac{\beta_{23}}{2}\\ 
 \frac{\beta_{13}}{2} & \frac{\beta_{23}}{2} & a_{33}
 \end{bmatrix}

This matrix can be diagonalised with the method of the above section.
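A small sketch (NumPy assumed, illustrative coefficients) evaluating a quadratic form both as a polynomial and as xᵀAx with its symmetric matrix:

# Python
import numpy as np
# q(x) = 2·x₁² + 3·x₂² + 4·x₁x₂, so β₁₂ = 4 and the symmetric matrix holds β₁₂/2 = 2 off the diagonal
A = np.array([[2.0, 2.0], [2.0, 3.0]])
x = np.array([1.0, 2.0])
print(x @ A @ x)   # 2·1 + 3·4 + 4·1·2 = 22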

Types of quadratic forms

If λᵢ are the eigenvalues of A:

  • xᵀAx > 0 ∀x ≠ 0 ⇔ A positive definite ⇔ all λᵢ > 0
  • xᵀAx ≥ 0 ∀x ⇔ A positive semi-definite ⇔ all λᵢ ≥ 0
  • xᵀAx < 0 ∀x ≠ 0 ⇔ A negative definite ⇔ all λᵢ < 0
  • xᵀAx takes both signs ⇔ A indefinite ⇔ some λᵢ > 0 and some λᵢ < 0
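In practice, the sign of the eigenvalues is checked numerically; a minimal sketch (NumPy assumed, reusing the matrix above):

# Python
import numpy as np
A = np.array([[2.0, 2.0], [2.0, 3.0]])
eigvals = np.linalg.eigvalsh(A)    # eigenvalues of the symmetric matrix
print(np.all(eigvals > 0))         # True: A is positive definite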

