Essential Math for Data Science: Linear Transformation with Matrices
You’ll start seeing matrices, not only as operations on numbers, but also as a way to transform vector spaces. This conception will give you the foundations needed to understand more complex linear algebra concepts like matrix decomposition.
By Hadrien Jean, Machine Learning Scientist
As you can see in Essential Math for Data Science, being able to manipulate vectors and matrices is critical to create machine learning and deep learning pipelines, for instance for reshaping your raw data before using it with machine learning libraries.
The goal of this chapter is to get you to the next level of understanding of vectors and matrices. You’ll start seeing matrices, not only as operations on numbers, but also as a way to transform vector spaces. This conception will give you the foundations needed to understand more complex linear algebra concepts like matrix decomposition. You’ll build up on what you learned about vector addition and scalar multiplication to understand linear combinations of vectors.
A linear transformation (or simply transformation, sometimes called linear map) is a mapping between two vector spaces: it takes a vector as input and transforms it into a new output vector. A function is said to be linear if the properties of additivity and scalar multiplication are preserved, that is, the same result is obtained if these operations are done before or after the transformation. Linear functions are synonymously called linear transformations.
You can encounter the following notation to describe a linear transformation: T(v). This refers to the vector v transformed by T. A transformation T is associated with a specific matrix. Since additivity and scalar multiplication must be preserved in linear transformation, you can write:
Linear Transformations as Vectors and Matrices
In linear algebra, the information concerning a linear transformation can be represented as a matrix. Moreover, every linear transformation can be expressed as a matrix.
When you do the linear transformation associated with a matrix, we say that you apply the matrix to the vector. More concretely, it means that you calculate the matrix-vector product of the matrix and the vector. In this case, the matrix can sometimes be called a transformation matrix. For instance, you can apply a matrix A to a vector v with their product Av.
When you multiply multiple matrices, the corresponding linear transformations are combined in the order from right to left.
For instance, let's say that a matrix A does a 45-degree clockwise rotation and a matrix B does a stretching, the product BA means that you first do the rotation and then the stretching.
This shows that the matrix product is:
- Not commutative (AB ≠ BA): the stretching then the rotation is a different transformation than the rotation then the stretching.
- Associative (A(BC)) = ((AB)C): the same transformations associated with the matrices A, B and C are done in the same order.
A matrix-vector product can thus be considered as a way to transform a vector. You can see in Essential Math for Data Science that the shape of A and v must match for the product to be possible.
A good way to understand the relationship between matrices and linear transformations is to actually visualize these transformations. To do that, you’ll use a grid of points in a two-dimensional space, each point corresponding to a vector (it is easier to visualize points instead of arrows pointing from the origin).
Let’s start by creating the grid using the function
meshgrid() from Numpy:
x = np.arange(-10, 10, 1) y = np.arange(-10, 10, 1) xx, yy = np.meshgrid(x, y)
meshgrid() function allows you to create all combinations of points from the arrays
y. Let’s plot the scatter plot corresponding to
plt.scatter(xx, yy, s=20, c=xx+yy) # [...] Add axis, x and y with the same scale
Figure 1: Each point corresponds to the combination of x and y values.
You can see the grid in Figure 1. The color corresponds to the addition of
yy values. This will make transformations easier to visualize.
The Linear Transformation associated with a Matrix
As a first example, let’s visualize the transformation associated with the following two-dimensional square matrix.
Consider that each point of the grid is a vector defined by two coordinates (x and y).
Let’s create the transformation matrix T:
T = np.array([ [-1, 0], [0, -1] ])
First, you need to structure the points of the grid to be able to apply the matrix to each of them. For now, you have two 20 by 20 matrices (
yy) corresponding to points, each having a x value (matrix
xx) and a y value (
yy). Let’s create a 2 by 400 matrix with
xx flatten as the first column and
yy as the second column.
xy = np.vstack([xx.flatten(), yy.flatten()]) xy.shape
You have now 400 points, each with two coordinates. Let’s apply the transformation matrix T to the first two-dimensional point (
xy[:, 0]), for instance:
T @ xy[:, 0]
You can similarly apply TT to each point by calculating its product with the matrix containing all points:
trans = T @ xy trans.shape
You can see that the shape is still (2,400). Each transformed vector (that is, each point of the grid) is one of the column of this new matrix. Now, let’s reshape this array to have two arrays with a similar shape to
xx_transformed = trans.reshape(xx.shape) yy_transformed = trans.reshape(yy.shape)
Let’s plot the grid before and after the transformation:
f, axes = plt.subplots(1, 2, figsize=(6, 3)) axes.scatter(xx, yy, s=10, c=xx+yy) axes.scatter(xx_transformed, yy_transformed, s=10, c=xx+yy) # [...] Add axis, x and y witht the same scale
Figure 2: The grid of points before (left) and after (right) its transformation by the matrix TT.
Figure 2 shows that the matrix T rotated the points of the grid.
Shapes of the Input and Output Vectors
In the previous example, the output vectors have the same number of dimensions than the input vectors (two dimensions).
You might notice that the shape of the transformation matrix must match the shape of the vectors you want to transform.
Figure 3: Shape of the transformation of the grid points by TT.
Figure 3 illustrates the shapes of this example. The first matrix with a shape (2, 2) is the transformation matrix T and the second matrix with a shape (2, 400) corresponds to the 400 vectors stacked. As illustrated in blue, the number of rows of the T corresponds to the number of dimensions of the output vectors. As illustrated in red, the transformation matrix must have the same number of columns as the number of dimensions of the matrix you want to transform.
More generally, the size of the transformation matrix tells you the input and output dimensions. An m by n transformation matrix transforms n-dimensional vectors to mm-dimensional vectors.
Stretching and Rotation
Let’s now visualize the transformation associated with the following matrix:
Let’s proceed as in the previous example:
T = np.array([ [1.3, -2.4], [0.1, 2] ]) trans = T @ xy xx_transformed = trans.reshape(xx.shape) yy_transformed = trans.reshape(yy.shape) f, axes = plt.subplots(1, 2, figsize=(6, 3)) axes.scatter(xx, yy, s=10, c=xx+yy) axes.scatter(xx_transformed, yy_transformed, s=10, c=xx+yy) # [...] Add axis, x and y witht the same scale
Figure 4: The grid of points before (left) and after (right) the transformation by the new matrix T.
Figure 4 shows that the transformation is different from the previous rotation. This time, there is a rotation, but also a stretching of the space.
Geometrically, there is linearity if the vectors lying on the same line in the input space are also on the same line in the output space, and if the origin remains at the same location.
Transforming the space with a matrix can be reversed if the matrix is invertible. In this case, the inverse T−1 of the matrix T is associated with a transformation that takes back the space to the initial state after T has been applied.
Let’s take again the example of the transformation associated with the following matrix:
You’ll plot the initial grid of point, the grid after being transformed by T, and the grid after successive application of T and T−1 (remember that matrices must be left-multiplied):
T = np.array([ [1.3, -2.4], [0.1, 2] ]) trans = T @ xy T_inv = np.linalg.inv(T) un_trans = T_inv @ T @ xy f, axes = plt.subplots(1, 3, figsize=(9, 3)) axes.scatter(xx, yy, s=10, c=xx+yy) axes.scatter(trans.reshape(xx.shape), trans.reshape(yy.shape), s=10, c=xx+yy) axes.scatter(un_trans.reshape(xx.shape), un_trans.reshape(yy.shape), s=10, c=xx+yy) # [...] Add axis, x and y witht the same scale
Figure 5: Inverse of a transformation: the initial space (left) is transformed with the matrix T (middle) and transformed back using T−1 (right).
As you can see in Figure 5, the inverse T−1 of the matrix T is associated with a transformation that reverses the one associated with T.
Mathematically, the transformation of a vector v by T is defined as:
To transform it back, you multiply by the inverse of T:
As you can see in Essential Math for Data Science, , so you have:
meaning that you get back the initial vector v.
Non Invertible Matrices
The linear transformation associated with a singular matrix (that is a non invertible matrix) can’t be reversed. It can occur when there is a loss of information with the transformation. Take the following matrix:
Let’s see how it transforms the space:
T = np.array([ [3, 6], [2, 4], ]) trans = T @ xy f, axes = plt.subplots(1, 2, figsize=(6, 3)) axes.scatter(xx, yy, s=10, c=xx+yy) axes.scatter(trans.reshape(xx.shape), trans.reshape(yy.shape), s=10, c=xx+yy) # [...] Add axis, x and y witht the same scale
Figure 6: The initial space (left) is transformed into a line (right) with the matrix T. Multiple input vectors land on the same location in the output space.
You can see in Figure 6 that the transformed vectors are on a line. There are points that land on the same place after the transformation. Thus, it is not possible to go back. In this case, the matrix T is not invertible: it is singular.
Bio: Hadrien Jean is a machine learning scientist. He owns a Ph.D in cognitive science from the Ecole Normale Superieure, Paris, where he did research on auditory perception using behavioral and electrophysiological data. He previously worked in industry where he built deep learning pipelines for speech processing. At the corner of data science and environment, he works on projects about biodiversity assessement using deep learning applied to audio recordings. He also periodically creates content and teaches at Le Wagon (data science Bootcamp), and writes articles in his blog (hadrienj.github.io).
Original. Reposted with permission.
- Essential Math for Data Science: Probability Density and Probability Mass Functions
- Essential Math for Data Science: Integrals And Area Under The Curve
- Essential Math for Data Science: The Poisson Distribution