A Crash Course in MXNet Tensor Basics & Simple Automatic Differentiation
This is an overview of some basic functionality of the MXNet ndarray package for creating tensor-like objects, and of the autograd package for performing automatic differentiation.
I originally intended to play around with MXNet long ago, around the time that Gluon was released publicly. Things got busy. I got sidetracked.
I finally started using MXNet recently. To get to know my way around, I thought covering some basics, such as how tensors and derivatives are handled, would be a good place to start (as I did here and here with PyTorch).
This won't repeat what is in those previous PyTorch articles step by step, so look at those if you want any further context. What's below should be relatively straightforward, however.
MXNet is an open source neural network framework, a "flexible and efficient library for deep learning." Gluon is the imperative high-level API for MXNet, which provides additional flexibility and ease of use. You can think of the relationship between MXNet and Gluon as being similar to TensorFlow and Keras. We won't cover Gluon any further herein, but will explore it in future posts.
MXNet's tensor implementation comes in the form of the ndarray package. Here you will find what's needed to build multidimensional (n-dimensional) arrays and perform some of the operations on them required for implementing neural networks. Automatic differentiation is handled by the autograd package. We will make use of both packages below.
ndarray (Very) Basics
First, let's import what we need from the library, in such a way as to simplify making our API calls:
import mxnet as mx
from mxnet import autograd as ag
from mxnet import nd
Now, let's create a basic ndarray (on the CPU):
# Create CPU array
a = nd.ones((3, 2))
print(a)
[[1. 1.]
 [1. 1.]
 [1. 1.]]
<NDArray 3x2 @cpu(0)>
Note that printing an ndarray also prints out the type of the object (again, NDArray), as well as its size and the device to which it is attached (in this case, CPU).
What if we wanted to create an ndarray object with a GPU context (note that a context is the device type and ID which should be used to perform operations on the object)? First, let's determine whether or not there is a GPU available to MXNet:
# Test if GPU is recognized
def gpu_device(gpu_number=0):
    try:
        _ = mx.nd.array([1, 2, 3], ctx=mx.gpu(gpu_number))
    except mx.MXNetError:
        return None
    return mx.gpu(gpu_number)

gpu_device()
A response of gpu(0) denotes that there is a GPU device, and its ID is 0. If no GPU is available, the function returns None instead.
Let's create an ndarray on this device:
# Create GPU array
b = nd.zeros((2, 2), ctx=mx.gpu(0))
print(b)
[[0. 0.]
 [0. 0.]]
<NDArray 2x2 @gpu(0)>
The output here confirms that an ndarray of zeros of size 2 x 2 was created with a context of GPU.
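To get a returned transposed ndarray (as opposed to simply a transpose view of the original):
# Create the array to transpose (values inferred from the outputs shown in this post)
c = nd.array([[1, 4], [2, 5], [3, 6]])

# Transpose
T = c.T
print(T)
[[1. 2. 3.]
 [4. 5. 6.]]
<NDArray 2x3 @cpu(0)>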
To reshape an ndarray as a view, without alteration of the original data:
# Reshape
r = T.reshape(3, 2)
print(r)
[[1. 2.]
 [3. 4.]
 [5. 6.]]
<NDArray 3x2 @cpu(0)>
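The reshaped output may look surprising next to the transpose, so here is a rough pure-Python sketch of the row-major regrouping that produces it. The helper name reshape_2d is made up for illustration and is not part of MXNet. Unlike the ndarray version, this sketch copies the values rather than returning a view.

```python
def reshape_2d(rows, new_shape):
    """Regroup a list of lists in row-major order (hypothetical helper)."""
    # Read the values left to right, top to bottom
    flat = [value for row in rows for value in row]
    n_rows, n_cols = new_shape
    if n_rows * n_cols != len(flat):
        raise ValueError("total number of elements must stay the same")
    # Cut the flat sequence into n_rows chunks of n_cols values each
    return [flat[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]


T_rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # same values as T above
r_rows = reshape_2d(T_rows, (3, 2))
print(r_rows)
```

The values keep their reading order (1 through 6) and only the grouping into rows changes, which is why reshaping T does not give back c.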
Some ndarray info:
# ndarray info
print('ndarray shape:', r.shape)
print('Number of dimensions:', r.ndim)
print('ndarray type:', r.dtype)
ndarray shape: (3, 2)
Number of dimensions: 2
ndarray type: <class 'numpy.float32'>
See here for more on ndarray.
ndarray To and From Numpy
It's easy to go from NumPy ndarrays to MXNet ndarrays and vice versa.
import numpy as np

# To numpy ndarray
n = c.asnumpy()
print(n)
print(type(n))
[[1. 4.]
 [2. 5.]
 [3. 6.]]
<class 'numpy.ndarray'>
# From numpy ndarray
a = np.array([[1, 10], [2, 20], [3, 30]])
b = nd.array(a)
print(b)
print(type(b))
[[ 1. 10.]
 [ 2. 20.]
 [ 3. 30.]]
Here's how to compute a matrix-matrix dot product:
# Compute dot product
t1 = nd.random.normal(-1, 1, shape=(3, 2))
t2 = nd.random.normal(-1, 1, shape=(2, 3))
t3 = nd.dot(t1, t2)
print(t3)
[[1.8671514 2.0258508 1.1915313]
 [9.009048  8.481084  6.7323728]
 [5.0241795 4.346245  4.0459785]]
<NDArray 3x3 @cpu(0)>
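Because the inputs above are random, the shape rule is easier to see with fixed numbers. What follows is a plain-Python sketch of what a matrix-matrix product computes for two 2-D inputs. The matmul helper and the sample values are assumptions made for illustration, and this is not how MXNet implements the operation.

```python
def matmul(left, right):
    """Multiply an (n x k) by a (k x m) list-of-lists matrix (hypothetical helper)."""
    n, k = len(left), len(left[0])
    if len(right) != k:
        raise ValueError("inner dimensions must match")
    m = len(right[0])
    # Entry (i, j) is the dot product of row i of left and column j of right
    return [[sum(left[i][p] * right[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


left = [[1, 4], [2, 5], [3, 6]]   # 3x2
right = [[1, 2, 3], [4, 5, 6]]    # 2x3
product = matmul(left, right)     # 3x3, like t3 above
print(product)
```

A 3x2 matrix times a 2x3 matrix gives a 3x3 result, since the inner dimensions (2 and 2) must agree and the outer ones set the output shape.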
See here for more on linear algebra operations with ndarray.
Using autograd to Find and Solve a Derivative
On to computing a derivative with the MXNet autograd package for automatic differentiation.
First we will need a function for which to find the derivative. Arbitrarily, let's use this:
y = 5x^4 + 3x^3 + 7x^2 + 9x - 5
To see us work out the first order derivative of this function by hand, as well as find the value of our derivative function for a given value of x, see this post.
For reasons which should be obvious, we have to represent our function in Python as such:
y = 5*x**4 + 3*x**3 + 7*x**2 + 9*x - 5
Now let's find the value of our derivative function for a given value of x. Let's arbitrarily use 2:
# The value at which to evaluate the derivative
x = nd.array([2])
x.attach_grad()
with ag.record():
    y = 5*x**4 + 3*x**3 + 7*x**2 + 9*x - 5
y.backward()
x.grad
Line by line, the above code:
- defines the value (2) we want to compute the derivative with regard to as an MXNet ndarray object
- uses attach_grad() to allocate space for the gradient to be computed
- the code block denoted with ag.record() contains the computation to be performed with regard to computing and tracking the gradient
- defines the function we want to compute the derivative of
- uses autograd's backward() to compute the sum of gradients, using the chain rule
- outputs the value stored in the x ndarray's grad attribute, which is as shown below
[233.]
<NDArray 1 @cpu(0)>
This value, 233, matches what we calculated by hand in this post.
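As a quick sanity check that needs no MXNet at all, the derivative can be written out with the power rule and compared against a central finite difference. This is a minimal sketch, and the names f, df and central_difference are made up for illustration.

```python
def f(x):
    # The function used throughout this section
    return 5*x**4 + 3*x**3 + 7*x**2 + 9*x - 5


def df(x):
    # Power rule applied term by term: 20x^3 + 9x^2 + 14x + 9
    return 20*x**3 + 9*x**2 + 14*x + 9


def central_difference(func, x, h=1e-5):
    # Numerical slope estimate from two nearby function values
    return (func(x + h) - func(x - h)) / (2 * h)


analytic = df(2)
numeric = central_difference(f, 2.0)
print(analytic)                         # 233
print(abs(numeric - analytic) < 1e-4)   # True
```

The analytic value is exact, while the finite difference is only an approximation, which is why the comparison uses a tolerance rather than equality.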
See here for more on automatic differentiation with autograd.
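To make the record-then-backward pattern less of a black box, here is a toy reverse-mode differentiation sketch in plain Python. It is emphatically not MXNet's implementation: the Var class and everything inside it are hypothetical, handle only scalars, and support just the operators our example function needs.

```python
class Var:
    """A scalar that remembers how it was computed (hypothetical helper)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each entry is (parent Var, derivative of this node w.r.t. that parent)
        self.parents = parents

    @staticmethod
    def _wrap(other):
        return other if isinstance(other, Var) else Var(other)

    def __add__(self, other):
        other = Var._wrap(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __sub__(self, other):
        other = Var._wrap(other)
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        other = Var._wrap(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def __pow__(self, n):
        # Only constant numeric exponents are supported in this sketch
        return Var(self.value ** n, ((self, n * self.value ** (n - 1)),))

    def backward(self):
        # Order the graph so each node is processed before the nodes it was built from
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Chain rule: pass each node's gradient back to its parents
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


x = Var(2.0)
y = 5*x**4 + 3*x**3 + 7*x**2 + 9*x - 5   # building y records the graph
y.backward()
print(x.grad)  # 233.0
```

Building y plays the role of the recorded block, and backward() walks the recorded graph from the output to the input, summing the contributions that reach x along every path.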
This has been a very basic overview of simple ndarray operations and derivatives in MXNet. As these are two of the staples of building neural networks, this should provide some familiarity with the library's approaches to these basic building blocks, and allow for diving into some more complex code. Next time we will create some simple neural networks with MXNet and Gluon, exploring the libraries in more depth.
For more (right now!) on MXNet, Gluon, and deep learning in general, the freely available book Deep Learning - The Straight Dope, written by those intimately involved in the development and evangelizing of these libraries, is definitely worth looking at.