KDnuggets Home » News » 2016 » Jun » Tutorials, Overviews » An Introduction to Scientific Python (and a Bit of the Maths Behind It) – NumPy ( 16:n20 )

An Introduction to Scientific Python (and a Bit of the Maths Behind It) – NumPy


An introductory overview of NumPy, one of the foundational aspects of Scientific Computing in Python, along with some explanation of the maths involved.



By Jamal Moir, Oxford Brookes University.

NumPy

Oh the amazing things you can do with Numpy.

NumPy is a blazing fast maths library for Python with a heavy emphasis on arrays. It allows you to do vector and matrix maths within Python and as a lot of the underlying functions are actually written in C, you get speeds that you would never reach in vanilla Python.

Numpy is an absolutely key piece to the success of scientific Python and if you want to get into Data Science and or Machine Learning in Python, it's a must learn. NumPy is well built in my opinion and getting started with it is not difficult at all.

This is the second post in a series of posts on scientific Python, don't forget to check out the others too. An up-to-date list of posts in this series is at the bottom of this post.

Array Basics

 
Creation

NumPy revolves around these things called arrays. Actually nparrays, but we don't need to worry about that. With these arrays we can do all sorts of useful things like vector and matrix maths at lightning speeds. Get your linear algebra on! (Just kidding we won't be doing any heavy maths)

# 1D Array
a = np.array([0, 1, 2, 3, 4])
b = np.array((0, 1, 2, 3, 4))
c = np.arange(5)
d = np.linspace(0, 2*np.pi, 5)

print(a) # >>>[0 1 2 3 4]
print(b) # >>>[0 1 2 3 4]
print(c) # >>>[0 1 2 3 4]
print(d) # >>>[ 0.          1.57079633  3.14159265  4.71238898  6.28318531]
print(a[3]) # >>>3


The above code shows 4 different ways of creating an array. The most basic way is just passing a sequence to NumPy's array() function; you can pass it any sequence, not just lists like you usually see.

Notice how when we print an array with numbers of different length, it automatically pads them out. This is useful for viewing matrices. Indexing on arrays works just like that of a list or any other of Python's sequences. You can also use slicing on them, I won't go into slicing a 1D array here, if you want more information on slicing, check out this post.

The above array example is how you can represent a vector with NumPy, next we will take a look at how we can represent matrices and more with multidimensional arrays.

# MD Array,
a = np.array([[11, 12, 13, 14, 15],
              [16, 17, 18, 19, 20],
              [21, 22, 23, 24, 25],
              [26, 27, 28 ,29, 30],
              [31, 32, 33, 34, 35]])

print(a[2,4]) # >>>25


To create a 2D array we pass the array() function a list of lists (or a sequence of sequences). If we wanted a 3D array we would pass it a list of lists of lists, a 4D array would be a list of lists of lists of lists and so on.

Notice how with a 2D array (with the help of our friend the space bar), is arranged in rows and columns. To index a 2D array we simply reference a row and a column.

A Bit of the Maths Behind It

To understand this properly, we should really take a look at what vectors and matrices are.

vector is a quantity that has both direction and magnitude. They are often used to represent things such as velocity, acceleration and momentum. Vectors can be written in a number of ways although the one which will be most useful to us is the form where they are written as an n-tuple such as (1, 4, 6, 9). This is how we represent them in NumPy.

matrix is similar to a vector, except it is made up of rows and columns; much like a grid. The values within the matrix can be referenced by giving the row and the column that it resides in. In NumPy we make arrays by passing a sequence of sequences as we did previously.

NumPy Maths

Multidimensional Array Slicing

Slicing a multidimensional array is a bit more complicated than a 1D one and it's something that you will do a lot while using NumPy.

# MD slicing
print(a[0, 1:4]) # >>>[12 13 14]
print(a[1:4, 0]) # >>>[16 21 26]
print(a[::2,::2]) # >>>[[11 13 15]
                  #     [21 23 25]
                  #     [31 33 35]]
print(a[:, 1]) # >>>[12 17 22 27 32]


As you can see you slice a multidimensional array by doing a separate slice for each dimension separated with commas. So with a 2D array our first slice defines the slicing for rows and our second slice defines the slicing for columns.

Notice that you can simply specify a row or a column by entering the number. The first example above selects the 0th column from the array.

The diagram below illustrates what the given example slices do.

NumPy Slicing

Array Properties

When working with NumPy you might want to know certain things about your arrays. Luckily there are lots of handy methods included within the package to give you the information that you need.

# Array properties
a = np.array([[11, 12, 13, 14, 15],
              [16, 17, 18, 19, 20],
              [21, 22, 23, 24, 25],
              [26, 27, 28 ,29, 30],
              [31, 32, 33, 34, 35]])

print(type(a)) # >>><class 'numpy.ndarray'>
print(a.dtype) # >>>int64
print(a.size) # >>>25
print(a.shape) # >>>(5, 5)
print(a.itemsize) # >>>8
print(a.ndim) # >>>2
print(a.nbytes) # >>>200


As you can see in the above code a NumPy array is actually called an ndarray. I don't know why it's called an ndarray, if anyone knows please leave a comment! My guess is that it stands for n dimensional array.

The shape of an array is how many rows and columns it has, the above array has 5 rows and 5 columns so its shape is (5, 5).

The 'itemsize' property is how many bytes each item takes up. The data type of this array is int64, there are 64 bits in an int64, 8 bits in a byte, divide 64 by 8 and you get how many bytes it takes up, which in this case is 8.

The 'ndim' property is how many dimensions the array has. This one has 2. A vector for example however, has just 1.

The 'nbytes' property is how many bytes are used up by all the data in the array. You should note that this does not count the overhead of an array and so the actual space that the array takes up will be a little bit larger.