An Introduction to Scientific Python (and a Bit of the Maths Behind It) – NumPy
An introductory overview of NumPy, one of the foundational aspects of Scientific Computing in Python, along with some explanation of the maths involved.
Working With Arrays
Just being able to create arrays and retrieve their elements and properties isn't going to get you very far; sometimes you will need to do maths on them too. You can do this using the basic operators such as +, -, *, /, etc.
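```python
import numpy as np

# Basic Operators
a = np.arange(25)
a = a.reshape((5, 5))

b = np.array([10, 62, 1, 14, 2, 56, 79, 2, 1, 45,
              4, 92, 5, 55, 63, 43, 35, 6, 53, 24,
              56, 3, 56, 44, 78])
b = b.reshape((5, 5))

# Element-wise arithmetic
print(a + b)
print(a - b)
print(a * b)
print(a / b)
print(a ** 2)

# Element-wise comparisons return arrays of booleans
print(a < b)
print(a > b)

# Matrix product (not element-wise)
print(a.dot(b))
```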
With the exception of dot(), all of these operators work element-wise on the array: they pair up the corresponding elements of the two arrays, do the arithmetic on each pair separately, and return an array of the results. For example, (a, b, c) + (d, e, f) would be (a+d, b+e, c+f), and a ** 2 simply squares each element. Note that comparison operators such as < and > return an array of booleans, which has a very useful application that we will go through later.
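A minimal sketch of this element-wise behaviour, using two small arrays x and y whose values are arbitrary and chosen only for illustration. Each result has one entry per position, and the comparison gives one boolean per position rather than a single True or False.

```python
import numpy as np

# Two small arrays (values chosen only for illustration)
x = np.array([1, 5, 3])
y = np.array([4, 2, 6])

total = x + y      # element-wise addition
product = x * y    # element-wise multiplication
squares = x ** 2   # each element squared
is_less = x < y    # element-wise comparison: one boolean per position

print(total)    # >>>[5 7 9]
print(product)  # >>>[ 4 10 18]
print(squares)  # >>>[ 1 25  9]
print(is_less)  # >>>[ True False  True]
```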
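For two 1-D arrays, the dot() function works out their dot product, which is not an array but a scalar (a value with just magnitude and no direction). For 2-D arrays such as a and b above, however, dot() performs matrix multiplication and returns another 2-D array.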
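A short sketch of the two cases; the arrays u, v, m and n are illustrative values rather than anything from the example above. It also checks that, for 2-D arrays, dot() agrees with Python's @ matrix-multiplication operator.

```python
import numpy as np

# 1-D arrays: dot() returns a single number (a scalar)
u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
scalar = u.dot(v)  # 1*4 + 2*5 + 3*6

# 2-D arrays: dot() performs matrix multiplication and returns a 2-D array
m = np.array([[1, 2],
              [3, 4]])
n = np.array([[5, 6],
              [7, 8]])
matrix = m.dot(n)

print(scalar)        # >>>32
print(matrix.shape)  # >>>(2, 2)

# For 2-D arrays, dot() gives the same result as the @ operator
print(np.array_equal(matrix, m @ n))  # >>>True
```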
A Bit of the Maths Behind It
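The best way to understand the dot product is to see how it is calculated: multiply the elements at each position together, then add up the results. For example, (a, b, c) · (d, e, f) = ad + be + cf. For 2-D arrays, each entry of the result is the dot product of a row of the first array with a column of the second.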
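A minimal sketch of that calculation in plain Python, using a hypothetical helper dot_by_hand() (our own name, not part of NumPy) that multiplies paired elements and sums the products. The result is checked against np.dot(), and the helper is then reused to reproduce one entry of a matrix product.

```python
import numpy as np

def dot_by_hand(a, b):
    """Hypothetical helper: multiply paired elements and sum the products."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    total = 0
    for x, y in zip(a, b):
        total += x * y  # multiply the elements at the same position
    return total

a = np.array([1, 3, -5])
b = np.array([4, -2, -1])

# 1*4 + 3*(-2) + (-5)*(-1) = 4 - 6 + 5 = 3
print(dot_by_hand(a, b))  # >>>3
print(np.dot(a, b))       # >>>3

# Entry (0, 1) of a matrix product is row 0 of m dotted with column 1 of n
m = np.array([[1, 2], [3, 4]])
n = np.array([[5, 6], [7, 8]])
print(dot_by_hand(m[0], n[:, 1]))  # >>>22
print(m.dot(n)[0, 1])              # >>>22
```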
Array Specific Operators
NumPy also provides some useful methods for processing an array.
```python
import numpy as np

# sum, min, max, cumsum
a = np.arange(10)
print(a.sum())     # >>>45
print(a.min())     # >>>0
print(a.max())     # >>>9
print(a.cumsum())  # >>>[ 0  1  3  6 10 15 21 28 36 45]
```
The sum(), min() and max() methods are fairly self-explanatory: they add up all the elements, and find the minimum and maximum elements, respectively.
The cumsum() method, however, is a little less obvious. Like sum(), it adds the elements together, but it keeps a running total as it goes: it adds the first element to the second, then adds that result to the third, and so on, recording each intermediate total. The result is an array of the same length as the original, where each entry is the sum of all the elements up to and including that position.
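The running total can be written out by hand in a few lines of plain Python, which may make the idea clearer. This loop is only an illustration of what cumsum() computes, not how NumPy implements it.

```python
import numpy as np

a = np.arange(10)

# Build the running total by hand: each entry is the sum of every
# element up to and including that position
running = []
total = 0
for value in a:
    total += int(value)
    running.append(total)

print(running)     # >>>[0, 1, 3, 6, 10, 15, 21, 28, 36, 45]
print(a.cumsum())  # >>>[ 0  1  3  6 10 15 21 28 36 45]
```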
'Fancy indexing' is a useful way of picking out specific array elements that you want to work with.
```python
import numpy as np

# Fancy indexing
a = np.arange(0, 100, 10)
indices = [1, 5, -1]  # -1 refers to the last element
b = a[indices]
print(a)  # >>>[ 0 10 20 30 40 50 60 70 80 90]
print(b)  # >>>[10 50 90]
```
As you can see in the example above, we index the array with a sequence of the specific indices that we want to retrieve. This returns a new array containing the elements at those indices.
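A couple of further illustrative cases, with arbitrary index values: indices may repeat and appear in any order, and the result of fancy indexing is a copy, so modifying it leaves the original array unchanged.

```python
import numpy as np

a = np.arange(0, 100, 10)

# Indices can appear in any order and can repeat
picked = a[[7, 2, 2, 0]]
print(picked)  # >>>[70 20 20  0]

# The index sequence can itself be a NumPy array
idx = np.array([0, -2])
print(a[idx])  # >>>[ 0 80]

# Fancy indexing returns a copy, so changing it leaves a untouched
picked[0] = -1
print(a[7])    # >>>70
```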
Boolean masking is a fantastic feature that allows us to retrieve elements in an array based on a condition that we specify.
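```python
import numpy as np
import matplotlib.pyplot as plt

# Boolean masking
a = np.linspace(0, 2 * np.pi, 50)
b = np.sin(a)
plt.plot(a, b)

# Blue dots: points where sin(a) is non-negative
mask = b >= 0
plt.plot(a[mask], b[mask], 'bo')

# Green dots: non-negative points that also lie at or before pi/2
mask = (b >= 0) & (a <= np.pi / 2)
plt.plot(a[mask], b[mask], 'go')

plt.show()
```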
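The above example shows how to do boolean masking. A condition involving an array, such as b >= 0, produces an array of booleans (the mask), and indexing an array of the same shape with that mask returns only the elements where the mask is True. Because a and b have the same shape, a mask built from b can be used to select from a as well.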
The example produces the following plot:
We use the conditions to select different points on the plot. The blue points show all the points where the value of sin(a) is greater than or equal to 0; this set also includes the green points, which are drawn on top of the blue ones and hide them. The green points show the points that have a value greater than or equal to 0 and also lie at or before π/2 on the x-axis.
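The same masks can be inspected without plotting anything. This sketch rebuilds the arrays from the plotting example and counts the selected points; the names positive, x_values and both are our own.

```python
import numpy as np

a = np.linspace(0, 2 * np.pi, 50)
b = np.sin(a)

# A condition produces a boolean array with the same shape as b
mask = b >= 0
print(mask.shape)           # >>>(50,)

# Indexing with the mask keeps only the positions where it is True
positive = b[mask]
print(positive.size)        # >>>25
print(positive.min() >= 0)  # >>>True

# A mask built from b can select from a, because both have the same shape
x_values = a[mask]
print(x_values.max() <= np.pi)  # >>>True

# Combine conditions with & ("and"); & binds more tightly than >= and <=,
# so each condition needs its own parentheses
both = (b >= 0) & (a <= np.pi / 2)
print(a[both].size)         # >>>13
```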
Incomplete indexing is a convenient way of taking an index or slice from the first dimension of a multidimensional array. For example, if you had the array a = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], then a[1] would give you the whole row with index 1 in the first dimension, [6, 7, 8, 9, 10]. To reach a single value such as 4, you need an index for both dimensions: a[0, 3].
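```python
import numpy as np

# Incomplete Indexing
a = np.arange(0, 100, 10)
b = a[:5]        # slice of the first (and only) dimension
c = a[a >= 50]   # boolean mask on the same dimension
print(b)  # >>>[ 0 10 20 30 40]
print(c)  # >>>[50 60 70 80 90]
```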
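The example above works on a 1-D array. A short sketch using the 2-D array from the paragraph on incomplete indexing shows what indexing only the first dimension returns, compared with supplying an index for every dimension.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5],
              [6, 7, 8, 9, 10]])

# Supplying only the first index returns a whole row
print(a[1])     # >>>[ 6  7  8  9 10]

# A slice on the first dimension returns a block of rows (still 2-D)
print(a[:1])    # >>>[[1 2 3 4 5]]

# Reaching a single value needs an index for every dimension
print(a[0, 3])  # >>>4
```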
The where() function is another useful way of working with array elements conditionally. When you pass it just a condition, it returns a tuple of arrays (one per dimension) holding the indices where that condition is true, rather than the elements themselves.
```python
import numpy as np

# Where
a = np.arange(0, 100, 10)
b = np.where(a < 50)
c = np.where(a >= 50)
print(b)  # >>>(array([0, 1, 2, 3, 4]),)
print(c)  # >>>(array([5, 6, 7, 8, 9]),)
```
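Because where() returns indices rather than values, you can feed the result back into the array to get the elements themselves. The sketch below does that, and also shows the 2-D case, where the tuple holds one array of row indices and one of column indices (grid is an arbitrary example array).

```python
import numpy as np

a = np.arange(0, 100, 10)

# np.where(condition) returns a tuple of index arrays, one per dimension
(indices,) = np.where(a < 50)
print(indices)     # >>>[0 1 2 3 4]

# Use those indices to fetch the matching elements
print(a[indices])  # >>>[ 0 10 20 30 40]

# With a 2-D array, the tuple holds the row indices and the column indices
grid = np.array([[1, 8],
                 [9, 2]])
rows, cols = np.where(grid > 5)
print(rows, cols)  # >>>[0 1] [1 0]
```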
And that's NumPy: not so hard, right? Of course, this post only covers the basics to get you going. Once you are comfortable with them, there are many other things in NumPy worth taking a look at.
Bio: Jamal Moir is a student of Computer Science and Japanese Studies, studying at Oxford Brookes University. He regularly blogs about computer science exploration at jamalmoir.com.
Original. Reposted with permission.