Object-Oriented Programming Explained Simply for Data Scientists


Read this simple but effective guide to start using Classes in Python 3.



 

Object-Oriented Programming, or OOP, can be a tough concept for beginners to understand, mainly because it is often not explained well. Many books open by introducing the three big terms: Encapsulation, Inheritance and Polymorphism. By the time the book gets around to explaining them, anyone who is just starting out already feels lost.

So, I thought of making the concept a little easier for fellow programmers, Data Scientists and Pythonistas. I intend to do this by removing all the jargon and going through some examples. I will start by explaining classes and objects. Then I will explain why classes are important in various situations and how they solve some fundamental problems. This way, the reader will also understand the three big terms by the end of the post.

In this series of posts named Python Shorts, I explain some simple but very useful constructs provided by Python, some essential tips, and some use cases I come across regularly in my Data Science work.

This post explains OOP in layman’s terms.

 

What are Objects and Classes?

 
Put simply, everything in Python is an object and classes are a blueprint of objects. So when we write:

a = 2
b = "Hello!"


We are creating an object a of class int holding the value 2 and object b of class str holding the value “Hello!”. In a way, these two particular classes are provided to us by default when we use numbers or strings.
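
A quick way to confirm this is to ask Python which class each object belongs to. The snippet below is a minimal sketch that uses only built-in functions:

```python
a = 2
b = "Hello!"

# type() reports the class an object was created from
print(type(a))  # <class 'int'>
print(type(b))  # <class 'str'>

# isinstance() checks whether an object belongs to a given class
print(isinstance(a, int))  # True
print(isinstance(b, str))  # True
```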

Apart from these, a lot of us end up working with classes and objects without even realizing it. For example, you are actually using a class whenever you use a Scikit-Learn model.

from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier()
clf.fit(X, y)  # X and y are your features and target labels


Here your classifier clf is an object, and fit is a method defined in the class RandomForestClassifier.

 

But Why Classes?

 
So, we use them a lot when we are working with Python. But why, really? What is it with classes? Couldn’t I do the same with functions?

Yes, you can. But classes give you a lot of power compared to plain functions. To take an example, the str class has a lot of methods defined for the object, which we can reach just by pressing Tab. One could also write all of these as standalone functions, but then they would not be discoverable just by pressing the Tab key.

Image for post
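
The screenshot is not reproduced in this text. As a rough stand-in, the sketch below lists the public method names that Tab completion would typically offer on a string, and calls two of them directly on the object:

```python
b = "Hello!"

# Names without a leading underscore are the ones Tab completion usually shows
public_methods = [name for name in dir(b) if not name.startswith("_")]
print(public_methods[:5])

# The methods travel with the data, so we call them on the object itself
print(b.upper())            # HELLO!
print(b.replace("!", "?"))  # Hello?
```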

This property of classes is called encapsulation. From Wikipedia: encapsulation refers to the bundling of data with the methods that operate on that data, or the restricting of direct access to some of an object’s components.

So here the str class bundles the data (“Hello!”) with all the methods that operate on that data. I will explain the second part of that definition by the end of the post. In the same way, the RandomForestClassifier class bundles all the classifier methods (fit, predict, etc.).

Apart from this, using classes also helps us make the code much more modular and easier to maintain. Say we were to create a library like Scikit-Learn. We need to create many models, and each model will have a fit and a predict method. If we don’t use classes, we will end up with a lot of functions for each of our different models, like:

RFCFit
RFCPredict
SVCFit
SVCPredict
LRFit
LRPredict and so on.


This sort of code structure is a nightmare to work with, and hence Scikit-Learn defines each of the models as a class having the fit and predict methods.
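
To make the benefit concrete, here is a minimal sketch of two toy models that share the same fit and predict method names. MeanRegressor and MedianRegressor are hypothetical classes invented for this illustration; this is not how Scikit-Learn implements its models.

```python
from statistics import mean, median


class MeanRegressor:
    """Toy model (hypothetical): always predicts the mean of the training targets."""

    def fit(self, X, y):
        self.value_ = mean(y)
        return self

    def predict(self, X):
        return [self.value_ for _ in X]


class MedianRegressor:
    """Toy model (hypothetical): always predicts the median of the training targets."""

    def fit(self, X, y):
        self.value_ = median(y)
        return self

    def predict(self, X):
        return [self.value_ for _ in X]


X = [[1], [2], [3], [4]]
y = [10, 20, 30, 100]

# The calling code is identical for every model because the method names match
predictions = {}
for model in (MeanRegressor(), MedianRegressor()):
    model.fit(X, y)
    predictions[type(model).__name__] = model.predict([[5]])
    print(type(model).__name__, predictions[type(model).__name__])
```

Adding a third model means adding one more class; the loop that trains and predicts does not change.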

 

Creating a Class

 
So, now that we understand why classes are used and why they are so important, how do we actually go about using them? Creating a class is pretty simple. Below is the boilerplate code for any class you will end up writing:

class myClass:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def somefunc(self, arg1, arg2):
        # SOME CODE HERE
        pass


We see a lot of new keywords here. The main ones are class, __init__ and self. So what are these? Again, they are most easily explained with an example.

Suppose you are working at a bank that has many accounts. We can create a class named Account that can be used to work with any account. For example, below I create an elementary toy class Account which stores data for a user, namely account_name and balance. It also provides us with two methods to deposit money to and withdraw money from the bank account. Do read through it. It follows the same structure as the code above.

class Account:
    def __init__(self, account_name, balance=0):
        self.account_name = account_name
        self.balance = balance
    
    def deposit(self, amount):
        self.balance += amount
    
    def withdraw(self,amount):
        if amount <= self.balance:
            self.balance -= amount
        else:
            print("Cannot Withdraw amounts as no funds!!!")


We can create an account named Rahul with a balance of 100 using:

myAccount = Account("Rahul",100)


We can access the data for this account using:

Image for post
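
The screenshot is not reproduced in this text; it presumably showed the attributes being read with dot notation. A minimal sketch of that, with the Account class trimmed down to its __init__ so the snippet runs on its own:

```python
class Account:
    def __init__(self, account_name, balance=0):
        self.account_name = account_name
        self.balance = balance


myAccount = Account("Rahul", 100)

# Attributes set in __init__ are read with dot notation
print(myAccount.account_name)  # Rahul
print(myAccount.balance)       # 100
```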

But how are these attributes balance and account_name already set to 100 and “Rahul” respectively? We never called the __init__ method, so why did the object get these attributes? The answer is that __init__ is a magic method (there are a lot of other magic methods, which I will expand on in my next post on Magic Methods) that runs whenever we create an object. So when we create myAccount, Python automatically runs __init__ as well.

So now that we understand __init__, let us try to deposit some money into our account. We can do this by:

Image for post
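
The screenshot is missing here as well. A sketch of the deposit call it most likely showed, again with a trimmed copy of the Account class so the snippet is self-contained:

```python
class Account:
    def __init__(self, account_name, balance=0):
        self.account_name = account_name
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount


myAccount = Account("Rahul", 100)

# We pass only the amount; Python supplies self for us
myAccount.deposit(100)
print(myAccount.balance)  # 200
```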

And our balance rose to 200. But did you notice that our function deposit needed two arguments, namely self and amount, yet we provided only one, and it still works?
So, what is this self? The way I like to explain self is by calling the same function in a slightly different way. Below, I call the same function deposit belonging to the class Account and provide it with the myAccount object and the amount. Now the function takes two arguments, as it should.

Image for post
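
In place of the missing screenshot, the sketch below puts both calling styles side by side. The second call goes through the class and passes the object explicitly as self:

```python
class Account:
    def __init__(self, account_name, balance=0):
        self.account_name = account_name
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount


myAccount = Account("Rahul", 100)

myAccount.deposit(100)           # usual style: Python passes myAccount as self
Account.deposit(myAccount, 100)  # explicit style: we pass the object ourselves

print(myAccount.balance)  # 300
```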

And our myAccount balance increases by 100 as expected. So it is the same function we have called. Now, that could only happen if self and myAccount are exactly the same object. When I call myAccount.deposit(100) Python provides the same object myAccount to the function call as the argument self. And that is why self.balance in the function definition really refers to myAccount.balance.

 

But, still, some problems remain

 

 

We know how to create classes, but still, there is another important problem that I haven’t touched upon yet.

So, suppose you are working in Apple’s iPhone division, and you have to create a different class for each iPhone model. For this simple example, let us say that the first version of our iPhone does just one thing, making a call, and has some memory. We can write the class as:

class iPhone:
    def __init__(self, memory, user_id):
        self.memory = memory
        self.mobile_id = user_id

    def call(self, contactNum):
        # Some Implementation Here
        pass


Now, Apple plans to launch iPhone1, and this iPhone model introduces a new functionality: the ability to take a picture. One way to do this is to copy-paste the above code and create a new class iPhone1 like:

class iPhone1:
    def __init__(self, memory, user_id):
        self.memory = memory
        self.mobile_id = user_id
        self.pics = []

    def call(self, contactNum):
        # Some Implementation Here
        pass

    def click_pic(self):
        # Some Implementation Here
        pic_taken = ...
        self.pics.append(pic_taken)


But as you can see, that is a lot of unnecessary code duplication (the __init__ assignments and the call method are repeated verbatim), and Python has a solution for removing it. One good way to write our iPhone1 class is:

class iPhone1(iPhone):
    def __init__(self, memory, user_id):
        super().__init__(memory, user_id)
        self.pics = []

    def click_pic(self):
        # Some Implementation Here
        pic_taken = ...
        self.pics.append(pic_taken)


And that is the concept of inheritance. As per Wikipedia: Inheritance is the mechanism of basing an object or class upon another object or class, retaining similar implementation. Simply put, iPhone1 now has access to all the variables and methods defined in class iPhone.
In this case, we don’t have to duplicate any code, as we have inherited (taken) all the methods from our parent class iPhone. Thus we don’t have to define the call function again. We also don’t have to set mobile_id and memory ourselves in the __init__ function, because the super call does that for us.

But what is this super().__init__(memory,user_id)?

In real life, your __init__ functions won’t be these nice two-line functions. You would need to define a lot of variables/attributes in your class, and copy-pasting them into the child class (here iPhone1) becomes cumbersome. That is why super() exists. Here super().__init__() calls the __init__ method of the parent iPhone class. So when the __init__ function of class iPhone1 runs, it sets the memory and mobile_id of the object using the __init__ function of the parent class.
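
To see the inheritance and the super call at work, here is a runnable sketch of the two classes. The bodies of call and click_pic are placeholders I made up for illustration (the originals are left as "Some Implementation Here"), and the phone number and user id are arbitrary sample values.

```python
class iPhone:
    def __init__(self, memory, user_id):
        self.memory = memory
        self.mobile_id = user_id

    def call(self, contactNum):
        # Placeholder behaviour (assumption): just describe the call
        return f"Calling {contactNum} from {self.mobile_id}"


class iPhone1(iPhone):
    def __init__(self, memory, user_id):
        # Let the parent class set memory and mobile_id
        super().__init__(memory, user_id)
        self.pics = []

    def click_pic(self):
        # Placeholder behaviour (assumption): store a label instead of a real image
        pic_taken = f"pic_{len(self.pics) + 1}"
        self.pics.append(pic_taken)


phone = iPhone1(memory=64, user_id="rahul_01")
phone.click_pic()

print(phone.memory, phone.mobile_id)  # 64 rahul_01
print(phone.call("555-0100"))         # Calling 555-0100 from rahul_01
print(phone.pics)                     # ['pic_1']
```

Note that iPhone1 never defines call, yet phone.call works because the method is inherited from iPhone.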

Where do we see this in ML/DS/DL? Below is how we create a PyTorch model. This model inherits everything from the nn.Module class and calls the __init__ function of that class using the super call.

from torch import nn

class myNeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Define all your Layers Here
        self.lin1 = nn.Linear(784, 30)
        self.lin2 = nn.Linear(30, 10)

    def forward(self, x):
        # Connect the layer Outputs here to define the forward pass
        x = self.lin1(x)
        x = self.lin2(x)
        return x


But what is Polymorphism? We are getting better at understanding how classes work, so I will try to explain Polymorphism now. Look at the classes below.
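
The class listing itself did not survive in this text. What follows is a minimal sketch reconstructed from the description in the next paragraph, so the constructor arguments, the getName helper and the exact layout are my assumptions rather than the original code:

```python
import math


class Shape:
    def __init__(self, name):
        self.name = name

    def area(self):
        # The base class has no dimensions, so subclasses must supply this
        raise NotImplementedError

    def getName(self):
        return self.name


class Rectangle(Shape):
    def __init__(self, name, length, breadth):
        super().__init__(name)
        self.length = length
        self.breadth = breadth

    def area(self):
        return self.length * self.breadth


class Square(Rectangle):
    def __init__(self, name, side):
        # A square is a rectangle whose sides are equal
        super().__init__(name, side, side)


class Circle(Shape):
    def __init__(self, name, radius):
        super().__init__(name)
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


# The same method name gives a shape-specific result for each object
for shape in (Rectangle("rect", 2, 3), Square("sq", 2), Circle("circ", 1)):
    print(shape.getName(), shape.area())
```

In this sketch Square does not write its own area; it reuses the one it inherits from Rectangle.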

Here we have our base class Shape and the other derived classes — Rectangle and Circle. Also, see how we use multiple levels of inheritance in the Square class which is derived from Rectangle which in turn is derived from Shape. Each of these classes has a function called area which is defined as per the shape. So the concept that a function with the same name can do multiple things is made possible through Polymorphism in Python. In fact, that is the literal meaning of Polymorphism: “Something that takes many forms”. So here our function area takes multiple forms.

Another way that Polymorphism shows up in Python is through the isinstance function. So using the above classes, if we do:

Image for post
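
The screenshot is not available, so here is a sketch of the isinstance checks it likely contained. The classes are reduced to empty bodies because only the inheritance chain matters for this check:

```python
class Shape:
    pass


class Rectangle(Shape):
    pass


class Square(Rectangle):
    pass


class Circle(Shape):
    pass


mySquare = Square()

print(isinstance(mySquare, Square))     # True
print(isinstance(mySquare, Rectangle))  # True
print(isinstance(mySquare, Shape))      # True
print(isinstance(mySquare, Circle))     # False
```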

Thus, the object mySquare is an instance of Square, Rectangle and Shape, and hence the object is polymorphic. This has a lot of good properties. For example, we can create a function that works with a Shape object, and it will work with any of the derived classes (Square, Circle, Rectangle, etc.) by making use of Polymorphism.

Image for post
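
Since the screenshot is missing, here is one possible sketch of such a function. total_area is a hypothetical helper written for this illustration, and the shape classes are a compact version that takes only dimensions:

```python
import math


class Shape:
    def area(self):
        raise NotImplementedError


class Rectangle(Shape):
    def __init__(self, length, breadth):
        self.length = length
        self.breadth = breadth

    def area(self):
        return self.length * self.breadth


class Square(Rectangle):
    def __init__(self, side):
        super().__init__(side, side)


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


def total_area(shapes):
    # Hypothetical helper: relies only on every Shape offering area()
    return sum(shape.area() for shape in shapes)


print(total_area([Rectangle(2, 3), Square(2)]))  # 10
print(total_area([Circle(1)]))
```

The function never checks which concrete class it received; any object that behaves like a Shape will do.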

 

Some More Info:

 
Why do we see function names or attribute names starting with single and double underscores? Sometimes we want to make our attributes and functions in classes private and not expose them to the user. This is a part of Encapsulation, where we want to “restrict the direct access to some of an object’s components”. For instance, let’s say we don’t want to allow the user to see the memory (RAM) of our iPhone once it is created. In such cases, we create the attribute with a leading underscore in its name.

So when we create the iPhone class in the way shown below, the memory attribute and the privatefunc method will not show up by default under Tab completion in your IPython notebooks, because the leading _ marks them as private. This is a naming convention rather than a hard restriction.

Image for post
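
The original listing was an image. A minimal sketch of what such a class could look like is given below; the bodies of _privatefunc and call are placeholders I assumed, and filtering dir() is used here only to imitate what Tab completion shows by default.

```python
class iPhone:
    def __init__(self, memory, user_id):
        self._memory = memory  # single underscore: internal by convention
        self.mobile_id = user_id

    def _privatefunc(self):
        # Placeholder body (assumption)
        return "internal helper"

    def call(self, contactNum):
        # Placeholder body (assumption)
        return f"Calling {contactNum}"


myphone = iPhone(64, "rahul_01")

# Only names without a leading underscore are offered as the public interface
public_names = [name for name in dir(myphone) if not name.startswith("_")]
print(public_names)  # ['call', 'mobile_id']
```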

But you would still be able to change the variable value directly (though this is not recommended):

Image for post
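
A sketch of what the missing screenshot likely demonstrated: a single underscore does not stop anyone from reading, changing or calling the member. The class is a trimmed, assumed version of the iPhone class with underscore-prefixed members.

```python
class iPhone:
    def __init__(self, memory, user_id):
        self._memory = memory
        self.mobile_id = user_id

    def _privatefunc(self):
        # Placeholder body (assumption)
        return "internal helper"


myphone = iPhone(64, "rahul_01")

# Nothing prevents outside code from overwriting the "private" attribute
myphone._memory = 1
print(myphone._memory)         # 1

# The "private" method is callable as well
print(myphone._privatefunc())  # internal helper
```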

You would also be able to use the method _privatefunc using myphone._privatefunc(). If you want to avoid that, you can use double underscores in front of the variable name. For example, below the call to print(myphone.__memory) throws an error. Also, assigning myphone.__memory = 1 from outside the class does not change the object’s internal data: Python stores the internal value under the mangled name _iPhone__memory, so the assignment only creates a new, unrelated attribute.

Image for post
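
The listing was an image in the original. The sketch below shows the double-underscore behaviour under the assumption that the class has a setMemory method as mentioned in the text; getMemory is a hypothetical helper added only so the internal value can be inspected.

```python
class iPhone:
    def __init__(self, memory, user_id):
        self.__memory = memory  # stored as _iPhone__memory (name mangling)
        self.mobile_id = user_id

    def setMemory(self, memory):
        # Inside the class body, self.__memory resolves to the mangled name
        self.__memory = memory

    def getMemory(self):
        # Hypothetical helper for inspecting the internal value
        return self.__memory


myphone = iPhone(64, "rahul_01")

# Reading the attribute from outside the class fails
try:
    print(myphone.__memory)
except AttributeError as err:
    print("AttributeError:", err)

# Assigning from outside does not raise; it creates a separate attribute
myphone.__memory = 1
print(myphone.getMemory())  # 64

# Methods of the class can still modify the internal value
myphone.setMemory(128)
print(myphone.getMemory())  # 128
```

Double underscores make accidental access harder, but they are still not a security boundary: the value remains reachable under its mangled name.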

But, as you see you can access and modify these self.__memory values in your class definition in the function setMemory for instance.

 

Conclusion

 

Figure

Photo by Jeshoots.com on Unsplash

 

I hope this has been useful for you to understand classes. There is still a lot more to classes, which I will cover in my next post on magic methods. Stay tuned. To summarize, in this post we learned about OOP and creating classes, along with the various fundamentals of OOP:

  • Encapsulation: An object bundles its data with the methods that operate on it, and can restrict direct access to some of its components.
  • Inheritance: We can create a class hierarchy where methods from parent classes pass on to child classes.
  • Polymorphism: A function takes many forms, or the object might have multiple types.

To end this post, I will leave you with an exercise to implement, as I think it might clear up some concepts for you. Create a class that lets you manage 3D objects (sphere and cube) with volumes and surface areas. The basic boilerplate code is given below:

import math

class Shape3d:
    def __init__(self, name):
        self.name = name

    def surfaceArea(self):
        pass

    def volume(self):
        pass

    def getName(self):
        return self.name

class Cuboid():
    pass

class Cube():
    pass

class Sphere():
    pass


I will put the answer in the comments for this article.

If you want to learn more about Python, I would like to call out an excellent course for learning intermediate-level Python from the University of Michigan. Do check it out.

I am going to be writing more posts like this in the future too. Let me know what you think about the series. Follow me on Medium or subscribe to my blog to be informed about them. As always, I welcome feedback and constructive criticism and can be reached on Twitter @mlwhiz.

Also, a small disclaimer — There might be some affiliate links in this post to relevant resources, as sharing knowledge is never a bad idea.

 
Bio: Rahul Agarwal is Senior Statistical Analyst at WalmartLabs. Follow him on Twitter @mlwhiz.

Original. Reposted with permission.
