7 Tips To Produce Readable Data Science Code

In this article, we will go over a few steps that you can take to produce readable, high-quality code.

Image by svstudioart on Freepik


The ability to write readable code is something developers refer to as a form of art. Although I partially agree with that statement, writing code, especially readable code, is a skill that can be developed.

The best way to improve the readability of your code is to practice writing quality code. It also helps to read code written by other developers known for writing high-quality code.

In general, readability becomes more critical the more complex your code gets. In data science in particular, writing readable code is extremely important: data science applications can be tricky to understand on their own, so the extra complexity added by poorly written code is best avoided.

I assume you agree that writing readable code is essential. Still, how do you make your code more readable?

In this article, we will go over a few steps that you can take to produce readable, high-quality code.


Have a Structure in Mind before you Start Coding


Before you open your editor and start coding your way through a problem, try to plan out your code structure. Make a structure, as detailed as possible, of your variables, functions, classes, and modules and how they all connect to solve the problem.

Doing that will save a lot of time later when you implement, expand, and deploy the code. I recommend you add that structure to the documentation of your code or make it available on GitHub if you're planning to make your code open-source.
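As a minimal sketch, such a structure can be written down as a skeleton of stubs before the real logic exists. The file layout and the function names below (load_measurements, clean_measurements, summarize) are hypothetical and only illustrate the idea; your own plan will look different.

```python
"""Planned structure for a small analysis project (hypothetical layout).

data_io.py   -> load_measurements()   read raw data from disk
cleaning.py  -> clean_measurements()  drop unusable entries
report.py    -> summarize()           compute the final numbers
"""


def load_measurements(path):
    """Read raw measurements from a file. Planned, not implemented yet."""
    raise NotImplementedError


def clean_measurements(raw_measurements):
    """Return the measurements with missing values (None) removed."""
    return [value for value in raw_measurements if value is not None]


def summarize(clean_values):
    """Return the count and the average of a non-empty list of values."""
    return {
        "count": len(clean_values),
        "average": sum(clean_values) / len(clean_values),
    }
```

Even with one stub still unimplemented, the skeleton already shows how the pieces connect: load, then clean, then summarize.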


Name your Variables Descriptively


I know we all get tempted sometimes to name our variables X, Y, and Z. But then we get baffled when we read our code months later, trying to figure out what exactly is stored in variable X! Giving your variables descriptive names will help not only strangers reading your code, but also future you.

When naming your variables, aim for exact names rather than short ones. For example, if you're calculating the average of a list of values, don't name your variable ave or av; use something like average_height or average_time. Today, many code editors offer autocompletion, so using longer names will not make your code-writing process slower.

Moreover, if your code implements an algorithm introduced in a specific paper or book, keep the variable names consistent with that source. Remember to cite that source at the top of your code files.
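To illustrate the difference, here is a small sketch that computes the same average twice, once with terse names and once with descriptive ones. The data and the names heights_cm and average_height_cm are made up for this example.

```python
# Hypothetical sample data: heights in centimeters.
heights_cm = [172.0, 165.5, 180.2]

# Hard to follow: what do "x" and "av" hold?
x = heights_cm
av = sum(x) / len(x)

# Easier to follow: the names state the quantity and its unit.
average_height_cm = sum(heights_cm) / len(heights_cm)
```

Both versions compute the same value; only the second tells the reader what that value means.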


Use Functions Wisely


Functions can be a great tool for keeping your code organized and concise, if used correctly. Use functions for tasks that can be packaged into a single unit, for example, applying an operation over different data points or implementing an algorithm step. When you name your functions, use the same logic we covered when naming your variables.

Collect functions with related functionality in one code file and make it a module if possible. That makes the functions easier to find, expand, and use.

Try to be clear about the specific types of the function's parameters and make your functions secure and expandable.
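One way to be explicit about types in Python is to add type hints and validate input early. The sketch below is only an illustration: the function name rescale_to_unit_range and the choice to map constant input to zeros are assumptions made for this example.

```python
from typing import List, Sequence


def rescale_to_unit_range(values: Sequence[float]) -> List[float]:
    """Linearly rescale values so the smallest maps to 0 and the largest to 1."""
    # Fail early with a clear message instead of a confusing error later.
    if not values:
        raise ValueError("values must not be empty")

    lowest, highest = min(values), max(values)

    # Assumption for this sketch: constant input maps to all zeros,
    # which also avoids dividing by zero.
    if lowest == highest:
        return [0.0 for _ in values]

    value_range = highest - lowest
    return [(value - lowest) / value_range for value in values]
```

The type hints tell the reader what goes in and what comes out without having to read the body, and the early check keeps bad input from travelling deeper into the code.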


Target Clear and Concise Docstrings


Documenting your code is an essential step, whether through complete documentation or in-code documentation (docstrings). Docstrings are strings placed at the start of a code file or right after a function or class definition that tell the reader the purpose of the file, function, or class.

Docstrings are meant to be a short hint of what your code is and how it works. For example, when used at the beginning of a function (right below the function header), a docstring should include the expected types of the parameters and their role in the function, the output of the function, and a sentence or two about how that output is calculated.

In the case of a class, the docstring should include the class attributes and methods and how they can be used.
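A short sketch of both kinds of docstring follows. The section layout (Parameters, Returns, Attributes, Methods) follows the NumPy docstring convention, which is one common choice rather than a requirement, and moving_average and RunningMean are hypothetical names used only for illustration.

```python
def moving_average(values, window_size):
    """Return the simple moving average of a sequence.

    Parameters
    ----------
    values : list of float
        The data points, in order.
    window_size : int
        How many consecutive points go into each average.

    Returns
    -------
    list of float
        One average per full window, so the result has
        len(values) - window_size + 1 items.
    """
    if window_size < 1 or window_size > len(values):
        raise ValueError("window_size must be between 1 and len(values)")
    return [
        sum(values[start:start + window_size]) / window_size
        for start in range(len(values) - window_size + 1)
    ]


class RunningMean:
    """Track the mean of the values added so far.

    Attributes
    ----------
    count : int
        Number of values added.
    total : float
        Sum of the values added.

    Methods
    -------
    add(value)
        Include one more value.
    mean()
        Return the mean of all values added so far.
    """

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, value):
        self.count += 1
        self.total += value

    def mean(self):
        if self.count == 0:
            raise ValueError("no values have been added yet")
        return self.total / self.count
```

Calling help(moving_average) or help(RunningMean) in an interactive session prints these docstrings, so readers can look them up without opening the source file.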


Don't Reinvent the Wheel (unless you can do it better)


If a function you need is already implemented by a supported package or a third-party developer, use it rather than implementing it all over again. When you use a package, make sure you know which functions it includes so you don't waste time implementing something that already exists.

The few scenarios where I recommend implementing a functionality yourself are when you're new to programming and trying to learn how everything works, or when you can implement a function better and with less complexity. Otherwise, using what's already implemented is simpler, both for you and for others who use your code.
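As a small illustration, compare a hand-written median with the one in Python's standard statistics module. The sample data and the helper name handwritten_median are made up for this sketch.

```python
import statistics

# Hypothetical sample data: reaction times in milliseconds.
reaction_times_ms = [250, 310, 275, 290, 305]


def handwritten_median(values):
    """Median written from scratch: more code to write, test, and maintain."""
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# The standard library already provides the same operation in one call.
median_from_library = statistics.median(reaction_times_ms)
```

Both give the same result here, but the library call is shorter, already tested, and immediately recognizable to other readers.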


Aim for Simple, Longer Steps Rather than Short, Complex Ones


When you try to implement an idea presented in a paper or a book, or an algorithm, aim for clear steps rather than trying to group multiple steps together to have shorter code.

Yes, shorter code may show how good you are at using the idioms of a programming language. Still, it can also make your code unnecessarily complex to read, test, debug, and expand. Especially when the algorithm you're implementing is complex by itself, adding this extra layer of complexity by grouping several steps together will lead to inflexible code.
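Here is a sketch of the trade-off, using standardization (subtract the mean, divide by the standard deviation) as the example algorithm. The data and variable names are hypothetical, and the population standard deviation is chosen only to keep the example simple.

```python
import statistics

# Hypothetical sample data.
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Short but dense: everything happens in one expression, and the mean and
# standard deviation are recomputed for every single element.
standardized_dense = [
    (s - statistics.mean(scores)) / statistics.pstdev(scores) for s in scores
]

# Longer but clearer: each step has a name and can be inspected on its own.
mean_score = statistics.mean(scores)
score_spread = statistics.pstdev(scores)  # population standard deviation
standardized_clear = [(score - mean_score) / score_spread for score in scores]
```

The second version takes more lines, but each intermediate value can be printed, tested, or replaced (for example, by a sample standard deviation) without touching the rest.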


Stay Consistent


Consistency is excellent for code readability. When planning your code structure, decide on a style to use throughout your code. That includes deciding how you will name your variables, functions, and classes, as well as how you will use comments, address the different mathematical steps in your algorithm, modularize your code, and use existing packages.

Following and understanding your code will be much faster when you have a consistent style and pattern.


Final Thoughts 


One of the unavoidable things about being a data scientist is using code written by someone else. And although reading and understanding code written by other people will always be a time-consuming task, there are a couple of steps you can follow to make your own code easier to follow and use for those who are going to work with it.

Although the tips covered in this article can be used by anyone who writes code and not only data scientists, it is, in my opinion, especially crucial for data scientists to produce readable code to overcome some of the difficulties that already exist due to the math behind most data science algorithms.

So, if you want to start writing better, more readable code, this article would be a good place to start. Remember, writing better code is a skill; just like any other skill, it improves with practice.

Sara Metwalli is a Ph.D. candidate at Keio University researching ways to test and debug quantum circuits. She is an IBM research intern and Qiskit advocate helping build a more quantum future. She is also a writer on Medium, Built-in, She Can Code, and KDN, writing articles about programming, data science, and tech topics, as well as a lead in the Women Who Code Python international chapter, a train enthusiast, a traveler, and a photography lover.